Re: HTML 2.0 LAST CALL: Numeric character refs

Terry Allen
Fri, 2 Jun 95 11:18:15 EDT

| In message <>, "Terry Allen" writes:
| >Dave Morris writes:
| >| On the other hand, references to undeclared entities
| >| + and numeric character references which cannot be resolved
| >| + (e.g., are out of range)
| >| should be treated as data characters.
| >
| >And are not what we want to say here.
| Why not?

Because this section is about entities, not numeric charrefs, which
are dealt with elsewhere (grep for 10646).

| > The language about
| >numeric charrefs has been carefully crafted. It will be
| >revised in the next version of HTML that appears after an
| >internationalization proposal is agreed upon (Gavin, time to
| >get a move on).
| Agreed, but...

So you're just going to throw out several weeks of discussion
(I'm sure you can find it without citations) because one person
made a suggestion opposed by one other person who participated
in that discussion? I'd say you have no basis for making the

| > At that point we can discuss what "out of
| >range" might mean.
| "out of range" means >255, or whatever the SGML declaration in
| effect says is out of range. It seems perfectly well-defined to
| me.

Not without some reference to the SGML decl it isn't.

| > I strongly urge we stay with the
| >present language here, much as I feel your pain.
| Personally, I don't give a flying flip one way or the other. I'm
| pretty tired of specifying what HTML user agents should do when the
| modem introduces line noise into the document, your baby brother pukes
| on it, and the stars align to signal the end of the world. An error
| is an error. Deal with it.

As you may recall, we are talking about SGML conformance here.

| But one more "should" in there in the interest of consistent error
| handling at this point won't hurt anything.

If you parse this document

<!doctype html system "html.dtd">
<p>charref: &#62123;

with sgmls and the HTML sdecl you get in the error stream:

sgmls: SGML error at teal.html, line 3 at ";":
Numeric character reference exceeds 255; reference ignored

and in the output:


Notice that the NCR is not in the output. There is thus no way
to convert it to a text string. That will have to wait until
we agree upon 10646 as the doc charset.
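
For anyone who wants to see the behaviour without running sgmls, here is a minimal Python sketch of it. It is not how sgmls is implemented. The function name resolve_ncrs, the max_code parameter and the regular expression are all made up for illustration; the 255 limit and the wording of the message are taken from the error quoted above, and a different SGML declaration could set a different limit.

```python
import re

# Hypothetical pattern for decimal numeric character references such as "&#62;".
# Real SGML reference parsing has more cases; this is only an illustration.
NCR = re.compile(r"&#([0-9]+);?")


def resolve_ncrs(text, max_code=255):
    """Return (output, errors).

    References at or below max_code are replaced by the character they name.
    References above max_code are reported and dropped from the output,
    mirroring the "reference ignored" behaviour quoted above.
    max_code=255 is an assumption taken from that quoted message.
    """
    errors = []

    def replace(match):
        code = int(match.group(1))
        if code > max_code:
            errors.append(
                "Numeric character reference exceeds %d; reference ignored: %s"
                % (max_code, match.group(0))
            )
            return ""  # the reference leaves no trace in the output
        return chr(code)

    return NCR.sub(replace, text), errors


if __name__ == "__main__":
    output, errors = resolve_ncrs("<p>charref: &#62123;")
    print(repr(output))
    for message in errors:
        print(message)
```

Once the reference has been dropped, nothing downstream of the parser can tell that it was ever there, which is the point made above.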

I repeat my opposition to Dave's proposed language. We spent too
much time on this matter to regress in this fashion, and if we
specify HTML so that it is not conformant to 8879, we will
deserve what we get if people ignore our spec.


Terry Allen  (   O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
			       Sebastopol, Calif., 95472
occasional column at:

A Davenport Group sponsor. For information on the Davenport Group see or