Re: HTML 2.0 LAST CALL: Numeric character refs

Daniel W. Connolly (connolly@beach.w3.org)
Fri, 2 Jun 95 17:16:22 EDT

In message <9506021348.ZM29226@dmg.west.ora.com>, "Terry Allen" writes:
>| >Notice that the NCR is not in the output.
>|
>| Notice also that there is no "C" at the end of the output; i.e. the
>| document is not conforming. Ignoring &#62123; altogether is one way to
>| handle the error. The HTML 2.0 specification suggests another.
>
>which is to ignore ISO 8879 and roll our own. No thanks.

ISO 8879 does _not_ specify what to do with non-conforming documents.
This is _not_ ignoring ISO 8879.

>| > There is thus no way
>| >to convert it to a text string.
>|
>| Sure there is: pretend you never recognized characters as markup,
>| and just treat them as data characters.
>
>Then write me a free conforming SGML parser that does that!

Ah! Now perhaps I see the problem: you want to use sgmls in an HTML
user agent. You'll have to patch it to abide by this convention. I
doubt it's more than a few lines of code -- just find the error
handling code in sgmls and hack a little bit. Note that it's only a
convention -- a "should" not a "must."

And after all, there's always the HTML parser in libwww.
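
For illustration, here is a minimal sketch of that convention in Python. It is not code from sgmls or libwww. The names expand_ncrs and is_defined are hypothetical, and the set of defined code positions is an assumption modelled on an ISO 8859-1 document character set. A defined numeric character reference is replaced by its character; an undefined one is left in the output as plain data characters.

```python
import re

# Matches a decimal numeric character reference such as "&#62123;".
NCR = re.compile(r"&#([0-9]+);")


def is_defined(code):
    """Assumed document character set: tab, LF, CR, and the printable
    ISO 8859-1 positions. Swap in the real set for your application."""
    return code in (9, 10, 13) or 32 <= code <= 126 or 160 <= code <= 255


def expand_ncrs(text, defined=is_defined):
    """Replace defined NCRs with their character; pass undefined ones
    through unchanged, as if they had never been recognized as markup."""

    def replace(match):
        code = int(match.group(1))
        if defined(code):
            return chr(code)
        # Error recovery: keep the original "&#...;" text as data.
        return match.group(0)

    return NCR.sub(replace, text)
```

With these assumptions, expand_ncrs("A&#66;C") yields "ABC", while expand_ncrs("AB&#62123;C") comes back unchanged, so the trailing "C" is not lost.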

>| > That will have to wait until
>| >we agree upon 10646 as the doc charset.
>| Huh? What does 10646 have to do with the price of tea in China?
>
>you're tying these undefined NCRs to 10646, right?

No. What made you think that? Undefined means just that: undefined.
Not defined by ISO 10646, ISO 8859-1, or anything else.

>| It's not that big a deal: we're not going against 8879; it's just
>| one more "should" in the interest of consistent error handling,
>| which I sensed from the working group is a good thing.
>
>What's consistent about saying we're defining a conforming application
>and then saying something else? After all,
>
>1.2.1. Documents
>
> A document is a conforming HTML document only if:
>
> * It is a conforming SGML document, and it conforms to
> the HTML DTD (see 8.1, "HTML DTD").

Right. But in this whole section on undeclared markup, we're talking
about things that are NOT CONFORMING HTML DOCUMENTS! These are ERRORS!

Dan