Re: Charsets in .01 spec

Gavin Nicol (gtn@ebt.com)
Fri, 10 Feb 95 12:20:43 EST

>Looking at the just-announced .01 version of the spec, which I too
>would like to kick out the door, I find some problems with the
>section on charsets. Dan, you were too optimistic about this
>language. Considering that 2.0 is emphatically Latin 1, even
>to the point that no other charset is approbated, let's just
>say that 2.0 does Latin 1 and we defer expanding the realm
>of allowable charsets until 2.1.

With regret, I can but agree...

>| When an HTML document is encoded using US-ASCII,
>| the mechanisms of numeric character references (see
>| Section 2.16.2) and character entity references (see
>| Section 2.16.3) may be used to encode additional characters
>| from ISO-8859-1.
>
>This also works in any ISO-8859-n charset, and others. As the
>SGML decl is fixed (in 5.1), I see no value in the preceding para.

What is meant by "is encoded using US-ASCII"? That seems somewhat
contrary to the sentence that follows.

In addition, numeric and named character references work fine so long
as the data they refer to matches the document character set. The
restriction above is silly (what about the named character
references in the back of Goldfarb that map to "[ecirc ]"?).
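
To make the point concrete, here is a rough sketch (not taken from the
spec) of resolving references against the document character set rather
than against whatever encoding the bytes arrived in. It assumes the
document character set is ISO 8859-1; the name resolve_refs and the
two-entry ENTITIES table are made up for illustration, and a real SGML
system might well map "ecirc" to system data such as "[ecirc ]" instead
of a code position.

```python
import re

# Assumption: the document character set is ISO 8859-1, whose code
# positions 0-255 coincide with the first 256 Unicode code points.
DOC_CHARSET_MAX = 255

# Hypothetical, abbreviated entity table (illustration only).
ENTITIES = {"eacute": 233, "ecirc": 234}

REF = re.compile(r"&(?:#([0-9]+)|([A-Za-z][A-Za-z0-9]*));")


def resolve_refs(raw: bytes, encoding: str = "us-ascii") -> str:
    """Decode the bytes, then expand numeric and named references."""
    text = raw.decode(encoding)

    def expand(match):
        num, name = match.group(1), match.group(2)
        code = int(num) if num else ENTITIES.get(name)
        if code is None or code > DOC_CHARSET_MAX:
            # Unknown name or position outside the document character
            # set: leave the reference exactly as written.
            return match.group(0)
        return chr(code)

    return REF.sub(expand, text)
```

The same call gives the same text whether the bytes came in as US-ASCII
with references or as ISO 8859-1 with the characters written directly,
which is why singling out US-ASCII in the spec language seems unnecessary.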

Thank you, Terry.