A few comments on this latest draft:
>3.2 Character Set Issues
>
>...
>
>When an HTML document is encoded using US-ASCII, the mechanisms of
>character entity references (Section 6.3) may be used to encode
>additional characters from ISO-8859-1.
I don't think the use of entities should be restricted to
ASCII-encoded documents. They are always legal, as long as one has
ASCII to mark them up (see section 6.3.1).
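
To illustrate why the document's charset should not matter here, a minimal Python sketch (the charset list and the names REFERENCE, CHARSETS and encoded are illustrative assumptions, not anything taken from the draft): an entity reference consists only of ASCII characters, so it comes out as the same octets under any encoding that is a superset of ASCII.

```python
# An entity reference is made of ASCII characters only.
REFERENCE = "&eacute;"

# A few ASCII-superset charsets, chosen arbitrarily for the illustration.
CHARSETS = ("us-ascii", "iso-8859-1", "iso-8859-5", "koi8-r")

# Encode the same reference under each charset.
encoded = {charset: REFERENCE.encode(charset) for charset in CHARSETS}

for charset, octets in encoded.items():
    print(charset, octets)

# Every charset yields the same octet sequence.
print(len(set(encoded.values())) == 1)
```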
>...
>Therefore, user agents may use the charset parameter to select a
>different declaration, even though the mechanism...
[nit] I would add:
The intent, however, is that such a declaration be as close as
possible to that of section 12.3, the only differences being those
required to support the announced charset.
>6.3.2 Character octet reference
It doesn't make much sense to say "#233" to mean "e-acute" in a
document if codepoint 233 in that document's encoding means something
other than "e-acute". I would either restrict the use of those
references to documents encoded in Latin-1, or specify that they mean
"the character whose codepoint is given by the number, in the encoding
specified by the charset parameter (ISO-8859-1 by default)".
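
The problem can be seen with a short Python sketch. The function name resolve_octet_reference and the choice of charsets are hypothetical, picked only for illustration; the function implements the second option suggested above, i.e. it reads the number as a codepoint in the announced charset, with ISO-8859-1 as the default.

```python
def resolve_octet_reference(number: int, charset: str = "iso-8859-1") -> str:
    """Return the character whose codepoint is `number` in `charset`.

    Hypothetical helper: it only handles single-octet charsets and
    numbers in the range 0..255.
    """
    return bytes([number]).decode(charset)

# The same reference, "&#233;", read under different announced charsets.
for charset in ("iso-8859-1", "iso-8859-5", "iso-8859-7"):
    print(charset, resolve_octet_reference(233, charset))
```

Only under ISO-8859-1 does 233 come out as "e-acute"; the Cyrillic and Greek charsets give unrelated letters for the same number.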
BTW, the table in section 13.3 has errors: grave-accented letters
always come before acute-accented ones in Latin-1, contrary to what
the table says. I haven't had time to check the rest, sorry.
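
For anyone who wants to check the ordering mechanically, here is a rough Python sketch (the variable names are my own, and it covers only the five vowels, which are the letters that have both a grave and an acute form in Latin-1). It builds each accented letter from its base letter and compares the Latin-1 codepoints.

```python
import unicodedata

# (base letter, grave codepoint, acute codepoint) in ISO-8859-1
pairs = []
for base in "AEIOUaeiou":
    # Compose base + combining grave (U+0300) or acute (U+0301).
    grave = unicodedata.normalize("NFC", base + "\u0300")
    acute = unicodedata.normalize("NFC", base + "\u0301")
    grave_code = grave.encode("iso-8859-1")[0]
    acute_code = acute.encode("iso-8859-1")[0]
    pairs.append((base, grave_code, acute_code))

for base, grave_code, acute_code in pairs:
    print(base, "grave:", grave_code, "acute:", acute_code)

# In every case the grave form has the lower codepoint.
print(all(g < a for _, g, a in pairs))
```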
-- 
François Yergeau <yergeau@alis.ca>
Alis Technologies Inc., Montréal
Tél: +1 (514) 738-9171  Fax: +1 (514) 342-0318