Re: progress on HTML 2.0 reconstruction

Francois Yergeau (yergeau@alis.ca)
Tue, 28 Mar 95 18:42:11 EST

Roy T. Fielding <fielding@avron.ICS.UCI.EDU> writes:
>A second draft is now ready for review.

A few comments on this latest draft:

>3.2 Character Set Issues
>
>...
>
>When an HTML document is encoded using US-ASCII, the mechanisms of
>character entity references (Section 6.3) may be used to encode
>additional characters from ISO-8859-1.

I don't think the use of entities should be restricted to
ASCII-encoded documents. They are always legal, as long as one has
ASCII to mark them up (see section 6.3.1).
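
To illustrate the point, here is a minimal Python sketch (not part of
the draft). The sample string and the list of charsets are my own
choices for illustration; the only assumption is that the document's
charset is ASCII-compatible, so the reference is spelled with the same
octets everywhere.

```python
import html

# An entity reference is spelled with ASCII characters only.
reference = "caf&eacute;"
ascii_bytes = reference.encode("ascii")

# Hypothetical selection of ASCII-compatible charsets, chosen for illustration.
charsets = ["iso-8859-1", "iso-8859-5", "utf-8"]

# The reference encodes to the same octets in each of them.
same_bytes = all(reference.encode(cs) == ascii_bytes for cs in charsets)

# Resolving the reference yields e-acute regardless of the charset used
# to carry the markup.
resolved = html.unescape(ascii_bytes.decode("ascii"))
```

Here same_bytes is true and resolved is "café", which is why the
entity mechanism need not be tied to US-ASCII documents.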

>...
>Therefore, user agents may use the charset parameter to select a
>different declaration, even though the mechanism...

[nit] I would add:

The intent, however, is that such a declaration be as close as
possible to that of section 12.3, the only differences being those
required to support the announced charset.

>6.3.2 Character octet reference

It doesn't make much sense to say "#233" to mean "e-acute" in a
document if codepoint 233 in that document's encoding means something
other than "e-acute". I would either restrict the use of those
entities to documents encoded in Latin-1, or specify that they mean
"the character whose codepoint is given by the number, in the encoding
specified by the charset parameter (ISO-8859-1 by default)".
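
A rough Python sketch of the second option, to show why the number
alone is ambiguous. The function name resolve_numeric_reference is
hypothetical, and the sketch assumes a single-octet charset, so that
the number can be treated as one octet and decoded directly.

```python
def resolve_numeric_reference(number, charset="iso-8859-1"):
    """Return the character whose codepoint is `number` in `charset`.

    Assumes a single-octet charset; ISO-8859-1 is the default, as in
    the wording proposed above.
    """
    return bytes([number]).decode(charset)


# The same number names different characters under different charsets.
latin1 = resolve_numeric_reference(233)                # e-acute in Latin-1
cyrillic = resolve_numeric_reference(233, "iso-8859-5")  # a Cyrillic letter
```

With the default charset the result is "é"; with ISO-8859-5 the same
number gives a Cyrillic letter instead, which is the mismatch described
above.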

BTW, the table in section 13.3 has errors: grave-accented letters
always come before acute-accented ones in Latin-1, contrary to what
the table says. I haven't had time to check the rest, sorry.
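
For anyone who wants to check the ordering mechanically, a small
Python sketch follows. It covers only the five vowels in both cases
(my selection, not the full table of section 13.3), and the helper
name latin1_code is hypothetical.

```python
import unicodedata

GRAVE, ACUTE = "\u0300", "\u0301"  # combining grave and acute accents


def latin1_code(base, accent):
    """Return the Latin-1 code of the precomposed accented letter."""
    composed = unicodedata.normalize("NFC", base + accent)
    return composed.encode("iso-8859-1")[0]


# Map each vowel to its (grave, acute) codes in Latin-1.
pairs = {v: (latin1_code(v, GRAVE), latin1_code(v, ACUTE))
         for v in "AEIOUaeiou"}

# In every pair the grave-accented letter sits immediately before the
# acute-accented one.
grave_first = all(g + 1 == a for g, a in pairs.values())
```

For example, pairs["e"] is (232, 233): e-grave is 232 and e-acute is
233.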

-- 
François Yergeau <yergeau@alis.ca>
Alis Technologies Inc., Montréal
Tél: +1 (514) 738-9171
Fax: +1 (514) 342-0318