Re: Numeric Char Ents in 2.0 draft

Francois Yergeau (yergeau@alis.ca)
Fri, 31 Mar 95 11:49:30 EST

>Date: Fri, 31 Mar 95 06:54:35 EST
>From: Gavin Nicol <gtn@ebt.com>
>
>Can we please leave this discussion until I get my next paper out. I
>discuss all these issues, and the above forms a central part of the
>discussion.
>
>I know time is of the essence, but let's get 2.0 out of the way, and
>fix this in 2.1.

Please don't, precisely because time is of the essence. 2.1 will be
too late, and it is better to offer guidance, even if somewhat vague
and incomplete, than to simply ignore the problem in 2.0.

Re numerical character entities (NCEs), said guidance could be:

To those who produce HTML:

1) Don't use NCEs; it's a hornet's nest. Use the existing
named entities for Latin-1 characters.
2) If you must, consider that HTML strives to be an SGML
application, and that the SGML-conformant way is to use
NCEs to represent characters in the document character
set, which should be signaled by the charset parameter.
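
For producers, point 1 could be sketched roughly as follows in Python.
This is only an illustration: the helper name to_named_entities is
made up, and the name table comes from Python's html.entities module,
which reflects later HTML entity lists rather than the 2.0 draft, so
some names may not be in the draft's Latin-1 set.

```python
from html.entities import codepoint2name

def to_named_entities(text):
    """Replace upper-half Latin-1 characters with named entities.

    Hypothetical helper for illustration; the name table is Python's,
    not the HTML 2.0 draft's.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        # Only touch the 160..255 range; ASCII passes through untouched.
        if 160 <= cp <= 255 and cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])
        else:
            out.append(ch)
    return "".join(out)

print(to_named_entities("\u00a9 1995 Montr\u00e9al"))
```

The output no longer depends on the charset the document ends up
in, which is the point of preferring names over numbers.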

To those who interpret HTML:

1) NCEs are supposed to represent characters in the document
character set, as indicated by the charset parameter.
2) If you translate the encoding of an HTML document before
processing it, don't forget to also translate the NCEs within
(so that e.g. what used to refer to a copyright symbol still
does after translation).
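
For interpreters, point 2 could look something like the Python sketch
below. It assumes both charsets are single-byte and that an NCE number
is a code position in the document character set; the function names
and the choice of Latin-1 and code page 850 are mine, purely for
illustration.

```python
import re

NCE = re.compile(r"&#([0-9]+);")

def translate_nces(text, src, dst):
    """Renumber each &#N; so it names the same character in dst.

    Assumes src and dst are single-byte charsets (an assumption of
    this sketch, not something the draft says).
    """
    def fix(match):
        code = int(match.group(1))
        if code > 255:
            return match.group(0)  # not a single-byte position; leave it
        try:
            char = bytes([code]).decode(src)
            new_code = char.encode(dst)[0]
        except UnicodeError:
            return match.group(0)  # no equivalent in dst; leave it
        return "&#%d;" % new_code
    return NCE.sub(fix, text)

def transcode(data, src, dst):
    """Translate the document bytes and the NCEs within them."""
    text = data.decode(src)
    return translate_nces(text, src, dst).encode(dst)

doc = b"&#169; 1995 Alis, Montr\xe9al"
print(transcode(doc, "latin-1", "cp850"))
```

Here &#169; becomes &#184;, the position of the copyright symbol in
code page 850, so the reference still means what the author intended.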

That leaves 2.1 to thrash out the important details, like how exactly
to produce an HTML DTD for charsets other than Latin-1. Gavin's
long-awaited document will be welcome for that.

-- 
François Yergeau <yergeau@alis.ca>
Alis Technologies Inc., Montréal
Tél: +1 (514) 738-9171
Fax: +1 (514) 342-0318