Re: partial draft: "Character Set" Considered Harmful

Gavin Nicol (gtn@ebt.com)
Mon, 10 Apr 95 01:39:33 EDT

Amanda

>Speaking as a commercial vendor of WWW software, I'd like to see
>numeric character references dropped completely, or, if they must be
>left in for "backward compatibility," be restricted to interpretation
>as code points in ISO 8859-1.

Sadly, we cannot really do either.

>The only reason they exist in the first place is that named character
>entities could not be relied on in all browsers, and it was a "quick
>fix."

Incorrect. They are in SGML so that typists can enter characters
not directly supported by their keyboards. HTML inherits them from
SGML.

>I strongly feel that while we may be constrained to supporting
>existing "quick fixes," extending them into a multilingual environment
>is a dire mistake.

We have no choice. They are part of HTML, like it or not.

>I am also a strong supporter of the use of IS 10646 for similar
>reasons.

Ahh. My proposal is that the document character set for HTML be ISO
10646, and hence <EM>all numeric character references would be evaluated
in terms of ISO 10646</EM>. I will state again that this does *not*
mean that all browsers must support UCS-2 or anything like that
(though that would be even better!), but that they must simply
recognise characters, and classify them according to the roles we
assign them (informally ASCII or ISO 8859-1 for markup, everything
else data).
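
A minimal sketch of that idea, in Python, may make it more concrete. It
is an illustration under stated assumptions, not part of the proposal
text: the names resolve_ncrs and classify are hypothetical, only decimal
references of the form &#N; are handled, and the "markup" test is the
informal ISO 8859-1 range mentioned above.

```python
import re

# Hypothetical pattern: decimal numeric character references only.
_NCR = re.compile(r"&#([0-9]+);")


def _replace(match):
    """Map one reference to the character at that ISO 10646 code point."""
    code_point = int(match.group(1))
    try:
        # Python's chr() takes a Unicode code point, which lines up with
        # ISO 10646 numbering, independent of the document's encoding.
        return chr(code_point)
    except (ValueError, OverflowError):
        # Assumption: an out-of-range reference is left untouched.
        return match.group(0)


def resolve_ncrs(text):
    """Evaluate every decimal numeric character reference in text."""
    return _NCR.sub(_replace, text)


def classify(ch):
    """Informal role split: the ISO 8859-1 range may serve as markup,
    everything else is treated purely as data."""
    return "markup-capable" if ord(ch) <= 0xFF else "data"
```

Under these assumptions, &#233; always resolves to the same character
whatever encoding the document arrived in, and a browser that cannot
display a given character can still classify it as data and carry on.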

>I don't know about anyone else, but I for one find the status quo
>quite unacceptable.

You are certainly not alone!

Have you read my proposal? For browsers like yours, it is ideal.