Re: partial draft: "Character Set" Considered Harmful

Larry Masinter (
Mon, 10 Apr 95 15:41:52 EDT

>To quote from ISO 8879:1986 Clause 9.5, lines 8-13, 14, 19-22:
> A replacement character is considered to be in the same entity as
> its reference.
> A replacement character is treated as though it were entered
> directly except that the replacement for a numeric character
> reference is always treated as data in the context in which
> the replacement occurs.
> 2 When a document is translated to a different document character
> set, the character number of each numeric character reference must
> be changed to the corresponding character number of the new set.

Please note that it says "translated to a different document character
set", and not "translated to use a different character encoding".

The 'document character set' is a technical term in SGML. What's been
proposed is that the 'document character set' for HTML be standardized
as ISO 10646, independently of the encoding. We won't translate the
'document character set' at all, so the character numbers in numeric
character references never need to change.
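
To make the distinction concrete, here is a rough sketch in Python
(not anything from the proposal itself). It assumes ISO 10646
character numbers can be stood in for by Python's chr(), and that only
decimal references of the form &#NNN; matter; the names
resolve_numeric_refs and parse are made up for illustration.

```python
import re

# Decimal numeric character references such as &#233;
NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_numeric_refs(text: str) -> str:
    """Replace each numeric reference with the character whose number
    it names in the document character set (assumed: ISO 10646)."""
    return NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), text)


def parse(raw: bytes, encoding: str) -> str:
    """Decode the bytes using the transfer encoding, then resolve
    references against the fixed document character set."""
    return resolve_numeric_refs(raw.decode(encoding))


# One document: a numeric reference plus a directly entered character.
source = "caf&#233; \u00e9"

# Two different character encodings of that same document.
as_latin1 = source.encode("iso-8859-1")
as_utf8 = source.encode("utf-8")

# The bytes on the wire differ, but &#233; is written identically in
# both and resolves to the same character either way.
same_text = parse(as_latin1, "iso-8859-1") == parse(as_utf8, "utf-8")
```

The decoding step depends on the encoding; the reference-resolution
step does not. Changing the encoding changes only the first step, so
no reference has to be renumbered.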