Re: partial draft: "Character Set" Considered Harmful

Glenn Adams
Mon, 10 Apr 95 14:21:57 EDT

As I pointed out in an earlier message, the SGML standard specifies that
numeric character references are to be resolved in the context of the
coded character set used to represent the entity in which they occur. If
that coded character set differs from the document (coded) character set
specified by the SGML declaration, then a numeric character reference in
the entity should first be interpreted as denoting a character in the
entity's coded character set, and that character should then be translated
to the corresponding character in the document character set.
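To make the two-step resolution concrete, here is a minimal Python sketch. It assumes both coded character sets are single-byte sets that happen to have Python codecs (the codec names are stand-ins chosen for the example), and `resolve_ncr` is a hypothetical helper, not something defined by ISO 8879 or taken from any SGML parser.

```python
def resolve_ncr(n: int, entity_charset: str, document_charset: str) -> int:
    """Return the document-character-set number of the character that
    &#n; denotes in an entity coded in entity_charset.

    Hypothetical helper: assumes both coded character sets are single-byte
    sets available as Python codecs.
    """
    # Step 1: interpret n as a character number in the entity's own coded
    # character set (UnicodeDecodeError if n is undefined there).
    ch = bytes([n]).decode(entity_charset)
    # Step 2: translate that character to the corresponding character number
    # in the document character set (UnicodeEncodeError if it has none).
    encoded = ch.encode(document_charset)
    if len(encoded) != 1:
        raise ValueError(f"{document_charset!r} is not a single-byte set")
    return encoded[0]


# &#128; in a windows-1252 entity is the euro sign, which is
# character 164 in an ISO 8859-15 document character set.
print(resolve_ncr(128, "cp1252", "iso8859_15"))  # 164
```

Reading 128 directly in ISO 8859-15 would give a C1 control character rather than the euro sign, which is exactly the mismatch the two-step rule avoids.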

To quote from ISO 8879:1986 Clause 9.5, lines 8-13, 14, 19-22:

    A replacement character is considered to be in the same entity as
    its reference.

    A replacement character is treated as though it were entered
    directly except that the replacement for a numeric character
    reference is always treated as data in the context in which
    the replacement occurs.


    2 When a document is translated to a different document character
    set, the character number of each numeric character reference must
    be changed to the corresponding character number of the new set.
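The renumbering in note 2 amounts to a rewriting pass over the document. A minimal Python sketch, assuming every reference is written in the decimal form `&#N;` with an explicit `;` (SGML also permits omitting the reference close and named forms such as `&#RE;`, which this regex ignores), and again treating both character sets as single-byte Python codecs; `renumber_ncrs` is a hypothetical name. Only the references are rewritten here; transcoding the document's literal characters is a separate step.

```python
import re

# Matches decimal numeric character references of the form &#N;
# (a simplification: other SGML reference forms are not handled).
NCR = re.compile(r"&#([0-9]+);")


def renumber_ncrs(text: str, old_charset: str, new_charset: str) -> str:
    """Rewrite each &#N; so that N denotes, in new_charset, the same
    character it denoted in old_charset (single-byte codecs assumed)."""

    def renumber(match: re.Match) -> str:
        old_number = int(match.group(1))
        # The character the reference denoted in the old document character set.
        ch = bytes([old_number]).decode(old_charset)
        # Its number in the new document character set.
        encoded = ch.encode(new_charset)
        if len(encoded) != 1:
            raise ValueError(f"{new_charset!r} is not a single-byte set")
        return f"&#{encoded[0]};"

    return NCR.sub(renumber, text)


print(renumber_ncrs("Price: &#128;5", "cp1252", "iso8859_15"))
# Price: &#164;5
```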