Re: partial draft: "Character Set" Considered Harmful

Albert Lunde (Albert-Lunde@nwu.edu)
Mon, 10 Apr 95 15:26:13 EDT

At 2:21 PM 4/10/95, Glenn Adams wrote:
>As I pointed out in an earlier message, the SGML standard specifies that
>numeric character references are to be resolved in the context of the
>coded character set used to represent a given entity. If that coded
>character set differs from the document (coded) character set as specified
>by the SGML declaration, then a numeric character reference in the entity
>should first be interpreted as denoting a character in the entity,
>and then that character should be translated to the corresponding character
>in the document character set.
>
>To quote from ISO 8879:1986 Clause 9.5, lines 8-13, 14, 19-22:
>
> A replacement character is considered to be in the same entity as
> its reference.
>
> A replacement character is treated as though it were entered
> directly except that the replacement for a numeric character
> reference is always treated as data in the context in which
> the replacement occurs.
>
> NOTES
>
> 2 When a document is translated to a different document character
> set, the character number of each numeric character reference must
> be changed to the corresponding character number of the new set.

I'm not sure I understand what is being said here.

Does this mean that, if I have a document coded (in a simple-minded sense)
in US-ASCII containing, say, SGML stuff referring to ISO Latin-1 characters
(not in US-ASCII) via numeric character references, and I translate it to
EBCDIC, I have to do something to convert the numeric character references,
to stay consistent with SGML?

If so, what?

This is the kind of issue that HTML's treatment of the interactions between
(pardon me) "character sets" and SGML needs to address.

The idea of having all numeric character references refer to Unicode (which
seems to be a feature of Gavin's proposal) treats this case nicely. (One
just converts the text in a simple-minded way, and the references mean the
same thing.)
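
To make the question concrete, here is a minimal Python sketch of the two
readings as I understand them. It is an illustration only, not anything taken
from ISO 8879 or from Gavin's proposal. The helper names are made up, the
regular expression handles only decimal references of the form &#N;, Python's
"cp037" codec stands in for "EBCDIC", and the renumbering helper assumes
single-byte character sets.

```python
import re

# Matches decimal numeric character references such as &#233;
REF = re.compile(r"&#(\d+);")

def renumber_refs(text, old_charset, new_charset):
    """Reading 1: reference numbers are relative to the document character
    set, so each number must be rewritten when the document is translated.
    Hypothetical helper; assumes both charsets are single-byte."""
    def fix(match):
        char = bytes([int(match.group(1))]).decode(old_charset)
        return "&#%d;" % char.encode(new_charset)[0]
    return REF.sub(fix, text)

def resolve_unicode_refs(text):
    """Reading 2: reference numbers always denote Unicode code points,
    whatever encoding the document happens to be stored in."""
    return REF.sub(lambda m: chr(int(m.group(1))), text)

# ASCII-only text; 233 is e-acute in ISO Latin-1 (and in Unicode).
source = "caf&#233;"

# Reading 1: the reference number changes before the text is transcoded.
renumbered = renumber_refs(source, "latin-1", "cp037")
ebcdic_reading_1 = renumbered.encode("cp037")

# Reading 2: the text is transcoded as-is and the reference is untouched.
ebcdic_reading_2 = source.encode("cp037")
resolved = resolve_unicode_refs(ebcdic_reading_2.decode("cp037"))
```

Under the first reading the translator has to understand SGML well enough to
find and rewrite every reference; under the second, a plain byte-for-byte
conversion is enough, which is the property I find attractive.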

I'm not sure if you are saying that this (Gavin's proposal) would not be
legal SGML, suggesting a different scheme for interpreting numeric
references, or what.

Can someone clarify this?

---
    Albert Lunde                      Albert-Lunde@nwu.edu