Re: ISO/IEC 10646 as Document Character Set

Gavin Nicol
Thu, 4 May 95 21:30:33 EDT

>[By the way: what happens if such a reference is unknown (e.g.
>a reference to something beyond Latin-1 if the document character
>set is only Latin-1)? Ideally, from a Unicode point of view, it would
>at least be ignored for display (but not eliminated when forwarding
>the document again), without creating errors, but SGML probably
>has other ideas for this case.]

This should be an error, but SGML does not require that it be
signalled as such. As Glenn and I discussed recently, even if the
document character set is ISO 10646, the behaviour is application
specific, because the SGML standard does not define how ISO 10646
characters are represented in the system character set. Thus if the
system character set is ISO-8859-1 and a numeric character reference
names an ISO 10646 character that has no one-to-one mapping into
ISO-8859-1, the system may represent that character in any way it
chooses.
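To make the "application specific" point concrete, here is a minimal Python sketch of one possible application's handling, not anything SGML prescribes. The names `resolve_ncrs`, `Policy` and `doc_charset_max` are hypothetical; Python's `str.encode` with the `"latin-1"` codec stands in for an ISO-8859-1 system character set; only decimal references of the form `&#N;` are handled (no hex forms, named entities, or omitted reference close). The three policies illustrate the choices discussed above: signal an error, replace for display, or preserve the reference text so it survives forwarding.

```python
import re
from enum import Enum


class Policy(Enum):
    """Hypothetical handling choices for an unrepresentable reference."""
    ERROR = "error"        # signal it as an error
    REPLACE = "replace"    # show a placeholder character instead
    PRESERVE = "preserve"  # keep the original "&#N;" text (survives forwarding)


# Decimal numeric character references only, e.g. "&#233;".
NCR = re.compile(r"&#([0-9]+);")


def resolve_ncrs(text, doc_charset_max, system_encoding="latin-1",
                 policy=Policy.PRESERVE, placeholder="?"):
    """Replace each numeric character reference with the character it names,
    if that character is in the (assumed) document character set and the
    system character set can represent it; otherwise apply `policy`."""

    def substitute(match):
        ref = match.group(0)
        n = int(match.group(1))
        # 0x10FFFF is the largest value Python's chr() accepts.
        if n <= doc_charset_max and n <= 0x10FFFF:
            ch = chr(n)
            try:
                ch.encode(system_encoding)  # a direct mapping exists
                return ch
            except UnicodeEncodeError:
                pass  # in the document charset, but not representable here
        # From here on the outcome is the application's choice.
        if policy is Policy.ERROR:
            raise ValueError(f"cannot represent {ref!r} in {system_encoding}")
        if policy is Policy.REPLACE:
            return placeholder
        return ref  # Policy.PRESERVE

    return NCR.sub(substitute, text)


if __name__ == "__main__":
    sample = "caf&#233; costs 5&#8364;"
    # é (233) is in ISO-8859-1; € (8364) is not, so it is kept as a reference.
    print(resolve_ncrs(sample, doc_charset_max=0x10FFFF))
    print(resolve_ncrs(sample, doc_charset_max=0x10FFFF, policy=Policy.REPLACE))
```

The same input gives different results depending only on the policy the application picks, which is exactly the latitude the standard leaves open.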

Bottom line: it's application specific.