Re: ISO/IEC 10646 as Document Character Set

Glenn Adams (glenn@stonehand.com)
Sat, 6 May 95 13:17:26 EDT

Date: Sat, 6 May 1995 16:04:10 +0000
From: jjc@jclark.com (James Clark)

What the standard means by "occurrence of a non-SGML character" is the
occurrence of a non-SGML character in the sequence of characters
comprising an entity that is parsed.

Thanks for the clarification. What, then, if anything, does the standard
say about the occurrence of a numeric character reference whose character
number is construed as a non-SGML character because it is not among the
characters described in the SGML declaration?

In this case, since no non-SGML character actually occurs among the characters
that make up the entity, would it *not* be construed as "an occurrence"?
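
To make the two cases concrete, here is a minimal sketch. It is not an SGML
parser and it does not settle what the standard says; it only separates a
non-SGML character that occurs literally in the entity text from one that is
merely named by a numeric character reference. The described ranges and the
function names are hypothetical, and the "&#N;" form assumes the reference
concrete syntax delimiters.

```python
import re

# Hypothetical document character set: only these character numbers are
# described; every other number is treated here as a non-SGML character.
DESCRIBED = [(9, 10), (13, 13), (32, 126)]

# Numeric character reference, assuming the reference concrete syntax
# delimiters ("&#" to open, ";" to close).
CHAR_REF = re.compile(r"&#(\d+);")


def is_described(number):
    """True if the character number falls in one of the described ranges."""
    return any(lo <= number <= hi for lo, hi in DESCRIBED)


def literal_non_sgml(entity_text):
    """Undescribed character numbers that occur literally in the entity."""
    return [ord(ch) for ch in entity_text if not is_described(ord(ch))]


def referenced_non_sgml(entity_text):
    """Undescribed character numbers named only by numeric references."""
    numbers = (int(m.group(1)) for m in CHAR_REF.finditer(entity_text))
    return [n for n in numbers if not is_described(n)]


by_reference = "a&#200;b"   # character 200 is named, never present
by_occurrence = "a\u00c8b"  # character 200 is literally in the text

print(literal_non_sgml(by_reference), referenced_non_sgml(by_reference))
print(literal_non_sgml(by_occurrence), referenced_non_sgml(by_occurrence))
```

In the first string every character that makes up the entity is described, so
only the second check reports anything. Whether that second case counts as
"an occurrence" is exactly the question.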

The note after production [49] seems to indicate that such a character
number would be construed as a data character:

"NOTE - A non-SGML character can be entered as a data character within
an SGML entity by using a character reference."

Since the class of dedicated data characters (DATACHAR) is nowhere enumerated,
and the SGML declaration is not required to enumerate it exhaustively,
should the occurrence of a numeric character reference whose character
number is not described by the document character set constitute an error?
If so, how can this be reconciled with code extension, which allows the
presence of a data character that is not in the document character set?

Regards,
Glenn