Re: ISO/IEC 10646 as Document Character Set

James Clark (jjc@jclark.com)
Sat, 6 May 95 14:15:31 EDT

> From: Glenn Adams <glenn@stonehand.com>
> Date: Sat, 6 May 95 13:22:57 -0400
>
> Date: Sat, 6 May 95 11:31:27 EDT
> Reply-To: jjc@jclark.com
>
> It is an interesting question what restrictions there are on the
> character number [in a numeric character reference]. 13.1.1 says:
>
> The described character set portions must collectively describe
> each character number in the described character set once
> and only once.
>
> Given this and given that the number is a "character number", I think
> one could argue that the number in the character reference must be one
> that was described (even if only as UNUSED) in the document character
> set section of the SGML declaration.
>
> I'm not convinced. The 13.1.1 text seems to be oriented more towards
> preventing a single character number from being described more than once
> than towards requiring every character number to be described at least
> once.

The fact that character numbers can be declared as UNUSED suggests to
me that the intention is that all character numbers be described.

4.32 also points in that direction:

NOTE - Specific characters are assigned to character classes in four
different ways:
...
d) explicitly, by the document character set (NONSGML).

> The text seems quite vague on this point.

I wouldn't claim my interpretation is the only one possible. (In fact
it is not what the currently released version of SP implements.)

However, I think it is a useful one because it allows users to control
what character numbers are allowed in numeric character references by
choosing what characters they declare as UNUSED in the SGML
declaration. If you are working with a parser that can handle large
character numbers in numeric character references, it is useful to be
able to get the parser to reject numeric character references with
character numbers > 255.
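
As a rough sketch of the check this interpretation implies (not taken
from SP; the portion list and the function names are made up for
illustration, and a number described only as UNUSED is assumed to count
as described):

```python
from typing import List, Optional, Tuple

# One described character set portion, modelled loosely on the
# declaration: (first described character number, number of characters,
# base set character number, or None for UNUSED).
Portion = Tuple[int, int, Optional[int]]

# Hypothetical document character set that describes numbers 0-255 only.
PORTIONS: List[Portion] = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 128, 128),
]


def is_described(portions: List[Portion], number: int) -> bool:
    """True if `number` falls inside some portion, UNUSED or not."""
    return any(start <= number < start + count
               for start, count, _ in portions)


def described_more_than_once(portions: List[Portion]) -> bool:
    """True if any two portions overlap (the "only once" half of 13.1.1)."""
    spans = sorted((start, start + count) for start, count, _ in portions)
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))


# Under this reading a parser would accept a reference to number 65 or
# 127 (described, the latter only as UNUSED) and reject one to 300.
for n in (65, 127, 300):
    print(n, "allowed" if is_described(PORTIONS, n) else "rejected")
```

With a declaration like this one, extending the range of acceptable
references is just a matter of describing more character numbers.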

> Has this question been put to Charles or WG8?

Not as far as I know. I'm Cc'ing the comp-std-sgml list to get some
more expert SGML opinion.

James