Re: ISO/IEC 10646 as Document Character Set

Gavin Nicol (gtn@ebt.com)
Mon, 8 May 95 02:39:53 EDT

>2. What kind of error should be reported upon an occurrence of a numeric
>character reference which contains a character number which *is*
>described by the document character set (by reference to a base set
>character number) but which *is not* described by the system character
>set? Or which is described by the (formally specified) system character
>set but which has no bit combination in the (actually implemented)
>system character set?

This is a very important question. On this hinges the viability of
using ISO 10646.

Glenn's quote:
A system declaration must meet the same syntax requirements
as an SGML declaration with respect to the concrete syntax used,
data characters allowed, etc.

Also, in Goldfarb p452:

NOTE--It is recognized that the recipient of a document must be able
to translate it to his system character set before the document can
be processed by machine. There are two basic approaches to
communicating this information.
. . .
As the last note implies, the document character set parameter is
ignored by the SGML parser because the document is already in the
document character set. The parameter is intended for a human to
read in printed form, in order to determine how to translate an
incoming document to the local system character set.

The actual translation process from a document character set to the
system character set is not defined, so we have two ways to interpret
these notes:

1) That all characters in the document must also be available to the
system, and a simple one-to-one translation is performed.
2) That the translation process can perform arbitrary translations.
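
To make the difference between the two readings concrete, here is a
minimal sketch in Python. Nothing in it comes from ISO 8879 or from
any existing parser: the SYSTEM_CHARSET and FALLBACKS tables, the
function names, and the exception are all hypothetical, chosen only
to illustrate the point. Under reading 1 a character number with no
bit combination in the system character set is an error; under
reading 2 the translator is free to substitute something else.

```python
# Hypothetical system character set: maps document character numbers
# (taken here to be ISO 10646 code positions) to system bit combinations.
# Assumption for illustration: printable ASCII maps to itself, and
# e-acute (0xE9) maps to the single byte 0x82.
SYSTEM_CHARSET = {n: bytes([n]) for n in range(0x20, 0x7F)}
SYSTEM_CHARSET[0xE9] = b"\x82"

# Hypothetical substitution table, used only by reading 2.
FALLBACKS = {0x2019: "'"}  # right single quotation mark -> apostrophe


class UntranslatableCharacter(Exception):
    """Raised when a character number has no system bit combination."""


def translate_strict(numbers):
    """Reading 1: one-to-one translation; every character must exist."""
    out = bytearray()
    for n in numbers:
        if n not in SYSTEM_CHARSET:
            raise UntranslatableCharacter(
                f"character number {n} has no bit combination "
                "in the system character set"
            )
        out += SYSTEM_CHARSET[n]
    return bytes(out)


def translate_arbitrary(numbers):
    """Reading 2: the translator may substitute for missing characters."""
    out = bytearray()
    for n in numbers:
        if n in SYSTEM_CHARSET:
            out += SYSTEM_CHARSET[n]
        elif n in FALLBACKS:
            # Replace with a substitute built from available characters.
            out += translate_strict(ord(c) for c in FALLBACKS[n])
        else:
            # Last resort in this sketch: keep a numeric character
            # reference in the output.
            out += translate_strict(ord(c) for c in f"&#{n};")
    return bytes(out)
```

With these tables, translate_strict([0x41, 0x2019]) raises
UntranslatableCharacter, while translate_arbitrary([0x41, 0x2019])
returns the bytes for "A'". Which of the two behaviours a conforming
system should exhibit is exactly what the standard leaves open.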

Also, the system representation is undefined, providing another grey
area.