Re: ISO/IEC 10646 as Document Character Set

James Clark (jjc@jclark.com)
Sat, 6 May 95 11:31:23 EDT

> Date: Fri, 5 May 95 16:29:38 EDT
> From: Glenn Adams <glenn@stonehand.com>
>
> What are you talking about? I can put &#2789 in a document today
> without violating SGML conformance. The appearance of non-SGML
> characters in a document is not a reportable markup error (according
> to ISO 8879 4.267), and, therefore, does not produce non-conformance.

This is a misinterpretation of 4.267.

A validating SGML parser only has to report an error (that is, a
failure of the document to conform to the requirements of the
standard) if that error is a reportable markup error.

The cases identified in 4.267 are still errors, and a validating
parser may report them. A parser that has "NONSGML YES" in its system
declaration is required to report an error if a NONSGML character
occurs.

What the standard means by "occurrence of a non-SGML character" is the
occurrence of a non-SGML character in the sequence of characters
comprising an entity that is parsed. This is an error because the
syntax productions of the standard only allow characters that are data
characters [48]. I would guess that the standard does not require this
error to be reported because it was thought to be difficult to
implement on some systems that do not handle certain bytes in text
files (e.g. control-Z in MS-DOS).
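
To make the distinction between "is an error" and "must be reported"
concrete, here is a minimal Python sketch. It is an illustration only,
not anything defined by ISO 8879: the names NON_SGML, find_non_sgml and
check_entity are hypothetical, and the set of non-SGML code points is
an assumed example (a real set comes from the SGML declaration in use).

```python
# Illustrative assumption: a document character set that marks these code
# points as non-SGML. A real set is determined by the SGML declaration.
NON_SGML = frozenset(list(range(0, 9)) + [11, 12] + list(range(14, 32)) + [127])


def find_non_sgml(entity_text, non_sgml=NON_SGML):
    """Return (offset, code point) pairs for each non-SGML character
    found in the character sequence of a parsed entity."""
    return [(i, ord(ch)) for i, ch in enumerate(entity_text)
            if ord(ch) in non_sgml]


def check_entity(entity_text, report_non_sgml):
    """Return (is_error, reported).

    is_error is True whenever a non-SGML character occurs, regardless of
    whether the (hypothetical) parser chooses to report it. reported
    lists the occurrences only when report_non_sgml is True.
    """
    hits = find_non_sgml(entity_text)
    is_error = bool(hits)
    reported = hits if report_non_sgml else []
    return is_error, reported


if __name__ == "__main__":
    text = "abc\x1adef"  # contains control-Z (code point 26)
    print(check_entity(text, report_non_sgml=False))
    print(check_entity(text, report_non_sgml=True))
```

In both calls the entity is in error; the flag only changes whether the
sketch reports it, which is the point being made about 4.267.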

James