Re: ISO/IEC 10646 as Document Character Set

Albert Lunde (
Fri, 5 May 95 12:26:22 EDT

>5. Albert Lunde is against it.

Actually I'm sort of on the fence... I can live with it either way; I just
want to make sure we signal the direction we are going and warn developers
against using other document character sets and make clear the numeric
references issue for non-SGML wizards.

I like either:

(1) Put in an appendix that says we use Latin-1 now but are going to use
10646 for internationalization, and that developers should use it as the
document character set when supporting other MIME charsets.

(2) Say that the document character set is 10646, but that conforming
applications are required only to support the subset that is Latin-1. Note
that previous specifications of HTML have been based on Latin-1, but
this change was made to support future internationalization.

In either case we should warn developers about the numeric references
issue; if it confused us, it will confuse a lot of other people.

Actually, I'm leaning towards (2) as a little conceptually nicer as SGML;
the argument for (1) is that it's a way to get this out the door.

Let me suggest some language for warnings:

This should either go in the section where examples of numeric references
are given or in an appendix on future internationalization.
= =
(Either way)

"The SGML standard specifies that numeric character references are
interpreted according to the SGML document character set, not the character
encoding currently used to represent the document."

(Option (1):)

"While this specification is written using a document character set of
ISO-8859-1, future specifications of HTML will be written using a document
character set of ISO/IEC 10646, as a basis for internationalization.
For upward compatibility, developers of software supporting MIME charsets
other than ISO-8859-1 should interpret numeric references as if ISO/IEC
10646 were the document character set, and should not infer an SGML
declaration that just maps the MIME charset into an SGML document
character set."


(Option (2):)

"The document character set has been specified as ISO/IEC 10646 to make the
interpretation of numeric references clear in other character encodings and
as a basis for further internationalization, though previous specifications
of HTML were based on ISO-8859-1. Developers of software supporting MIME
charsets other than ISO-8859-1 must interpret numeric references based on
ISO/IEC 10646 as the document character set."

= =
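
To make the numeric references issue concrete, here is a rough Python
sketch of the two readings. The function names are hypothetical, and
ISO-8859-5 is just one example of a MIME charset other than ISO-8859-1; the
second function shows the interpretation the warnings above are meant to
rule out.

```python
import re

# Matches decimal numeric character references such as "&#233;".
NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_by_document_charset(text):
    """Interpret each reference as an ISO/IEC 10646 code position.

    The result does not depend on the MIME charset used on the wire.
    """
    return NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), text)


def resolve_by_mime_charset(text, charset):
    """The mistaken reading: treat the number as a code in the MIME charset.

    Only illustrative; it handles single-byte charsets and values 0-255.
    """
    return NUMERIC_REF.sub(
        lambda m: bytes([int(m.group(1))]).decode(charset), text
    )


sample = "caf&#233;"
right = resolve_by_document_charset(sample)
wrong = resolve_by_mime_charset(sample, "iso8859_5")

print(right)  # e with acute accent, U+00E9
print(wrong)  # Cyrillic small letter shcha, U+0449
```

With Latin-1 as the MIME charset the two readings agree for values up to
255, which is probably why the distinction is so easy to miss.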

    Albert Lunde