Re: ISO/IEC 10646 as Document Character Set

Glenn Adams (glenn@stonehand.com)
Thu, 4 May 95 17:21:32 EDT

Date: Fri, 05 May 1995 15:47:43 -0500
From: eric@spyglass.com (Eric W. Sink)

Here's my attempt:

"The specified document character set for HTML 2.0 documents is ISO 10646.
However, conforming applications may support only ISO 8859-1, a subset of
ISO 10646. In fact, at the time of this writing, virtually all known
implementations do indeed only support the ISO 8859-1 subset."

I can accept your suggestion. To reflect this in the RFC, the following
changes are needed:

> An HTML user agent should use the SGML declaration given in
> `SGML Declaration for HTML'. It specifies ISO-8859-1 as the document
> character set, so that the markup `&#42;' represents an asterisk
> character.

Change ISO-8859-1 to ISO/IEC 10646-1:1993.

> 2.2. HTML Lexical Syntax
>
> The syntax character set for all HTML documents is ISO-646-IRV. A
> minimally conforming HTML user agent must support the SGML declaration
> in `SGML Declaration for HTML', which specifies ISO Latin 1 (@@full
> name) as the document character set; it may support other SGML
> declarations, in particular, SGML declarations with other document
> character sets.

Change to "which specifies ISO/IEC 10646-1:1993 Information technology -
Universal Multiple-Octet Coded Character Set (UCS) as the document character
set".

Change the last clause "; it may support other SGML declarations..." to
a note which reads: "An implementation which supports only the ASCII or
ISO 8859-1 subset of ISO/IEC 10646 may make use of an SGML declaration
as part of its implementation which specifies one of these subsets of
ISO/IEC 10646 as the document character set. In this case, numeric
character references containing character numbers outside of the range
of described characters may be treated in an implementation-dependent
manner."
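
As an illustration of that note (not part of the proposed RFC text), here
is a minimal Python sketch of how an implementation limited to a subset
might resolve numeric character references. The function name, the
constants, and the choice of "?" as a fallback are all assumptions made
for this example; the note deliberately leaves the out-of-range behavior
implementation dependent.

```python
import re

# Assumed limits: the highest character number described by an SGML
# declaration for each subset (hypothetical names for this sketch).
ASCII_MAX = 0x7F
LATIN1_MAX = 0xFF

# Matches decimal numeric character references such as &#42;
_NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_numeric_refs(text, max_char=LATIN1_MAX, fallback="?"):
    """Replace each &#N; with the character numbered N.

    Character numbers above max_char lie outside the described subset.
    Substituting `fallback` is only one possible implementation-dependent
    treatment; dropping or passing the reference through would be others.
    """
    def _substitute(match):
        number = int(match.group(1))
        if number <= max_char:
            return chr(number)
        return fallback

    return _NUMERIC_REF.sub(_substitute, text)
```

With the default ISO 8859-1 limit, `resolve_numeric_refs("&#42;")` yields
an asterisk, while a reference such as `&#960;` falls outside the subset
and receives the fallback treatment.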

Add the following note at an appropriate place (perhaps in section 3.2
after the paragraph "HTML user agents must support the ISO-8859-1
character encoding scheme..."):

"Conforming applications may support only ISO 8859-1, a subset of
ISO/IEC 10646. At the time of this writing, virtually all known
implementations do indeed only support the ISO 8859-1 subset."
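
The subset relationship can be checked mechanically: each ISO 8859-1
octet value equals the character number of the same character in
ISO/IEC 10646. The sketch below uses Python's built-in latin-1 codec and
assumes that Python's character numbering (Unicode) coincides with
ISO/IEC 10646 over this range; the function name is made up for the
example.

```python
def latin1_octets_match_ucs_numbers():
    """Return True if every ISO 8859-1 octet decodes to the character
    whose number equals the octet value."""
    octets = bytes(range(256))
    decoded = octets.decode("latin-1")
    # Iterating over a bytes object yields integers 0..255.
    return all(ord(ch) == octet for ch, octet in zip(decoded, octets))
```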

Regards,
Glenn