Proposal: Specify character sets by ISO number

Murray Maloney <murray@oclc.org>
Date: Thu, 16 Jun 94 10:51:44 EDT
Message-id: <9406161045.aa11227@dali.scocan.sco.COM>
Reply-To: html-ig@oclc.org
Originator: html-ig@oclc.org
Sender: html-ig@oclc.org
Precedence: bulk
From: Murray Maloney <murray@oclc.org>
To: Multiple recipients of list <html-ig@oclc.org>
Subject: Proposal: Specify character sets by ISO number
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Implementation Group

I propose that the wording of the section (2.3) on character sets
be modified to reflect the ISO standards associated with Latin-1
and with the 7-bit subset (commonly known as ASCII) so that there
can be no ambiguity as to either the character set or the encodings
being discussed.

I have slightly modified the wording of the existing section, as shown below:


<HTML>
<H2>Character sets</H2>

<P>The base character set (the SGML BASESET) for HTML is ISO 8859/1,
also known as <EM>Latin alphabet No. 1</EM> or simply <EM>Latin-1</EM>.
This is the set referred to by any numeric 
<A NAME="z3" HREF="Text.html#z4">character references</A>.
The actual character set used in the representation of an HTML
document may be ISO 8859/1, or its 7-bit subset, which is ISO 646 (ASCII).
There is no obligation for an HTML document to contain any characters
above decimal 127. It is possible that a transport medium such
as electronic mail imposes constraints on the number of bits
in a representation of a document, though the HTTP access protocol
used by W3 always allows 8-bit transfer.

<P>When an HTML document is encoded using 7-bit characters,
then the mechanisms of <A NAME="z1"
HREF="Text.html#z4">character references</A> and <A
NAME="z2" HREF="Text.html#z5">entity references</A> may be used
to encode characters in the upper half of the ISO 8859/1 Latin-1 set.
In this way, documents may be prepared which are suitable for
mailing through 7-bit limited systems.

<H3>Character set option (proposed)</H3>

<P>The SGML declaration specifies ISO 8859/1 Latin alphabet No. 1
as the base character set.
The charset parameter is reserved for future use.
Its intended significance is to override the base character set
of the SGML declaration. Support of character sets other than
ISO 8859/1 Latin alphabet No. 1 is not a requirement for conformance
with this specification.
</HTML>
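
To illustrate the 7-bit case described in the proposed wording above, here
is a rough sketch in Python of how a Latin-1 document might be rewritten
using numeric character references so that it contains no characters above
decimal 127. The function name to_seven_bit is my own invention for this
example and is not part of any specification; the sketch also assumes the
input is limited to ISO 8859/1, and it uses numeric references only, not
named entity references.

```python
def to_seven_bit(text: str) -> str:
    """Replace each character above decimal 127 with a numeric character reference."""
    # Assumption: the document is limited to ISO 8859/1, so this encode step
    # raises UnicodeEncodeError for any character outside Latin-1.
    latin1 = text.encode("latin-1")
    # Bytes below 128 are the ISO 646 (ASCII) subset and pass through unchanged;
    # the rest become references to their ISO 8859/1 code position.
    return "".join(chr(b) if b < 128 else "&#%d;" % b for b in latin1)


sample = "caf\u00e9"            # "cafe" with e-acute (ISO 8859/1 position 233)
encoded = to_seven_bit(sample)
print(encoded)                   # caf&#233;
```

The result is plain 7-bit text and so should be suitable for mailing through
7-bit limited systems, while a reader that resolves numeric character
references against ISO 8859/1 recovers the original character.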