REVISION:

Murray Maloney <murray@oclc.org>
Date: Thu, 23 Jun 94 08:01:39 EDT
Message-id: <9406230755.aa24716@dali.scocan.sco.COM>
Reply-To: html-ig@oclc.org
Originator: html-ig@oclc.org
Sender: html-ig@oclc.org
Precedence: bulk
From: Murray Maloney <murray@oclc.org>
To: Multiple recipients of list <html-ig@oclc.org>
Subject: REVISION:
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Implementation Group

This is a revised version of the discussion of character sets:

	2.3 Character Sets

I have:

	- added the ISO standard number for the character sets.
	- added a note on the relationship between ISO 646 and ASCII

I believe that this revised version more accurately reflects HTML 2.0.

Three other revised and related files follow.

Murray

==================== CUT HERE ========================================
<H2>Character sets</H2>

<P>The base character set (the SGML BASESET) for HTML is ISO 8859/1,
also known as <EM>Latin alphabet No. 1</EM> or simply <EM>Latin-1</EM>.
This is the set referred to by any numeric 
<A NAME="z3" HREF="Text.html#numcharrefs">character references</A>.
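<P>To illustrate what it means for numeric character references to refer to ISO 8859/1, the following Python sketch resolves each decimal reference by looking its number up in Latin-1. It is not a conforming SGML parser. The function name <CODE>resolve_numeric_refs</CODE> is hypothetical, and the sketch assumes every reference is written as <CODE>&amp;#</CODE><I>NNN</I><CODE>;</CODE> with the closing semicolon present, which SGML does not strictly require.

```python
import re

# A decimal numeric character reference such as &#233;.
# Assumption for this sketch: the terminating ';' is always present.
NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_numeric_refs(text: str) -> str:
    """Replace each &#NNN; with character NNN of ISO 8859/1 (Latin-1)."""

    def latin1_char(match):
        number = int(match.group(1))
        if number > 255:
            # Only numbers 0-255 exist in the Latin-1 base character set.
            raise ValueError(f"&#{number}; is outside ISO 8859/1")
        # Decoding the single byte NNN as Latin-1 yields character NNN.
        return bytes([number]).decode("latin-1")

    return NUMERIC_REF.sub(latin1_char, text)
```

For example, applied to <CODE>caf&amp;#233;</CODE> the sketch returns the four-character string <CODE>caf&eacute;</CODE>, because position 233 of Latin-1 is small e with acute accent.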
<P>
The actual character set used in the representation of an HTML
document may be ISO 8859/1
or its 7-bit subset, <A HREF="#iso646">ISO 646</A>.
There is no obligation for an HTML document to contain any characters
above decimal 127.
A transport medium such as electronic mail may restrict
the representation of a document to 7 bits per character,
whereas the HTTP access protocol used by W3 always allows
8-bit transfer.

<P>When an HTML document is encoded using 7-bit characters,
the mechanisms of
<A NAME="z1" HREF="Text.html#numcharrefs">numeric character references</A> 
and <A NAME="z2" HREF="Text.html#charents">character entity references</A>
may be used to encode characters 
in the upper half of the ISO 8859/1 Latin-1 set.
In this way, documents may be prepared which are suitable for
mailing through 7-bit limited systems.
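<P>As a sketch of this preparation step, the following Python code tests whether a Latin-1 document uses any upper-half characters and, if so, rewrites each of them as a decimal numeric character reference. The function names <CODE>needs_7bit_encoding</CODE> and <CODE>to_7bit_html</CODE> are hypothetical; the <CODE>latin-1</CODE> codec and the <CODE>xmlcharrefreplace</CODE> error handler are standard Python. Numeric references are used rather than character entity references because every upper-half character has a number, while not every one is assumed to have an entity name.

```python
def needs_7bit_encoding(data: bytes) -> bool:
    """True if any byte is in the upper half (decimal 128-255) of ISO 8859/1."""
    return any(byte > 127 for byte in data)


def to_7bit_html(data: bytes) -> bytes:
    """Rewrite an 8-bit ISO 8859/1 HTML document using only 7-bit bytes.

    Upper-half characters become decimal numeric character references
    (e.g. &#233;); 7-bit bytes, including existing markup, pass through
    unchanged.
    """
    # Latin-1 maps each byte 0-255 to the character with the same number,
    # so this decode never fails and preserves character numbers.
    text = data.decode("latin-1")
    # "xmlcharrefreplace" replaces every character that ASCII cannot
    # represent with &#NNN; using its decimal character number.
    return text.encode("ascii", errors="xmlcharrefreplace")
```

For example, the Latin-1 bytes for <CODE>&lt;P&gt;Caf&eacute;</CODE> come back as the ASCII bytes <CODE>&lt;P&gt;Caf&amp;#233;</CODE>, which a 7-bit mail path can carry unchanged.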

<H3><A NAME="iso646">NOTE: ISO 646 and ASCII</A></H3>
<P>ISO 646 is, for all intents and purposes, equivalent to the
ANSI standard for ASCII (American Standard Code for Information Interchange).
The only notable differences between the two standards are the names
assigned to the control characters, which occupy positions 0 through 31
and position 127 (decimal).
For the purposes of encoding HTML documents, only
<A HREF="Text.html#ctlchars">three control characters</A>
in ISO 646 or ASCII are relevant:
Horizontal Tab (HT) at position 9, Line Feed (LF) at position 10,
and Carriage Return (CR) at position 13.
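<P>One way to apply this rule is to scan a document for any other control character. The Python sketch below reports the offset and value of every control character except HT (9), LF (10) and CR (13). The function name <CODE>stray_controls</CODE> is hypothetical, and whether such characters should be rejected, removed or passed through is left open here, since this section does not specify it.

```python
from typing import List, Tuple

# The three control characters relevant to HTML text (decimal positions).
HT, LF, CR = 9, 10, 13
ALLOWED_CONTROLS = {HT, LF, CR}


def stray_controls(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte) for each control character other than HT, LF, CR.

    Control characters are taken to be positions 0-31 and 127 of
    ISO 646/ASCII.
    """
    return [
        (offset, byte)
        for offset, byte in enumerate(data)
        if (byte < 32 or byte == 127) and byte not in ALLOWED_CONTROLS
    ]
```

For example, <CODE>&lt;P&gt;one</CODE>, a tab, <CODE>line</CODE> and a CR LF pair yield an empty list, whereas a form feed (position 12) would be reported with its offset.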

<H3>Character set option (proposed)</H3>

<P>The SGML declaration specifies ISO 8859/1 (Latin-1) as the base character set.
The charset parameter is reserved for future use.
It is intended to override the base character set
specified in the SGML declaration.
Support of character sets other than ISO 8859/1 Latin-1
is not a requirement for conformance with this specification.