Character Set

Murray Maloney (murray@sco.COM)
Thu, 22 Sep 94 10:50:47 EDT

This is number one of four files being submitted
as parts of the HTML 2.0 specification.

Save this file as "Charsets.html"

==================== CUT HERE ====================
<H2>Character set</H2>
<P>
The base character set (the SGML BASESET) for HTML is ISO 8859/1,
also known as <EM>Latin alphabet No. 1</EM> or simply <EM>Latin-1</EM>.
This is the set referred to by any numeric
<A NAME="z3" HREF="Text.html#numcharrefs">character references</A>.
<P>
The actual character set used in the representation of an HTML
document may be ISO 8859/1,
or its 7-bit subset, which is <A HREF="#iso646">ISO 646</A>.
There is no obligation for an HTML document to contain any characters
above decimal 127.
A transport medium such as electronic mail may restrict
a document to 7-bit characters, though HTTP, the access protocol
used by W3, always allows 8-bit transfer.

<P>When an HTML document is encoded using 7-bit characters,
<A NAME="z1" HREF="Text.html#numcharrefs">numeric character references</A>
and <A NAME="z2" HREF="Text.html#charents">character entity references</A>
may be used to encode characters
in the upper half of the ISO 8859/1 (Latin-1) set.
In this way, documents can be prepared that are suitable for
mailing through systems limited to 7 bits.
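<P>
As a minimal sketch of the two mechanisms (the sample word is an
illustrative assumption, not part of the specification), the letter
e-acute, which occupies position 233 in ISO 8859/1, can be written
using only 7-bit characters in either of the following forms:
<PRE>
caf&amp;#233;      numeric character reference
caf&amp;eacute;    character entity reference
</PRE>
<P>
Both forms should be rendered as the word caf&eacute;.
The entity name "eacute" is assumed here to be among those listed
under character entity references.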

<H3><A NAME="iso646">NOTE: ISO 646 and ASCII</A></H3>
ISO 646 is, for all intents and purposes, equivalent to the
ANSI standard for ASCII (American Standard Code for Information Interchange).
The only notable differences between the two standards are the names
that have been assigned to the control characters which occupy
positions 0 through 31 and position 127 (decimal) in that encoding.
For the purposes of encoding HTML documents, only
<A HREF="Text.html#ctlchars">three control characters</A>
in ISO 646 or ASCII are relevant.
These are Carriage Return (CR) at position 13,
Line Feed (LF) at position 10, and Horizontal Tab (HT) at position 9.