Character set designations [Was: HTML 2.0 comments (First of two) ]

Daniel W. Connolly (connolly@hal.com)
Wed, 23 Nov 94 16:41:51 EST

In message <199411232100.NAA09939@rock>, Terry Allen writes:
>
>> Because there is no agreement on the string names for code sets
>> (ISO 8859-1 may be called any of
>>
>>   ISO8859-1
>>   iso88591
>>   Latin-1
>>   8859-1
>>   ISO-8859-1
>>
>> or something else on individual systems), OSF created a registry
>
>This was unnecessary; ISO Latin 1 has a Formal Public Identifier:
> ISO 8879:1986//ENTITIES Added Latin 1//EN

That public identifier references a set of entity declarations for
the Added Latin 1 characters. It specifies neither a MIME character
set nor an SGML character set.

The correct MIME character set designation is ISO-8859-1.

From the MIME RFC, RFC1521, section 7.1.1:

This RFC specifies the definition of the charset parameter for the
purposes of MIME to be a unique mapping of a byte stream to glyphs, a
mapping which does not require external profiling information.

An initial list of predefined character set names can be found at the
end of this section. Additional character sets may be registered
with IANA, although the standardization of their use requires the
usual IESG [RFC-1340] review and approval. ...

The defined charset values are:

US-ASCII -- as defined in [US-ASCII].

ISO-8859-X -- where "X" is to be replaced, as necessary, for the
parts of ISO-8859 [ISO-8859]. Note that the ISO 646
character sets have deliberately been omitted in favor of
their 8859 replacements, which are the designated character
sets for Internet mail. As of the publication of this
document, the legitimate values for "X" are the digits 1
through 9.
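
To make the naming point concrete, here is a minimal sketch in Python of
mapping the local spellings quoted at the top of this message onto the
charset values that RFC 1521 predefines. The alias table and the function
name to_mime_charset are my own illustration, not part of any registry or
library; the sketch also assumes that charset values are compared without
regard to case.

```python
import re

# Local spellings quoted earlier in this message, mapped to the MIME name.
# Illustrative only: this is not an official alias registry.
ALIASES = {
    "iso8859-1": "ISO-8859-1",
    "iso88591": "ISO-8859-1",
    "latin-1": "ISO-8859-1",
    "8859-1": "ISO-8859-1",
    "iso-8859-1": "ISO-8859-1",
}

# RFC 1521 predefines US-ASCII and ISO-8859-X, where X is a digit 1 to 9.
_DEFINED = re.compile(r"US-ASCII|ISO-8859-[1-9]")


def to_mime_charset(name):
    """Return the RFC 1521 charset value for a spelling, or None."""
    key = name.strip().lower()
    candidate = ALIASES.get(key, key.upper())
    return candidate if _DEFINED.fullmatch(candidate) else None
```

A spelling that is neither in the table nor one of the predefined values
comes back as None, so the caller has to decide what to do with names
that were registered later.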

The SGML character set is given in the SGML declaration for HTML:

    CHARSET
        BASESET  "ISO 646:1983//CHARSET
                  International Reference Version (IRV)//ESC 2/5 4/0"
        DESCSET    0   9  UNUSED
                   9   2  9
                  11   2  UNUSED
                  13   1  13
                  14  18  UNUSED
                  32  95  32
                 127   1  UNUSED
        BASESET  "ISO Registration Number 100//CHARSET
                  ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
        DESCSET  128  32  UNUSED
                 160  96  32
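
For readers who do not parse DESCSET entries in their heads, the following
Python sketch expands the declaration above into a per-character table.
It assumes the usual reading of each entry as (first character number in
the document set, how many characters, first character number in the base
set or UNUSED). The names SECTIONS and expand, and the short base set
labels, are mine and purely illustrative.

```python
# Each section pairs a base set label with its DESCSET entries:
# (described-set start, count, base-set start or None for UNUSED).
# The labels are shorthand of my own, not formal identifiers.
SECTIONS = [
    ("ISO 646 IRV", [
        (0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13),
        (14, 18, None), (32, 95, 32), (127, 1, None),
    ]),
    ("ISO-IR 100", [
        (128, 32, None), (160, 96, 32),
    ]),
]


def expand(sections):
    """Map each document character number to (base set, number) or None."""
    table = {}
    for label, entries in sections:
        for start, count, base in entries:
            for offset in range(count):
                number = start + offset
                if base is None:
                    table[number] = None  # declared UNUSED
                else:
                    table[number] = (label, base + offset)
    return table


table = expand(SECTIONS)
```

Under that reading the declaration covers all 256 code positions exactly
once: positions 160 through 255 map onto positions 32 through 127 of the
Latin 1 right part, and 128 through 159 are unused.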

Actually, sgmls-1.1.91 on my Linux box at home reported some errors
with that character set designation. I seem to recall some mail from
somebody that addressed this issue, but I looked for it recently
and couldn't find it. If an SGML priest would kindly give me the
correct spelling, I'd be most grateful.

Dan