Re: Enhancements for HTML 2.1

Roy T. Fielding (fielding@avron.ICS.UCI.EDU)
Mon, 20 Mar 1995 09:15:32 -0800

>>I'm personally a Unicode advocate (our client already has Unicode
>>support designed and ready to implement as soon as I see a consensus
>>on how Unicode pages will be labelled and encoded).
> This is good to hear.
> I think the encoding and labelling has already reached consensus: we
> simply use the
>     text/html; charset=xxxxx
> content type. For Unicode, xxxxx could be replaced with:
>     ISO-10646-UCS-4
>     ISO-10646-UCS-2
>     ISO-10646-UTF-1
> and UTF-7 is also defined in an RFC somewhere. I should note that the
> above come from the list for Internet documentation, not for the
> registered names (anyone care to list the registered names?).

The primary ones (not those above) are listed in the HTTP/1.0 draft:

Character sets are identified by case-insensitive tokens. The
complete set of allowed charset values are defined by the IANA
Character Set registry [17]. However, because that registry does
not define a single, consistent token for each character set, we
define here the preferred names for those character sets most
likely to be used with HTTP entities. This set of charset values
includes those registered by RFC 1521 [6] -- the US-ASCII [18] and
ISO8859 [19] character sets -- and other character set names
specifically recommended for use within MIME charset parameters.

    charset = "US-ASCII"
            | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
            | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
            | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
            | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
            | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
            | token

Although HTTP allows an arbitrary token to be used as a character
set value, any token that has a predefined value within the IANA
Character Set registry [17] must represent the character set
defined by that registry. Applications are encouraged, but not
required, to limit their use of character sets to those defined by
the IANA registry.

[17] J. Reynolds and J. Postel. "Assigned Numbers." STD 2, RFC 1700,
USC/ISI, October 1994.
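
To make the rules in the draft text above concrete, here is a rough sketch in Python of how a client might pull the charset out of a Content-Type value and match it case-insensitively against the preferred names. The names parse_charset and PREFERRED_CHARSETS are made up for illustration and come from neither the draft nor any library. The parameter splitting is deliberately naive (it assumes no ";" inside a quoted value), and the sketch does not consult the IANA registry at all; any other syntactically valid token is simply passed through unchanged.

```python
import re

# Preferred names copied from the charset grammar in the quoted draft.
PREFERRED_CHARSETS = (
    "US-ASCII",
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3",
    "ISO-8859-4", "ISO-8859-5", "ISO-8859-6",
    "ISO-8859-7", "ISO-8859-8", "ISO-8859-9",
    "ISO-2022-JP", "ISO-2022-JP-2", "ISO-2022-KR",
    "UNICODE-1-1", "UNICODE-1-1-UTF-7", "UNICODE-1-1-UTF-8",
)

# Charset tokens are case-insensitive, so index the names by lowercase form.
_BY_LOWER = {name.lower(): name for name in PREFERRED_CHARSETS}

# An HTTP token: printable US-ASCII without space or the "tspecials".
_TOKEN = re.compile(r"^[!#$%&'*+\-.0-9A-Z^_`a-z|~]+$")


def parse_charset(content_type):
    """Return (media_type, charset); charset is None when absent.

    Hypothetical helper: a preferred name is returned in its canonical
    spelling, any other valid token is returned exactly as received.
    """
    media_type, _, params = content_type.partition(";")
    charset = None
    # Naive split: assumes no ";" appears inside a quoted parameter value.
    for param in params.split(";"):
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            value = value.strip().strip('"')
            if not _TOKEN.match(value):
                raise ValueError("charset is not a valid token: %r" % value)
            charset = _BY_LOWER.get(value.lower(), value)
            break
    return media_type.strip().lower(), charset
```

With this sketch, parse_charset("text/html; charset=iso-8859-1") gives ("text/html", "ISO-8859-1"), while a name outside the preferred list, such as ISO-10646-UCS-2, comes back as written. Checking such a name against the IANA registry would be a separate step.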

....Roy T. Fielding Department of ICS, University of California, Irvine USA