Re: Revised language on: ISO/IEC 10646 as Document Character Set

Glenn Adams (glenn@stonehand.com)
Tue, 9 May 95 09:25:57 EDT

Date: Mon, 8 May 95 17:26:29 EDT
From: erik@netscape.com (Erik van der Poel)

Does the HTTP charset have to be a subset of 10646?

No. It can be anything. Making 10646 the document character set places
no requirements on the HTTP charset.
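
To make that concrete, here is a minimal Python sketch. It assumes
Python's standard codecs as stand-ins for whatever transfer encoding the
HTTP charset parameter names; the sample string and the helper name
code_points are illustrative only. The same text travels as different
bytes under each charset, yet decodes to the same sequence of 10646
code points.

```python
# Sketch: the transfer encoding (HTTP charset) varies, but the decoded
# character sequence, expressed as 10646/Unicode code points, does not.

def code_points(raw, charset):
    """Decode bytes received in `charset`; return the 10646 code points."""
    return [ord(ch) for ch in raw.decode(charset)]

SAMPLE = "\u65e5\u672c\u8a9e"  # illustrative sample text ("nihongo")
CHARSETS = ("iso2022_jp", "shift_jis", "euc_jp")  # Python codec names

# Same text, three different byte streams on the wire.
encoded = {cs: SAMPLE.encode(cs) for cs in CHARSETS}

# Each stream maps back to the same code points in the document charset.
decoded = {cs: code_points(raw, cs) for cs, raw in encoded.items()}

for cs in CHARSETS:
    print(cs, encoded[cs], [hex(cp) for cp in decoded[cs]])
```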

If you convert iso-2022-jp to 10646 and then back again to iso-2022-jp,
you could end up with a file that is different from the original
iso-2022-jp document. For example, some of the "double-width" Roman
characters in the JIS X 0208 portion of iso-2022-jp are not in 10646.
Also, you could lose info encoded in the escape sequences themselves:
ESC ( B, ESC ( J, ESC $ @ and ESC $ B. So iso-2022-jp is not a subset
of 10646, if you look at it this way.

No, you are incorrect on both counts. Every character in JIS X 0208,
including all zenkaku Roman characters, is separately encoded in 10646.
[Apparently you haven't had the opportunity to look at 10646/Unicode?]
In addition, escape sequences and control functions in general have a
representation in 10646/Unicode; namely, the 16- or 32-bit extended
forms of those sequences. These sequences simply would not be interpreted
according to their normal ISO 2022 code extension semantics (though,
interestingly, one could interpret them as a poor man's form of language
bindings).
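
Both points can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: Python's standard iso2022_jp
codec stands in for a converter, and the helper names widen and narrow
are hypothetical. The first part shows that a zenkaku Roman letter has
its own 10646 code point and survives a round trip; the second carries
the four escape sequences as uninterpreted 16-bit forms.

```python
import unicodedata

# 1. A zenkaku (fullwidth) Roman letter from JIS X 0208 is separately
#    encoded in 10646, distinct from the ordinary ASCII letter.
zenkaku_a = "\uff21"
name = unicodedata.name(zenkaku_a)
jis_bytes = zenkaku_a.encode("iso2022_jp")    # into iso-2022-jp
round_trip = jis_bytes.decode("iso2022_jp")   # and back to 10646

# 2. Escape sequences carried as data: each 7-bit byte is zero-extended
#    to a 16-bit code unit, with no ISO 2022 interpretation applied.
def widen(seq):
    """Hypothetical helper: 7-bit bytes -> 16-bit big-endian forms."""
    return seq.decode("ascii").encode("utf-16-be")

def narrow(wide):
    """Hypothetical helper: inverse of widen()."""
    return wide.decode("utf-16-be").encode("ascii")

# ESC ( B, ESC ( J, ESC $ @ and ESC $ B
ESCAPES = [b"\x1b(B", b"\x1b(J", b"\x1b$@", b"\x1b$B"]
wide_forms = [widen(e) for e in ESCAPES]

print(name, round_trip == zenkaku_a)
for esc, wide in zip(ESCAPES, wide_forms):
    print(esc, wide, narrow(wide) == esc)
```

Note that the second part only preserves the sequences when they are
passed through as characters. A converter that interprets them as code
extension controls consumes them instead.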

Glenn