Re: Revised language on: ISO/IEC 10646 as Document Character Set

Erik van der Poel (erik@netscape.com)
Mon, 8 May 95 17:26:49 EDT

>For example, the
>ISO-2022-JP character encoding scheme can be used for HTML documents,
>since its repertoire is a subset of the ISO10646 repertoire.

Does the HTTP charset have to be a subset of 10646? How do you define
"subset"?

If you convert iso-2022-jp to 10646 and then back again to iso-2022-jp,
you could end up with a file that is different from the original
iso-2022-jp document. For example, some of the "double-width" Roman
characters in the JIS X 0208 portion of iso-2022-jp are not in 10646.
Also, you could lose the information encoded in the escape sequences
themselves (ESC ( B, ESC ( J, ESC $ @ and ESC $ B), since 10646 has no
way to record which of them designated a given character. So
iso-2022-jp is not a subset of 10646, if you look at it this way.
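
To make the escape-sequence point concrete, here is a minimal sketch of such a round trip. It assumes Python's built-in "iso2022_jp" codec, and it uses Python's Unicode strings as a stand-in for 10646. The helper name round_trip and the two sample byte strings are my own, for illustration only. Other converters may behave differently.

```python
# Sketch only: round-trip ISO-2022-JP bytes through Unicode text
# (used here as a stand-in for ISO/IEC 10646) and back again.
# round_trip is a hypothetical helper, not part of any specification.

def round_trip(raw: bytes) -> bytes:
    """Decode iso-2022-jp to Unicode, then re-encode to iso-2022-jp."""
    text = raw.decode("iso2022_jp")
    return text.encode("iso2022_jp")

# One kanji (U+4E9C) designated with ESC $ @, then ESC ( B to return to ASCII.
old_jis = b"\x1b$@0!\x1b(B"

# "ABC" designated with ESC ( J (JIS X 0201 Roman), then ESC ( B.
jis_roman = b"\x1b(JABC\x1b(B"

for raw in (old_jis, jis_roman):
    result = round_trip(raw)
    # The decoded characters survive, but the original choice of
    # escape sequence is not carried by the Unicode text.
    print(raw, "->", result, "| identical bytes:", result == raw)
```

With this codec, the decoded text is the same in each case, but the bytes that come back need not match the bytes that went in, because the choice between ESC $ @ and ESC $ B (or between ESC ( J and ESC ( B) is not preserved in the intermediate text.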

Why not just remove the restriction that the HTTP charset has to be a
subset of 10646? That is, reword the text so that it no longer depends
on the word "subset".

Erik