Re: Revised language on: ISO/IEC 10646 as Document Character Set

Martin J Duerst (
Thu, 11 May 95 14:52:49 EDT

Ned Freed wrote:

>The problem is that some people don't agree that it's the same character. They
>believe that the language the character is associated with is part of the
>character that has to be preserved. According to this logic you can talk about
>mapping from ISO-2022-JP to something you might call ISO-10646-JP, but that you
>cannot map to generic ISO-10646, and that therefore ISO-2022-JP is NOT a subset
>of ISO-10646. (Some have even gone so far as to assert that ISO-10646 does not
>meet the requirements of being a character set.)

>This argument was raised, quite forcefully, during the MIME work. Speaking as
>one of the coauthors of MIME, I felt that the right thing to do in MIME was to
>try to move to some sort of universal character set that could represent all of
>the world's characters. Lots of other people felt this way as well, and some
>felt that either ISO 10646 or Unicode (they were completely different critters
>back then) was the way to go. Other people felt, however, that neither of these
>character sets was adequate. There was a huge battle and no consensus was ever
>reached. This is why the MIME specification now says:
> NOTE: Beyond US-ASCII, an enormous proliferation of character sets is
> possible. It is the opinion of the IETF working group that a large number of
> character sets is NOT a good thing. We would prefer to specify a SINGLE
> character set that can be used universally for representing all of the world's
> languages in Internet mail. Unfortunately, existing practice in several
> communities seems to point to the continued use of multiple character sets in
> the near future. For this reason, we define names for a small number of
> character sets for which a strong constituent base exists.

At that time (with Unicode and ISO 10646 still different), and
for the purpose of MIME (e.g. serving as a kind of "identifier" for
existing raw text), I guess the note above was probably adequate.
But we are at another time, in another place.

>In other words, this is still an open issue. Some people believe that all
>character sets are, or can be made to be, subsets of ISO 10646. And others do
>not. And I don't see any chance of this changing any time soon.

The claim restated above, that the same character in a Japanese and
in a Chinese version is something fundamentally different, is far-fetched,
or to say it more clearly, totally worthless. I can show you thousands
of glyphs where none of the proponents of this idea will be able to tell you
which side each belongs to, because it belongs to both.
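
As a rough illustration of the mapping under dispute, here is a minimal
sketch. It assumes Python's standard "iso2022_jp" and "gb2312" codecs as
stand-ins for the Japanese and Chinese encodings, and it picks one sample
ideograph (U+4E2D) purely for illustration. It shows only that the mapping
tables assign both versions to one code point; it does not by itself settle
whether that unification is right.

```python
# Sketch only: the codec names and the sample character are assumptions
# chosen for illustration, not something taken from the discussion above.

def code_points(raw: bytes, encoding: str) -> list[int]:
    """Decode raw bytes with the given encoding and return the code points."""
    return [ord(ch) for ch in raw.decode(encoding)]

sample = "\u4e2d"  # one Han ideograph used in both Japanese and Chinese text

japanese_bytes = sample.encode("iso2022_jp")  # Japanese encoding
chinese_bytes = sample.encode("gb2312")       # Chinese encoding

# The byte sequences differ, because the two encodings are unrelated ...
bytes_differ = japanese_bytes != chinese_bytes

# ... but both decode to the same single code point.
from_japanese = code_points(japanese_bytes, "iso2022_jp")
from_chinese = code_points(chinese_bytes, "gb2312")
same_character = from_japanese == from_chinese
```
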

>Exactly right. The only problem I see here is the notion that the charset
>has to be a subset of ISO 10646. This, as far as I can tell, is a relatively
>new notion and, I think, a very dangerous one that is best avoided if at all

As I have said before, as formulated, there is no need for the character
repertoire of the charset (read encoding) of the MIME header to be a
subset of the ISO 10646 repertoire. Transmitting in an encoding that
*could* transmit other characters (but doesn't) is perfectly legitimate.
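
To make the distinction concrete, here is a small sketch with toy data.
The repertoires and character names are hypothetical, and both function
names are mine. It separates two questions: whether the whole repertoire
of an encoding is a subset of the universal repertoire, and whether the
characters actually used in one message are.

```python
# Sketch only: the repertoires below are hypothetical toy sets of
# character names, not real ISO 10646 or charset tables.

def repertoire_is_subset(charset: set[str], universal: set[str]) -> bool:
    """True if every character the encoding COULD carry is in universal."""
    return charset <= universal

def message_is_representable(message: list[str], universal: set[str]) -> bool:
    """True if every character the message actually USES is in universal."""
    return set(message) <= universal

# Stand-in for the ISO 10646 repertoire.
universal = {"LATIN A", "LATIN B", "HAN 1", "HAN 2"}

# An encoding that can also carry one character outside that repertoire.
charset = {"LATIN A", "HAN 1", "UNLISTED X"}

plain_message = ["LATIN A", "HAN 1", "LATIN A"]  # stays inside universal
exotic_message = ["LATIN A", "UNLISTED X"]       # uses the extra character

charset_ok = repertoire_is_subset(charset, universal)
plain_ok = message_is_representable(plain_message, universal)
exotic_ok = message_is_representable(exotic_message, universal)
```

Under these assumptions the encoding as a whole is not a subset, yet the
first message is still fine; only the second one would need special
treatment.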

There remain scripts that have not yet been included in ISO 10646,
such as Ethiopian and historic scripts and a large number of rarely
used Han ideographs, as well as fancy scripts (Klingon?, or whatever
you dreamed up tonight), which may or may not be included in ISO 10646.

For these, I can give various temporary solutions that all will work

Regards, Martin.