Re: Revised language on: ISO/IEC 10646 as Document Character Set

Martin J Duerst
Thu, 11 May 95 17:03:24 EDT

Ned Freed wrote:

>> Actually, it's not that *I* want to use chars not in 10646. My concern
>> is that the HTML spec should not attempt to restrict people from using
>> charsets that *they think* (this is key) are "richer" than 10646.
>> What's the point of restricting the charset to subsets of 10646?
>I agree that this is a key issue. You've already lost the battle if you let the
>question of whether or not character sets exist that are "richer" than 10646
>even get asked. The MIME work provided ample evidence that this is a highly
>political question, so much so that different groups will give different
>answers and nothing will ever persuade them to change their position. (Note
>that I have intentionally not said what my position on this is!)

What has to be "rich" is HTML overall. It is already "rich", and it will
become "richer" over time. What we have to care about is the functionality
users get; how they get it has to be decided on technical grounds.

>> MIME is in many ways just a framework. One of the WG's decisions was
>> not to restrict the charset -- instead, people would be allowed to
>> register charsets. I think HTML should similarly avoid restricting
>> the charset.

MIME is in many ways just a framework. HTML is not!
As we want to express HTML in SGML, we need a document character set.
This was, and is, ISO Latin-1, even though there were many cases where
it was "misused", e.g. in Japan.
From a more international perspective, this can only be ISO 10646. There
is nothing else that comes close with respect to:
- Being a collection of characters, with unique numbers assigned.
(All these charset/encoding things don't qualify, as for each
of them, you first have to agree on how the numbers are
- Covering the widest range of scripts/languages/characters.
- Being compactly and comprehensively documented.
- Being worked on for future extensions by the leading specialists.
This doesn't mean that, once ISO 10646 is used as the document
character set, there will be no more deviations and local conventions,
e.g. the use of characters in the private use zone. However, such deviations
will be very few in number, orders of magnitude fewer than with
any other solution.
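To make the first point concrete: ISO 10646 gives a character one number, while each charset assigns its own byte value, so a byte means nothing until the charset is agreed on. A minimal sketch, assuming Python's built-in codecs `latin-1`, `cp437` and `mac_roman` as stand-ins for the corresponding registered charsets; the helper `byte_values` and the choice of "ü" are hypothetical, for illustration only.

```python
# Sketch: one character has one ISO 10646 number, but a different
# byte value in each legacy charset. Python codec names stand in
# for the corresponding MIME charsets (an assumption for illustration).

def byte_values(ch: str, charsets: list[str]) -> dict[str, int]:
    """Return the single byte that encodes `ch` in each given charset."""
    return {cs: ch.encode(cs)[0] for cs in charsets}

ch = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS
print(f"ISO 10646: U+{ord(ch):04X}")
for cs, b in byte_values(ch, ["latin-1", "cp437", "mac_roman"]).items():
    print(f"{cs}: 0x{b:02X}")
```

The character prints as U+00FC once, but with a different byte for each of the three charsets.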

>I completely agree. Also note that the character set registration procedure has
>(finally) been formalized to the point where the process may actually work
>pretty well -- I just posted a new version of MIME part four (registration
>procedures) to Internet Drafts that covers all of this.

For MIME and email, it is not so bad if everybody can register the
almost-private character set he uses to talk to his friends.
[An aside: I suspect that the better these procedures work, the more
registrations you will get, which is not what you wanted in the first place.]

For HTML and WWW, mutual exchangeability with virtually any place
in the world is more important than exchange within a private community,
so registration is not a good idea. Suppose I wanted to "implement",
in some way, all the registered MIME "charset"s. It would cost me
ten thousand dollars or more just to obtain all the documents, such as
national standards, most of which I couldn't even read.

The only reason we have to discuss the MIME "charset" parameter at all
is this: MIME is used to indicate document types in the HTTP protocol,
so it was thought to be a good idea to use the "charset" parameter to
move from a state of "mutually agreed misuse" of HTML declarations
back to something serious, and to make this move in a controlled
way without hurting anybody. Apart from that, the "charset" parameter
is not of great importance.
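In this view the "charset" parameter only tells the receiver how to turn the transmitted bytes into characters of the document character set. A minimal sketch of that step, assuming Python's `email.message.Message` as a convenient Content-Type parser; `decode_body` is a hypothetical helper, and the ISO-8859-1 fallback when no charset is given is an assumption of this sketch.

```python
from email.message import Message

def decode_body(content_type: str, body: bytes) -> str:
    """Hypothetical helper: decode an HTTP body into ISO 10646
    characters, using the charset parameter of its Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Assumed fallback when no charset parameter is present.
    charset = msg.get_content_charset("iso-8859-1")
    return body.decode(charset)

latin1 = decode_body("text/html; charset=ISO-8859-1", b"Z\xfcrich")
utf8 = decode_body("text/html; charset=UTF-8", "Z\u00fcrich".encode("utf-8"))
print(latin1 == utf8)
```

Different byte sequences and different labels yield the same sequence of document characters; the label describes only the transport encoding.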

Regards, Martin.
---- Martin J. Du"rst ' , . p y f g c R l / =
Institut fu"r Informatik a o e U i D h T n S -
der Universita"t Zu"rich ; q j k x b m w v z
Winterthurerstrasse 190 (the Dvorak keyboard)
CH-8057 Zu"rich-Irchel Tel: +41 1 257 43 16
S w i t z e r l a n d Fax: +41 1 363 00 35 Email: