Re: Revised language on: ISO/IEC 10646 as Document Character Set

Albert Lunde (Albert-Lunde@nwu.edu)
Tue, 9 May 95 22:19:55 EDT

> > Actually, it's not that *I* want to use chars not in 10646. My concern
> > is that the HTML spec should not attempt to restrict people from using
> > charsets that *they think* (this is key) are "richer" than 10646.
> > What's the point of restricting the charset to subsets of 10646?
>
> I agree that this is a key issue. You've already lost the battle if you let the
> question of whether or not character sets exist that are "richer" than 10646
> even get asked. The MIME work provided ample evidence that this is a highly
> political question, so much so that different groups will give different
> answers and nothing will ever persuade them to change their position. (Note
> that I have intentionally not said what my position on this is!)

This question did get raised (in some form) a month or two back,
and died down again with little effect. Discussion of Unicode has
been happening in fits and starts for months, but nobody
has come close to making a proposal that is _more_ comprehensive.

(I'd also suggest that the technical issues differ from MIME,
because we need a scheme that will work well with SGML.)

The most serious objection raised (how to render the Asian languages)
was addressed by proposing markup for languages (coming in HTML 3.0
at least).

The idea of using Unicode as the document character set was
motivated in part by fixing the numeric references issue, but now that
I understand it (sort of) I think it will clean up a bunch
of loose ends in SGML and improve interoperability between
various encodings.
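
To make the numeric reference point concrete, here is a rough sketch in Python (my choice for illustration; the parse() helper is hypothetical and not taken from any spec or browser). It assumes a numeric reference always names a code position in the document character set, here ISO 10646/Unicode, whatever encoding carried the bytes:

```python
import html

def parse(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode the transfer encoding into characters,
    then resolve numeric references as document-character-set code points."""
    text = raw.decode(encoding)
    return html.unescape(text)

alpha = "\u03b1"  # GREEK SMALL LETTER ALPHA, code position 945

# The same character sent three ways.
as_8859_7 = alpha.encode("iso-8859-7")  # a single byte, 0xE1
as_utf8 = alpha.encode("utf-8")         # two bytes
as_ref = b"&#945;"                      # numeric reference in plain ASCII

results = [
    parse(as_8859_7, "iso-8859-7"),
    parse(as_utf8, "utf-8"),
    parse(as_ref, "ascii"),
]

# All three inputs yield the same character after parsing.
assert all(r == alpha for r in results)
```

The reference &#945; means the same character whether the document travels as ISO-8859-7, UTF-8 or plain ASCII, which is the interoperability gain I have in mind.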

Our problem is not how to encompass the largest possible writing
system imaginable; our problem is how to write a standard that
goes beyond ISO Latin-1 and ISO-8859-X. I think the proposed
direction of using Unicode as the document character set
combined with a wide choice of MIME encodings does this
well, and increases the scope of possible characters from
255 to over 30 thousand. If this turns out not to be sufficient,
I'm sure we can do an extension mechanism or format negotiation
to allow for use of a different document character set.

But I don't see why hypothetical objections in the absence
of a concrete counter-proposal should stand in the way of the
large improvement in internationalization we get by adopting
ISO 10646 as a document character set, at least for the
"next step".

I am also a little surprised that more wasn't said sooner: we
got to the present proposal out of repeated discussions over
several months. Any objections that would apply to ISO/IEC 10646
as being too restrictive (rather than too general) apply also
to the use of ERCS, which was floated by Gavin as far back
as Dec 94.

-- 
    Albert Lunde                      Albert-Lunde@nwu.edu