Re: ISO/IEC 10646 as Document Character Set

Glenn Adams (glenn@stonehand.com)
Thu, 4 May 95 18:58:58 EDT

Date: Thu, 4 May 1995 18:14:27 +0500
From: connolly@w3.org (Dan Connolly)

OK... now I've got exact changes I can make to the document.
The real question is: what does this mean to information providers?
Does it solve any of their problems?

By itself, this change will permit me to represent and interchange any
document which could be expressed using the 34,168 characters defined by
10646 while, at the same time, employing ASCII as the document encoding.
How? By specifying every character in 10646 outside of ASCII by means
of a numeric character reference.

Now obviously this isn't an optimum state of affairs; however, it will
be well defined and produce conformant documents that can be interchanged
using not only 8-bit clean HTTP but also 7-bit SMTP for that matter. Look
at it as a poor-man's transformation method for 10646.
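
As a rough illustration of the numeric character reference approach
described above, here is a minimal sketch in Python. It is not part of
the original message; the function names are hypothetical, and it assumes
decimal references of the form &#NNNN; with code points taken from
10646/Unicode.

```python
import re

def to_ascii_with_ncrs(text: str) -> str:
    """Return text as pure ASCII, replacing each non-ASCII character
    with a decimal numeric character reference such as &#233;."""
    out = []
    for ch in text:
        code = ord(ch)  # code point of the character
        if code < 128:
            out.append(ch)            # ASCII passes through unchanged
        else:
            out.append(f"&#{code};")  # everything else becomes a reference
    return "".join(out)

def from_ascii_with_ncrs(text: str) -> str:
    """Inverse of the sketch above: expand decimal references back
    into the characters they name."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

encoded = to_ascii_with_ncrs("café 日本")
print(encoded)                        # caf&#233; &#26085;&#26412;
print(from_ascii_with_ncrs(encoded))  # café 日本
```

The encoded form contains only 7-bit characters, so it survives transports
that are not 8-bit clean. The sketch does not escape text that already
looks like a reference, so input containing a literal "&#65;" would not
round-trip. Python's built-in text.encode("ascii", "xmlcharrefreplace")
produces the same encoded bytes.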

For example: what good is ISO10646 without support for UCS-2 or UTF-8
(or even ISO-2022-JP)?

You might also ask what good it is that Microsoft says they support Unicode
in Windows NT, yet can't input or display most languages supported by Unicode,
let alone input and display multilingual documents. The answer is
that Unicode/10646 are a foundation. And we all know you can't build a
house without a foundation.

The reason I'm pushing hard for this change is that it is *very* important
to establish this foundation before we get lost in the details of how to
hang the curtains. We don't want to have to build another Winchester
Mystery House before the Web becomes globalized, do we?

If getting ISO10646 in there is just a political move ...

I don't view it as a political move at all. I view it as a necessary
step in making the World-Wide Web world-wide. If there is a better
foundation to lay to accomplish this, then I'd sure like to hear about
it. I don't think there is.

I don't see how putting half the solution -- ISO10646 as a document
character set, with no deployed support and no specification for
support of other encodings -- in the 2.0 document is better than
leaving 2.0 as is and providing a complete specification in another
document.

It's a matter of laying the foundation. I can tell you now that "a
complete specification" for I18N of the Web is a good ways off. I18N
has a lot of quicksand running through its veins in which many a good
engineer has been lost. Decoupling the change to 10646 as the document
character set from the rest of I18N will let innovative implementors make
progress on the solutions people are looking for.

Imagine, after all, where the Web would be without the foundation of the
Internet. The a priori existence of the Internet provided the foundation
on which the Web was built. 10646 will provide that same foundation for
a truly globalized Web. It is *very* important to lay that foundation now!

Regards,
Glenn Adams