Re: Putting the "World" back in WWW...

Richard L. Goerwitz (goer@midway.uchicago.edu)
Fri, 30 Sep 1994 23:08:18 +0100

>A good solution for the multi-lingual problem would be to use Unicode
>as the character encoding. Issues of left to right and right to left
>text are simply resolved by the Unicode character code (no language or
>locale information needed).

This assumes that Unicode is the be-all and end-all of coding schemes,
and I don't think that's a reasonable assumption. 8-bit schemes will
be around a long, long time, and 32-bit schemes will probably be used
as well. There has to be sufficient flexibility to allow for any
scheme - not just the one favored by Microsoft and the Unicode
Consortium.

Of course, let me admit to you on the side that it would be ideal to
have an all-encompassing scheme we could all agree on - one that all
computers would use from now until the next millennium. But it ain't
gonna happen. Microsoft backs Unicode, for instance, yet I don't see
any intrinsic support for it in Chicago, despite the huge amount of
resources Microsoft has to devote to it. If Microsoft itself isn't
able to jump on the bandwagon, how can you expect, say, people in the
Soviet Union to do it?

Aside from the practical matters, I might add, there is also the
larger reality that many don't agree with the approach the Unicode
Consortium has taken. It would be downright inhospitable to force
such people to conform. Very un-Web-like.

>To go along with this, I would suggest adding a new MIME type -
>"text/uni-html" that indicates the document is using Unicode.

It's already been suggested here - by Dan or Phil or some other
incisive mind - that we'll need language and encoding defaults for
every document. If not otherwise specified, these should be English
and ISO 8859-1 (for backwards compatibility). There should be no
problem explicitly specifying Unicode as a document's encoding
scheme, and as you note there should be no problem having clients
that can't handle it reject documents so composed.
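
A minimal sketch of that defaulting behaviour, in Python. Everything
named here (decode_document, UnsupportedEncoding, the "en" language
tag, and "utf-16" standing in for "Unicode") is an assumption made for
illustration, not something proposed in the thread:

```python
import codecs

# Defaults suggested in the discussion; the tag spellings are assumptions.
DEFAULT_CHARSET = "iso-8859-1"
DEFAULT_LANGUAGE = "en"


class UnsupportedEncoding(Exception):
    """Raised when this client has no decoder for the declared encoding."""


def decode_document(body, charset=None, language=None):
    """Decode raw bytes, falling back to the defaults when nothing is declared.

    Returns a (text, language) pair. A client that cannot handle the
    declared encoding rejects the document instead of guessing.
    """
    charset = charset or DEFAULT_CHARSET
    language = language or DEFAULT_LANGUAGE
    try:
        codec = codecs.lookup(charset)  # LookupError if the codec is unknown
    except LookupError:
        raise UnsupportedEncoding(charset) from None
    return body.decode(codec.name), language
```

With nothing declared, decode_document(b"caf\xe9") falls back to
ISO 8859-1 and English; a document labelled with an encoding the
client doesn't know raises UnsupportedEncoding rather than being
rendered as garbage.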

The reason we need language defaults is that, as we hashed out a few
days ago, you can't assume a one-to-one correspondence between
encoding scheme and language. Sure, with Unicode you can for the most
part (whatever happened with the "Han unification" issue?). But not
with a lot of other coding schemes.
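
To make the point concrete, here is a small Python sketch (the sample
strings and variable names are chosen only for illustration). One
8-bit scheme carries several languages, and one byte means different
letters under different schemes, so neither the encoding nor the
language can be inferred from the other:

```python
# Three languages, all representable in the same 8-bit scheme.
samples = {
    "fr": "déjà vu",
    "de": "Grüße",
    "es": "señor",
}

# Each sample survives a round trip through ISO 8859-1, so knowing the
# encoding tells you nothing about which language the text is in.
round_trips = {
    lang: text.encode("iso-8859-1").decode("iso-8859-1") == text
    for lang, text in samples.items()
}

# Conversely, the same byte is a different letter in different schemes.
byte = b"\xe4"
as_latin1 = byte.decode("iso-8859-1")    # Latin small a with diaeresis
as_cyrillic = byte.decode("iso-8859-5")  # Cyrillic small ef
```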

Richard Goerwitz
goer@midway.uchicago.edu