Re: A truly multilingual WWW

Gavin Nicol (gtn@ebt.com)
Tue, 17 Jan 95 18:52:42 EST

[ Apologies if this is seen more than once. ]

>This feature is probably necessary to support existing practice, but I
>worry it will lead to "Balkanization" of the WWW, namely clients and
>servers that don't speak to each other. Of course, even with Unicode, if
>your server is serving up Thai, and I don't have any Thai fonts, I'm out of
>luck. It would be nice, though, if that didn't happen just because you and
>I use different character sets.

This has already happened to a degree: information from Japan seldom
leaks out in anything other than English, and that is but the tip of a
growing iceburg. Unicode provides a good common ground.

>Section 3.2.4
>This is probably the section I have the most problem with. Unicode
>specifically was designed with the idea that attributes such as language,
>fonts, etc. would be encoded out-of-band, via high level tags or even out

Yes, and I am very hesitant to suggest using any special purpose
codes. The problem is that there does need to be some standard (low
level) way of saying that some text is in Japanese, and some text is
in Chinese. Now, the real debate is how to represent this, and I think
the recent idea I proposed is not bad.

We *could* say that these codes are simply an artifact of the encoding
(ie. this is UCS-2-HINTED encoding), and say that they *should* be
removed from the data stream once it's received... or we can say they
are STAGO and STAGC for a high-level tag.

Do you have any complaints about including hints in a transfer
encoding?