Re: ISO charsets; Unicode

Richard L. Goerwitz (goer@midway.uchicago.edu)
Thu, 29 Sep 1994 02:52:50 +0100

Next message: dkearns{TCNET/HR/dkearns}@klaven.tci.com: "Re: WWW/Mosaic widget"
Previous message: Dave Raggett: "Re: Languages (was Re: Forms support in clients)"
Maybe in reply to: Richard L. Goerwitz: "ISO charsets; Unicode"
Next in thread: Stavros Macrakis: "Re: ISO charsets; Unicode"

>> The basic point is that various coding schemes overlap.
>
>I was assuming that any character in the HTML can be unambiguously
>identified. ...This only needs an encoding-system attribute, not a
>language attribute.

I would not want to enforce a uniform transliteration system based
on encoding scheme only. You have to reckon with the possibility
that you will use one encoding scheme, but different translitera-
tion systems, depending on the language each encoding scheme is
used to render. I certainly would not want to render Persian with
the same system used for classical Arabic, though they are written
with the same basic glyphs.
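
To make that concrete, here is a minimal sketch in Python, not a proposal
for any real API. The table name TRANSLIT, the function transliterate,
the language tags "ar" and "fa", and the simplified romanizations are all
assumptions made up for illustration. The only point is that the same
code points come out differently once a language is supplied alongside
the encoding.

```python
# Hypothetical per-language romanization tables. The keys are the same
# code points in both tables; only the language tag selects the output.
# The romanizations are simplified and purely illustrative.
TRANSLIT = {
    "ar": {"\u062B": "th", "\u0630": "dh", "\u0648": "w"},  # classical Arabic
    "fa": {"\u062B": "s", "\u0630": "z", "\u0648": "v"},    # Persian
}


def transliterate(text, lang):
    """Romanize text using the table selected by the language tag.

    Characters missing from the table are passed through unchanged.
    """
    table = TRANSLIT[lang]
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    sample = "\u062B\u0648"  # identical encoded text in both calls
    print(transliterate(sample, "ar"))
    print(transliterate(sample, "fa"))
```

An encoding attribute alone would identify the characters in sample, but
it could not tell the two calls apart; the language tag is what chooses
the table.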

Let's take an interesting example from the remote past. The basic
Sumerian syllabary (our oldest writing system) was used to repre-
sent Akkadian, Hittite, Urartian, Elamite, and so on. Slight dif-
ferences in sign shapes accumulate over time and across various
locales, but the system is basically the same. Probably the most
practical initial design of a coding scheme for this script would
be to start with a single encoding, but have a language attribute that
would allow us to make adjustments for time and place, and signal
different transliteration schemes if need be.

Having language and encoding scheme separate just buys us a little flexi-
bility.

Richard Goerwitz
