ISO charsets; Unicode

Richard L. Goerwitz (
Mon, 26 Sep 1994 20:45:53 +0100

>> The project head would be happy to plug it into the Web,
>> but again the Web only knows ASCII.
>Not so, the Web knows only ISO 8859-1 (so if you send it ASCII
>it will work) but that is not the same thing.

Thanks for the correction.

Still, is the general problem of multi-language text worth dis-
cussing? For my part, I'd love to make a few of my non-English
databases available online, but I don't know how to tell query
forms to expect something other than ISO 8859-1.

Let me just toss off a suggestion here. Say we suddenly move
from English to Greek text:

<language Greek encoding="ISO 8859-7">
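To make the idea concrete, here is a rough sketch in Python of how a
client might act on such a tag. The tag syntax is only my suggestion
above and is not part of any standard, and decode_segments is a name
made up for this example. It assumes text before the first tag is
ISO 8859-1.

```python
import re

# Hypothetical tag syntax taken from the suggestion above; not a standard.
TAG = re.compile(rb'<language\s+(\w+)\s+encoding="([^"]+)"[^>]*>')

def decode_segments(data, default="iso-8859-1"):
    """Split raw bytes at <language ...> tags and decode each run of
    text with the character set named by the tag that precedes it."""
    out = []
    pos = 0
    language, charset = "default", default
    for m in TAG.finditer(data):
        if m.start() > pos:
            out.append((language, data[pos:m.start()].decode(charset)))
        language = m.group(1).decode("ascii")
        # "ISO 8859-7" -> "ISO-8859-7", a codec name Python recognizes
        charset = m.group(2).decode("ascii").replace(" ", "-")
        pos = m.end()
    if pos < len(data):
        out.append((language, data[pos:].decode(charset)))
    return out
```

Each run of bytes is decoded on its own, so the same byte value can
come out as an accented Latin letter in one section and as a Greek
letter in the next.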

Here's another exigency that must be handled. Arabic, Hebrew, etc.
run right-left:

<language Arabic encoding="ISO 8859-6" wrap="right-left">

The reason the "wrap" must be specified is that right-left text such
as Arabic can be stored in one of two ways. The first is to just code
the stuff in backwards, in the left-to-right order in which it sits on
the display line. The other is to have the first letter read come
first in the file, and so on, leaving the reversal to the display
software. Both methods are used, and as I recall the MIME standard
(which I read some time ago) allows for both of them.
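The difference between the two storage orders can be shown with a
small Python sketch. It covers only the simplest case, a run made up
entirely of right-left letters; text that mixes directions needs a
full bidirectional algorithm. The function name to_visual is made up
for this example, and the letters are given as Unicode code points.

```python
import unicodedata

def to_visual(logical):
    """Reverse a purely right-to-left run into left-to-right display order."""
    # Only the simplest case: every character must be strongly right-to-left.
    if not all(unicodedata.bidirectional(ch) in ("R", "AL") for ch in logical):
        raise ValueError("mixed-direction text needs a full bidirectional algorithm")
    return logical[::-1]

# Arabic seen, lam, alef, meem, stored in the order they are read
logical = "\u0633\u0644\u0627\u0645"
# The same word stored "backwards", as laid out left to right on a line
visual = to_visual(logical)
```

A client that is told which order was used can convert one into the
other; a client that is not told has no reliable way to guess.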

The question for me is just how sophisticated we want clients to
get. The Web is supposed to be worldwide, to be sure, and this
would seem to imply multilinguality. But how are we supposed to
be sure that all of the requisite fonts, with all of the requisite
registries and encodings, are on every machine? Let us assume that
we have a client that has run into some text using an encoding sys-
tem for which there is no appropriate local font. It might be nice
to have a mechanism that:

  1) tells the server what type of client is connected, and requests
     the correct font
  2) gracefully recovers on sections of text for which
     a) the server doesn't have the right font for that client, or
     b) the client can't grok the display parameters (e.g. it can't
        do up-down scripts in the same document as left-right ones,
        or can't do up-down at all)
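The recovery step in (2) might look something like the following
Python sketch. Everything in it is hypothetical: ClientCaps,
plan_section and the three result strings are names invented for this
example, and "has a font" is reduced to a set of character set names
the client says it can display.

```python
from dataclasses import dataclass, field

@dataclass
class ClientCaps:
    """What a client says it can display (all names here are hypothetical)."""
    charsets: set                      # encodings it has fonts for
    wraps: set = field(default_factory=lambda: {"left-right"})

def plan_section(caps, charset, wrap):
    """Decide how to handle one section of text for this client."""
    if charset not in caps.charsets:
        # roughly case (a): no usable font, so show a note instead of garbage
        return "placeholder"
    if wrap not in caps.wraps:
        # case (b): the font is there but the layout is not supported
        return "plain"
    return "render"

caps = ClientCaps(charsets={"iso-8859-1", "iso-8859-6"})
```

With these capabilities, an ISO 8859-6 section marked right-left comes
back as "plain" (the client has the font but not the layout), while an
ISO 8859-7 section comes back as "placeholder".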

It would be wrong to assume that everyone using the Web will use the
same language all the time, or only want to view text in one language
at a time (consider the common case of an English-Arabic or Japanese-
English dictionary). Yet it would be equally wrong to expect every
client on every micro to handle every possible case, and to do so all
at once.

I'm sorry if I seem to be obtruding in a forum without knowing what I
am doing. As I noted above, I'm in the Humanities, and am simply try-
ing to see if I can be any help at all....

Feedback would be much appreciated.