Re: charset parameter

Gavin Nicol (gtn@ebt.com)
Tue, 17 Jan 95 19:39:07 EST

>Directionality is generally clear from context, but I've
>seen arguments (sorry, I've lost the references) that point
>out cases where this is not always true.

I'd be interested to hear the cases if you have time to dig them out.

>> ISO-2022 is not what we want though. It is a bandaid on
>> a festering wound.
>
>However, ISO-2022 is used within many non-Latin-1
>markets, and there are multilingual editors
>which operate on an ISO-2022 stream.

Yes, like Mule for example.

>In Asian countries, editors are more likely to operate
>on a multibyte encoding like Shift-JIS. However,
>on all X-based systems, there is built-in support
>to convert this encoding to Compound Text, which is
>used in X for text interchange. So using ISO-2022
>might make a lot of sense.

I have never said that ISO-2022 should not be allowed, only that
Unicode should be the default. Have a look at
<URL:ftp://ftp.stonhand.com/pub.icode.txt> for some reasons against
ISO-2022. My basic problems are that it requires that one be able to
parse multiple encodings, and that it is stateful.
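
The statefulness objection can be illustrated with a small sketch. It
assumes Python's standard "iso2022_jp" codec (the ISO-2022-JP profile)
as a stand-in for ISO-2022 in general; the variable names are
illustrative only. The same four bytes decode to different text
depending on which escape sequence preceded them:

```python
# Escape sequences defined by ISO-2022-JP (assumed profile for this sketch).
ESC_TO_JIS = b"\x1b$B"    # switch to JIS X 0208 (two bytes per character)
ESC_TO_ASCII = b"\x1b(B"  # switch back to ASCII

# Four bytes whose meaning depends entirely on the decoder's current state.
payload = b"F|K\\"

# With no preceding escape the decoder is in ASCII state.
as_ascii = payload.decode("iso2022_jp")

# After ESC $ B the very same bytes are read as two JIS X 0208 characters.
as_kanji = (ESC_TO_JIS + payload + ESC_TO_ASCII).decode("iso2022_jp")

print(repr(as_ascii))
print(repr(as_kanji))
```

A reader that starts in the middle of such a stream, or loses an escape
sequence, cannot tell which interpretation is the right one.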

>Especially since Unicode has many detractors in the
>Asian market (a discussion which has already been
>covered on too many other mailing lists).

Well, as JIS is going the Unicode way, and from my discussions with
people in Japan (where I live), Thailand, China, and other countries,
it seems to me that most will actually accept Unicode, and that most
detractors could be pacified with a "hinting" mechanism.

Then I should mention that one way or another, you *must* convert to a
single character set for the SGML parser at the heart of every HTML
parser. It seems illogical to send things in some format that could
complicate clients unnecessarily, and which must be converted
anyway. Why not just send it in the form the parser understands?
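
A minimal sketch of the conversion step described above, assuming
Python's standard codecs and html.parser module as a stand-in for the
SGML parser. The names TextCollector and parse_document are
hypothetical, and the list of encodings is only an example. Whatever
encoding arrives, the bytes are converted to one character set before
the parser ever sees them:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Toy parser (hypothetical) that collects character data only."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_document(body: bytes, charset: str) -> str:
    # Step 1: convert the incoming bytes to a single character set.
    text = body.decode(charset)
    # Step 2: the parser works on that one representation only.
    parser = TextCollector()
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks)


markup = "<p>\u65e5\u672c\u8a9e</p>"
results = {
    cs: parse_document(markup.encode(cs), cs)
    for cs in ("shift_jis", "euc_jp", "iso2022_jp", "utf-8")
}
print(results)
```

Every sender-side encoding ends up as the same parsed text, so the only
question is where the conversion happens: once at the sender, or in
every client.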