Re: Charsets: Problem statement/requirements?

Gavin Nicol (gtn@ebt.com)
Thu, 9 Feb 95 09:28:38 EST

>Agreed 100%, this is how the World will really be.

And so the Balkanisation begins...

>So what we are talking about is a generic mechanism to negotiate what
>is sent down by the server.

Yes. Accept-Charset, Accept-Language, and charset-xxxx
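
As a rough illustration of the negotiation, here is a minimal sketch of how
a server might pick a charset from an Accept-Charset value. It assumes the
comma-separated list with optional ";q=" weights used by HTTP content
negotiation. The function names and the charset labels in the usage note
are hypothetical, chosen only for the example.

```python
def parse_accept_charset(header):
    """Parse an Accept-Charset value into (charset, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # assumed default weight when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((name.strip().lower(), q))
    # sorted() is stable, so header order is kept among equal weights.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_charset(header, available):
    """Return the best charset the server can produce, or None."""
    prefs = parse_accept_charset(header)
    refused = {name for name, q in prefs if q <= 0}
    offered = [c.lower() for c in available]
    for name, q in prefs:
        if q <= 0:
            continue
        if name == "*":
            # Wildcard: anything the client has not explicitly refused.
            for candidate in offered:
                if candidate not in refused:
                    return candidate
        elif name in offered:
            return name
    return None
```

With a header of "iso-8859-5, utf-8;q=0.8" and a server that can produce
utf-8 and iso-8859-1, choose_charset returns "utf-8". If nothing matches it
returns None, and the server has to decide how to report the failure.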

>I think there is an oversimplification with Unicode as a solve-all.
>Unicode is not a 16 bit ASCII which by providing a 16 bit font we are
>"done". This might be true for many languages (1->1 mapping between
>character encoding and glyph) but not in Persian/Arabic.

Sure, there are rendering issues too.

>Second we may agree to talk Unicode, but that does not require me
>to store 16 bit. For example if I only support Russian in my client,
>I will know that you have sent me Russian Unicode, and will map that
>on the fly to my 8 bit internal representation and use 8 bit indexed
>fonts (don't forget PC's). I as a client should not be forced to
>support anything more than latin1, and anything above that I will
>inform the server.

If you only support Russian, or Latin 1 in your client, don't ask for
Unicode...
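
For what it is worth, the on-the-fly mapping described in the quoted text
is cheap to do. A minimal sketch, assuming the data arrives as UTF-8 and
the client's internal 8-bit representation is KOI8-R. Both choices, and the
function name, are assumptions made only for illustration.

```python
def to_eight_bit(payload, source="utf-8", target="koi8_r"):
    """Re-encode received bytes into the client's single 8-bit charset.

    Characters with no equivalent in the target charset are replaced
    with "?" rather than raising an error.
    """
    text = payload.decode(source)
    return text.encode(target, errors="replace")
```

Russian text survives the round trip with one byte per character, while
anything outside the target repertoire degrades to "?". That loss is the
reason a single-charset client is better off asking for its own charset
in the first place.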

I have never proposed Unicode as <emph>the only</> character set, but
rather as a lingua franca, and a processing model for the
parser. The latter does not require Unicode data. In addition, things
like UTF-8 provide space-efficient encodings for Unicode primarily
consisting of ASCII.
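
To make the space argument concrete, here is a small sketch comparing the
size of the same text in UTF-8 and in a fixed 16-bit form. It uses Python's
"utf-16-be" codec as a stand-in for 16-bit Unicode, which is an assumption
of the example. The function name is hypothetical.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in 16-bit units."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-be" writes no byte-order mark, so the count is 2 bytes
        # per character for everything in the Basic Multilingual Plane.
        "16-bit": len(text.encode("utf-16-be")),
    }
```

For the 12-character ASCII string "Hello, World" this gives 12 bytes in
UTF-8 against 24 in the 16-bit form. For Cyrillic text the two come out
equal at 2 bytes per character, so the saving applies to data that is
mostly ASCII.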

>I would hope that future servers would not send down encodings
>which the client has not indicated support for. Errors should be
>communicated in Latin1 as mentioned before.

I hope all servers and clients will be able to understand Unicode so
that in all cases, we can communicate.

>Unicode has introduced direction codes to give hint to the rendering

I always wondered why they did this, but didn't introduce a few codes
for CJK use...

>This is mainly a rendering problem rather than a communication
>problem which brings up the issue of multi-local (correct term?)
>versus multi-lingual clients.

I believe that multi-localised clients are those like Mosaic-l10n, and
multilingual clients offer seamless support for many languages. I like
the latter...