And so the Balkanisation begins...
>So what we are talking about is a generic mechanism to negotiate what
>is sent down by the server.
Yes. Accept-Charset, Accept-Language, and charset-xxxx
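
A rough sketch of how such negotiation might look on the server side,
assuming the client sends a comma-separated Accept-Charset list with
optional q-values and the server picks the best match among the
charsets it can produce. The function names and the "available" list
are hypothetical, not taken from any existing server:

```python
def parse_accept_charset(header):
    """Parse 'name;q=0.8, name2' into a list of (charset, q) pairs."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a charset listed without a q-value is fully acceptable
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((name.strip().lower(), q))
    return prefs


def choose_charset(header, available):
    """Return the charset from `available` the client prefers most, or None."""
    prefs = dict(parse_accept_charset(header))
    best, best_q = None, 0.0
    for charset in available:
        # Fall back to a "*" wildcard entry if the client sent one.
        q = prefs.get(charset.lower(), prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = charset, q
    return best
```

If choose_charset() returns None, the server has nothing the client
asked for and has to fall back on whatever default both sides are
assumed to understand.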
>I think there is an oversimplification with Unicode as a solve-all.
>Unicode is not a 16 bit ASCII which by providing a 16 bit font we are
>"done". This might be true for many languages (1->1 mapping between
>character encoding and glyph) but not in Persian/Arabic.
Sure, there are rendering issues too.
>Second we may agree to talk Unicode, but that does not require me
>to store 16 bit. For example if I only support Russian in my client,
>I will know that you have sent me Russian Unicode, and will map that
>on the fly to my 8 bit internal representation and use 8 bit indexed
>fonts (don't forget PCs). I as a client should not be forced to
>support anything more than Latin1, and anything above that I will
>inform the server.
If you only support Russian, or Latin 1 in your client, don't ask for
Unicode...
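
For what it is worth, the on-the-fly mapping described in the quoted
text is small. A minimal sketch, assuming the data arrives as UTF-8
and the client keeps KOI8-R as its internal 8-bit representation (both
are assumptions for illustration; the function name is hypothetical):

```python
def to_internal_8bit(wire_bytes, internal="koi8_r"):
    """Map Unicode text received as UTF-8 to an 8-bit internal encoding.

    Characters outside the internal repertoire become '?', so the
    client degrades visibly instead of failing outright.
    """
    text = wire_bytes.decode("utf-8")
    return text.encode(internal, errors="replace")


russian = "Привет".encode("utf-8")   # 12 bytes on the wire
internal = to_internal_8bit(russian)  # 6 bytes, one per Cyrillic letter
```

Anything the 8-bit set cannot represent is lost in this mapping, which
is the reason such a client is better off not asking for Unicode in
the first place.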
I have never proposed Unicode as <emph>the only</> character set, but
rather as a lingua franca, and a processing model for the
parser. The latter does not require Unicode data. In addition, things
like UTF-8 provide space-efficient encodings for Unicode primarily
consisting of ASCII.
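
To illustrate the space point, a small sketch comparing encoded sizes
of the same text in UTF-8 and in a fixed 16-bit form (UTF-16 without a
byte-order mark stands in for the 16-bit form here; the helper name is
hypothetical):

```python
def encoded_sizes(text):
    """Return the byte length of `text` in UTF-8 and in 16-bit units."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),  # big-endian, no BOM
    }


ascii_sizes = encoded_sizes("Hello, world")  # {'utf-8': 12, 'utf-16': 24}
cyrillic_sizes = encoded_sizes("Привет")     # {'utf-8': 12, 'utf-16': 12}
```

For ASCII text UTF-8 costs one byte per character, half the 16-bit
form; for Cyrillic the two come out the same.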
>I would hope that future servers would not send down encodings
>which the client has not indicated support for. Errors should be
>communicated in Latin1 as mentioned before.
I hope all servers and clients will be able to understand Unicode so
that in all cases, we can communicate.
>Unicode has introduced direction codes to give hint to the rendering
I always wondered why they did this, but didn't introduce a few codes
for CJK use...
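
For reference, the direction codes in question are ordinary code
points with their own bidirectional class. A small sketch using
Python's standard unicodedata module to list them (the dictionary name
is hypothetical):

```python
import unicodedata

# The explicit directional formatting characters U+202A..U+202E.
DIRECTION_CODES = {
    cp: (unicodedata.name(chr(cp)), unicodedata.bidirectional(chr(cp)))
    for cp in range(0x202A, 0x202F)
}

for cp, (name, bidi_class) in DIRECTION_CODES.items():
    print(f"U+{cp:04X} {bidi_class:4} {name}")

# Ordinary letters carry an implicit direction instead:
latin_class = unicodedata.bidirectional("a")        # 'L'
arabic_class = unicodedata.bidirectional("\u0627")  # 'AL' (ARABIC LETTER ALEF)
```

A renderer can rely on the implicit classes for most text; the
explicit codes only override them.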
>This is mainly a rendering problem rather than a communication
>problem which brings up the issue of multi-local (correct term?)
>versus multi-lingual clients.
I believe that multi-localised clients are those like Mosaic-l10n, and
multilingual clients offer seamless support for many languages. I like
the latter...