Re: Charsets: Problem statement/requirements?

Anoosh Hosseini (
Thu, 9 Feb 95 11:22:23 EST

Gavin writes:

> >Agreed 100%, this is how the World will really be.
> And so the Balkanisation begins...

:-) my main point was that we should communicate in the most appropriate
encoding, and Unicode is the choice in many cases, but not always.
If we provide the protocol to choose, then all is well.
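
A rough sketch of what such a choice could look like on the server side
follows. It is only an illustration: the function names are hypothetical,
the header value is treated as a plain comma-separated list with no
quality values, and the policy (prefer the local 8-bit charset, fall back
to Unicode) is an assumption, not part of any specification.

```python
# Hypothetical sketch of server-side charset selection.
# Header syntax is simplified: comma-separated names, no quality values.

def parse_list(header_value):
    """Split a comma-separated header value into lowercase tokens."""
    return [item.strip().lower() for item in header_value.split(",") if item.strip()]

def choose_charset(accept_charset, local_charset, document_is_multilingual):
    """Pick the charset for a response.

    Prefer the server's local 8-bit charset when the client accepts it and
    the document fits in it; otherwise fall back to Unicode if the client
    accepts that. Returns None when nothing acceptable is available.
    """
    accepted = parse_list(accept_charset)
    if not document_is_multilingual and local_charset.lower() in accepted:
        return local_charset
    if "unicode" in accepted:
        return "Unicode"
    return None

# A client that accepts both gets the 8-bit charset for single-language text.
print(choose_charset("ISO-8859-7, Unicode", "ISO-8859-7", False))
# The same client gets Unicode when the document mixes languages.
print(choose_charset("ISO-8859-7, Unicode", "ISO-8859-7", True))
```

With a client that lists only Unicode, the same function returns Unicode
even for single-language text, which is the "client has requested Unicode"
case.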

> I have never proposed Unicode as <emph>the only</> character set, but
> rather as a lingua franca, and as a processing model for the
> parser. The latter does not require Unicode data. In addition, things
> like UTF-8 provide space-efficient encodings for Unicode primarily
> consisting of ASCII.

I agree with your point, and as you say yourself, UTF-8 is efficient
for ASCII. It makes sense for a server in country X to communicate in an
8-bit encoding (because 99% of their people are not n-lingual), and,
as needed, to send the Unicode version when multilingual text is
communicated or the client has requested data in Unicode format. We are
talking about losing half the bandwidth in a Unicode-only environment.
My only concern is "Unicode"-only (input) clients in the future, in the
same way we have Latin-1-only clients today.
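
To put rough numbers on the bandwidth point, here is a small sketch that
counts bytes for the same text in three encodings. It assumes "Unicode" on
the wire means a fixed two-byte form (Python's `utf-16-be` codec, no byte
order mark); the helper name and the sample strings are illustrative only.

```python
# Compare wire sizes of the same text in an 8-bit charset, a two-byte
# Unicode encoding, and UTF-8. Sample strings are illustrative only.

def encoded_sizes(text, eight_bit_charset):
    """Return byte counts for text in three encodings."""
    return {
        "8-bit": len(text.encode(eight_bit_charset)),
        "2-byte Unicode": len(text.encode("utf-16-be")),  # no byte order mark
        "UTF-8": len(text.encode("utf-8")),
    }

# Plain ASCII text, sent as Latin-1.
ascii_sizes = encoded_sizes("Hello, World", "latin-1")

# Six Greek capital letters (the word for "Greece"), sent as ISO 8859-7.
greek_text = "\u0395\u039b\u039b\u0391\u0394\u0391"
greek_sizes = encoded_sizes(greek_text, "iso-8859-7")

print(ascii_sizes)
print(greek_sizes)
```

For the ASCII sample the two-byte form is twice the size of the 8-bit and
UTF-8 forms. For the Greek sample both the two-byte form and UTF-8 are
twice the size of ISO 8859-7, so UTF-8 only recovers the loss for text
that is mostly ASCII.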

> >Unicode has introduced direction codes to give hint to the rendering
> I always wondered why they did this, but didn't introduce a few codes
> for CJK use...

> >This is mainly a rendering problem rather than a communication
> >problem which brings up the issue of multi-local (correct term?)
> >versus multi-lingual clients.
> I believe that multi-localised clients are those like Mosaic-l10n, and
> multilingual clients offer seamless support for many languages. I like
> the latter...

Yes, my point was: when the client says to the server

    Accept-Charset: Unicode
    Accept-Language: x, y, z

does it mean the client can handle each language individually by
switching fonts etc., as in Mosaic-l10n, or that it can render all
three in one document?