Re: ISO/IEC 10646 as Document Character Set

Anoosh Hosseini (anoosh@gorgan.mti.sgi.com)
Thu, 4 May 95 22:10:45 EDT

On May 4, 9:21pm, Gavin Nicol wrote:
> Subject: Re: (Fwd) Re: ISO/IEC 10646 as Document Character Set
> >Here is an example, we have support for an ISO-8859-X language
> >using a localized browser, but this browser only runs on platforms
> >A and B. We have external viewers for the language that can be
> >invoked by correct setup in .mailcap and .mime.types files on platforms C
> >and D. Now how is a server to distinguish between these cases?
>
> Why is it necessary to? The data can be displayed anyway (though it
> might be nice to distinguish, I cannot see the absolute need).
>

Ah, I had watered down the problem. I have a document in language X
which has multiple possible character encoding representations. In fact it
may have multiple document format representations such as HTML, PDF ... Those
on platforms A and B have the localized browser; the server detects them and
sends them bilingual HTML, and all is well. Users on platforms C and D say,
"What is our sin, that our employer does not buy us machines A or B?" They
would really like to read our home page. Now they may have a third-party
editor/viewer or some localized support on their machine and are at least
willing to receive the document, if not in full HTML. So the server would like
to know which of the possible character set encodings and/or file formats the
localized external viewer (if even present) supports, in order to send the
appropriate representation. Fair enough?
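
To make the request concrete, here is a rough sketch of the selection I have
in mind. It assumes the client advertises what it can handle in something like
Accept and Accept-Charset headers; the list of available representations, the
charset names (stand-ins for "language X") and the function names are all
hypothetical, and quality values are ignored.

```python
# Hypothetical list of representations the server holds for one document,
# in order of preference. A charset of None means "not applicable" (e.g. PDF).
AVAILABLE = [
    ("text/html", "iso-8859-6"),
    ("text/html", "utf-8"),
    ("text/plain", "iso-8859-6"),
    ("text/plain", "utf-8"),
    ("application/pdf", None),
]


def parse_list(header):
    """Split a comma-separated header value into lowercase tokens,
    dropping any ';q=...' style parameters."""
    return [
        item.split(";")[0].strip().lower()
        for item in header.split(",")
        if item.strip()
    ]


def choose(accept, accept_charset):
    """Return the first (media type, charset) pair the client can handle,
    or None if nothing the server holds is acceptable."""
    types = parse_list(accept)
    charsets = parse_list(accept_charset)
    for media_type, charset in AVAILABLE:
        if media_type not in types:
            continue
        if charset is None or charset in charsets:
            return media_type, charset
    return None
```

A user on platform C whose external viewer only handles plain text in one
encoding would then send something like choose("text/plain", "iso-8859-6")
and get the plain-text representation instead of the bilingual HTML.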

> >Finally on Point 3. I think 10646 is conceptually nice, however given
> >the fact that we accept different "representations" such as 8859-X
> >being actually sent to the browser via HTTP, then this becomes the "HTML
> >document" that the client sees. Servers may pre-map the 10646 to 8859-X
> >if only one non-Latin1 language is used and so basically the
> >10646 is really not "visible" to the outside world. Thus we are back to
> >HTML markup in US-ASCII and everything else as data.
>
> Yes, that is the whole point. By using ISO 10646 we get:
>
> 1) A formal foundation for the treatment of multilingual data.
> 2) A unified numeric charset model, even in the face of arbitrary,
> blind, encoding transformations.
> 3) A single SGML declaration.
> 4) A base for a transition to a truly multilingual WWW.
>
> The document character set proposal doesn't buy us a lot at the moment,
> except peace of mind, and a reasonable way to handle numeric character
> references. It is what it enables which is really important...
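
As I read the numeric character reference point: the reference always names an
ISO 10646 code position, no matter which encoding the bytes travelled in. A
minimal sketch under that assumption (the function name is made up, and only
decimal references are handled):

```python
import re


def decode_document(raw_bytes, transfer_charset):
    """Decode the bytes using the encoding they were sent in, then resolve
    decimal numeric character references as ISO 10646 code positions."""
    text = raw_bytes.decode(transfer_charset)
    # &#945; means code position 945 in the document character set,
    # independent of transfer_charset.
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)
```

So the same reference gives the same character whether the document arrived
as ISO-8859-7 or as UTF-8; only the raw bytes differ.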

OK, as long as I don't have to pay $500 for my n-lingual HTML editor
when I only speak one foreign language :-)

-anoosh