Re: charset parameter (long)

Gavin Nicol (gtn@ebt.com)
Mon, 16 Jan 95 16:49:13 EST

Albert Lunde writes:

>It's more important to pick a well-defined name space than to have all
>browsers support everything.

Yes, and for any charsets we think are needed but that fall outside
this namespace, standardisation work must begin.

>I'm not totally convinced that transferring a whole document in a single
>encoding, a la Unicode, is the _only_ way to handle multi-lingual
>documents, though I'm not an SGML expert and could use some discussion on

There is no need for them to be *transferred* as a single character
set, but they must be *converted into* a single character set before
the parser sees the character stream. Why not therefore save the time
and simply transfer it as a single character set in the first place?
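
As a rough illustration of the conversion step (a minimal sketch, not
anything a particular browser does): the segment list, the charset
labels attached to each segment, and the helper name normalise() are
all hypothetical, and UTF-8 is only assumed here as one possible wire
form of Unicode.

```python
# Hypothetical multi-part document: each segment carries its own charset label.
segments = [
    (b"Hello, ", "ascii"),
    ("caf\u00e9 ".encode("latin-1"), "latin-1"),
    ("\u65e5\u672c\u8a9e".encode("euc_jp"), "euc_jp"),
]


def normalise(parts):
    """Decode every segment and join the results into one Unicode string."""
    return "".join(raw.decode(charset) for raw, charset in parts)


# What the parser would see: a single character stream.
text = normalise(segments)

# The same stream serialised once, in a single encoding, for transfer
# (UTF-8 is an assumption made for this sketch).
wire = text.encode("utf-8")
```

If the sender does this once, the receiver only has to decode `wire`;
otherwise every receiver repeats the per-segment conversion itself.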

>Another possibility would be to define a meta-encoding for multiple
>character sets, where the escape codes to shift character sets would not be
>represented in _any_ of the character sets. It would then be up to a
>multi-lingual HTML implementer to provide a pre-processor to get this
>information into a form an SGML parser could deal with (maybe by
>normalizing to a combined character set, maybe by adding extra markup)
>This does sound less elegant than Unicode, but I'd like to hear more about
>why it won't work before ruling it out.

Sounds much like ISO-2022... and provided the data is normalised
before the parser ever sees it, it will work. However, you will still
have to convert the data into something the parser understands. Into
what? Which character sets support multiple languages? Then ask
yourself: "why not simply convert to one of those"?
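
A small sketch of that pre-processing, assuming ISO-2022-JP as the
escape-shifted encoding and Unicode as the normalised form (both are
choices made for this example only; the sample string is made up). It
relies on Python's standard "iso2022_jp" codec.

```python
# Sample text mixing ASCII and Japanese (made up for this sketch).
original = "abc \u65e5\u672c\u8a9e def"

# On the wire, ISO-2022-JP switches character sets with escape sequences:
# ESC $ B shifts to JIS X 0208, ESC ( B shifts back to ASCII.
raw = original.encode("iso2022_jp")
has_shift_in = b"\x1b$B" in raw
has_shift_out = b"\x1b(B" in raw

# The pre-processor: decode to one character set before parsing, so the
# parser never sees an escape sequence.
normalised = raw.decode("iso2022_jp")
```

After decoding, the shift sequences are gone and the parser is handed
plain Unicode characters, which is the "convert to one of those" step.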

I have never said Unicode must be the *only* character set, only the
*commonly understood* character set....