Re: ISO charsets; Unicode

Dave Raggett (dsr@hplb.hpl.hp.com)
Tue, 27 Sep 1994 17:07:06 +0100

Richard L. Goerwitz writes:

> Has a formal mechanism been considered for specifying various popular
> coding standards, such as ISO 8859-7, ISO 8859-8, etc., and (perhaps
> off in the future) Unicode?

> Might be possible to use SGML entities for every conceivable character
> in every conceivable language, but as a practical solution to a current
> problem, this seems difficult at best.

There are several ideas relevant to this:

o Introducing a LANG attribute common to most elements for
specifying the language in use in that context, e.g.
<P LANG="en_uk">This is in British English, <EM LANG="fr">
mais celui-ci c'est en francais</EM>

o Using the MIME header to declare which charset is in use
e.g. Unicode.

o Using a mechanism like ISO 2022 to switch charsets dynamically
as used by NTT's multilingual version of Mosaic.

o Introducing a CHARSET attribute to direct the browser that
the next sequence of characters (following the ">") is in
a specified character set

o Fixing SGML to handle multiple character sets properly
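
To make the MIME-header and ISO 2022 ideas above concrete, here is a rough sketch in Python of how a client might pick the charset from a Content-Type header and decode the body with it. The helper names (charset_from_content_type, decode_body) and the ISO 8859-1 fallback are assumptions made for illustration, not part of any proposal in this thread.

```python
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value.

    The default is an assumption for this sketch; a real client
    would follow whatever the relevant specification requires.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value, or returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset() or default


def decode_body(raw_bytes, content_type):
    """Decode a document body using the charset declared in the header."""
    return raw_bytes.decode(charset_from_content_type(content_type))


# Greek text in ISO 8859-7: one byte per character.
greek_bytes = b"\xe1\xe2\xe3"
greek_text = decode_body(greek_bytes, "text/html; charset=ISO-8859-7")

# ISO-2022-JP, an ISO 2022 profile: escape sequences inside the byte
# stream switch between ASCII and a Japanese character set.
mixed_bytes = "abc \u65e5\u672c".encode("iso2022_jp")
mixed_text = decode_body(mixed_bytes, "text/html; charset=ISO-2022-JP")

# No charset parameter at all: the sketch falls back to its default.
fallback = charset_from_content_type("text/html")
```

In the ISO-2022-JP case the encoded bytes contain ESC $ B before the Japanese characters and ESC ( B to return to ASCII, which is the kind of in-band switching the third item refers to. The header approach, by contrast, fixes one charset for the whole document.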

The LANG attribute is essential for handling text which reads right
to left rather than left to right. This can be mixed on the same line.
The character set issue needs to be resolved over the next few months.
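
As a small illustration of mixed directions on one line, the following Python sketch lists the runs of left-to-right and right-to-left characters in a string. It relies on the directional classes recorded in the Unicode character database rather than on a LANG attribute, so it only shows the problem; the function name direction_runs is made up for this example.

```python
import unicodedata
from itertools import groupby


def strong_direction(ch):
    """Map a character to 'ltr', 'rtl', or None for neutral characters."""
    cls = unicodedata.bidirectional(ch)
    if cls in ("R", "AL"):  # Hebrew-style and Arabic-style letters
        return "rtl"
    if cls == "L":  # Latin and other left-to-right letters
        return "ltr"
    return None  # spaces, digits, punctuation, and so on


def direction_runs(text):
    """Return the sequence of directions seen, with repeats collapsed."""
    dirs = [d for d in map(strong_direction, text) if d is not None]
    return [d for d, _ in groupby(dirs)]


# Latin text with three Hebrew letters embedded in the middle.
line = "abc \u05d0\u05d1\u05d2 def"
runs = direction_runs(line)
```

A renderer has to lay out each such run in its own direction, and neutral characters such as the spaces between runs are where an explicit hint from the markup would help.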

--
Best wishes,

Dave Raggett

-----------------------------------------------------------------------------
Hewlett Packard Laboratories            email: dsr@hplb.hpl.hp.com
Filton Road                             tel:   +44 272 228046
Stoke Gifford                           fax:   +44 272 228003
Bristol BS12 6QZ
United Kingdom