Re: Charsets: Problem statement/requirements?
Fri, 10 Feb 95 17:36:19 EST

Luke Y. Lu <> writes:
>I think this is better: <lang enc="iso-8859-1">....</lang> and <lang
>enc="iso-whatever">...</lang> etc.

I think this is seriously wrong. Encodings and languages are pretty much
orthogonal: a single encoding can represent several languages, and a single
language can be represented in several encodings.
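To make the orthogonality concrete, here is a small sketch in modern Python. The sample words and the choice of encodings (ISO 8859-1, UTF-8, IBM code page 850) are illustrative assumptions, not something taken from this thread. One encoding carries both a French and a German word, and one French word comes out as three different byte sequences depending on the encoding.

```python
# Illustrative sketch: language and encoding vary independently.
# The sample words and encodings below are arbitrary choices for illustration.

samples = {
    "fr": "garçon",  # French
    "de": "Straße",  # German
}

# One encoding, several languages: ISO 8859-1 (Latin-1) represents both words.
latin1 = {lang: text.encode("iso-8859-1") for lang, text in samples.items()}

# One language, several encodings: the same French word as different bytes.
french_bytes = {
    enc: samples["fr"].encode(enc)
    for enc in ("iso-8859-1", "utf-8", "cp850")
}

if __name__ == "__main__":
    for lang, data in latin1.items():
        print(lang, data)
    for enc, data in french_bytes.items():
        print(enc, data)
```

An `enc` attribute therefore says how to turn bytes into characters; it says nothing about which language those characters are in.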

>I think it's not necessary to differentiate
>languages using the same encoding scheme (e.g. french and german).

There are many reasons why this is highly desirable: glyph disambiguation,
translation, hyphenation, indexing...
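As a sketch of why indexing needs the language even when the encoding is shared, consider a toy indexer that drops stop words. The stop-word lists and the `TaggedText` and `index_terms` names are hypothetical, made up for this example. The word "die" is a German article (a stop word) but an ordinary English verb, so identical ISO 8859-1 bytes must be indexed differently depending on the language tag.

```python
from dataclasses import dataclass

# Toy stop-word lists, keyed by language tag (hypothetical and far from complete).
STOP_WORDS = {
    "de": {"der", "die", "das", "und"},
    "en": {"the", "a", "of", "and"},
}

@dataclass(frozen=True)
class TaggedText:
    data: bytes
    encoding: str  # how to decode the bytes into characters
    lang: str      # which language the characters are in

def index_terms(t: TaggedText) -> list[str]:
    # Decoding uses only the encoding; stop-word filtering uses only the language.
    words = t.data.decode(t.encoding).lower().split()
    stops = STOP_WORDS.get(t.lang, set())
    return [w for w in words if w not in stops]

if __name__ == "__main__":
    # Same bytes, same encoding, different languages, different index terms.
    print(index_terms(TaggedText(b"die", "iso-8859-1", "de")))
    print(index_terms(TaggedText(b"die", "iso-8859-1", "en")))
```

Nothing in the bytes or the encoding name lets the indexer choose the right list; only a separate language tag does.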

>usage to differentiate particular languages is to facilitate automatic

That's only one usage.

>But I think if a translator can't figure out which language
>by looking at the raw bytes of a known encoding scheme, it's pretty much

A very debatable opinion. Automatic translation is already very much wanting;
adding a hard-to-satisfy requirement will only make it more costly and less
reliable. Should we also add this requirement to every hyphenator, every
indexer, every rendering engine? I don't think so. A language tag is a
language tag; it is not an encoding tag.

Francois Yergeau  <>
Alis Technologies Inc., Montreal
+1 514 738-9171