Re: Charsets: Problem statement/requirements?

yergeau@alis.ca
Fri, 10 Feb 95 17:36:19 EST

Luke Y. Lu <ylu@mail.utexas.edu> writes:
>I think this is better: <lang enc="iso-8859-1">....</lang> and <lang
>enc="iso-whatever">...</lang> etc.

I think this is seriously wrong. Encodings and languages are pretty much
orthogonal: a single encoding can represent several languages, and a single
language can be represented in several encodings.
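
To make the point concrete, here is a minimal sketch in Python. The sample
strings and variable names are made up for illustration, and the three
encodings are just convenient examples. It shows ISO-8859-1 carrying both
French and German text, and the same French text surviving a round trip
through several different encodings.

```python
# Illustrative sketch only: the sample strings below are invented.
french = "déjà vu, garçon"
german = "Größe, Übung"

# One encoding, several languages: ISO-8859-1 represents both texts.
fr_bytes = french.encode("iso-8859-1")
de_bytes = german.encode("iso-8859-1")
assert fr_bytes.decode("iso-8859-1") == french
assert de_bytes.decode("iso-8859-1") == german

# One language, several encodings: the same French text can be stored
# in different encodings and recovered intact from each of them.
encodings = ["iso-8859-1", "cp1252", "utf-8"]
french_by_encoding = {name: french.encode(name) for name in encodings}
round_trips = all(
    french_by_encoding[name].decode(name) == french for name in encodings
)

# The byte sequences themselves differ between some encodings, so the
# bytes say nothing about the language by themselves.
same_bytes = french_by_encoding["iso-8859-1"] == french_by_encoding["utf-8"]
```

Nothing in the bytes records whether the text is French or German, so the
encoding label cannot stand in for a language label.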

>I think it's not necessary to differentiate
>languages using the same encoding scheme (e.g. french and german).

There are many reasons why this is highly desirable: glyph disambiguation,
translation, hyphenation, indexing...

>One
>usage to differentiate particular languages is to facilitate automatic
>translation.

That's only one usage.

>But I think if a translator can't figure out which language
>by looking at the raw bytes of a known encoding scheme, it's pretty much
>useless.

A very debatable opinion. Automatic translation is already very much wanting;
adding a hard-to-satisfy requirement will only make it more costly and less
reliable. Should we also add this requirement to every hyphenator, every
indexer, every rendering engine? I don't think so. A language tag is a
language tag, not an encoding tag.

-- 
Francois Yergeau  <yergeau@alis.ca>
Alis Technologies Inc., Montreal
+1 514 738-9171