Re: Charsets: Problem statement/requirements?

Luke ~{B7?M~} (ylu@ccwf.cc.utexas.edu)
Fri, 10 Feb 95 19:38:49 EST

Next message: Gavin Nicol: "Re: SGML/MIME charsets, ERCS/Unicode [was: New DTD (final version?) ]"
Previous message: yergeau@alis.ca: "Re: Charsets: Problem statement/requirements?"

On Fri, 10 Feb 1995 yergeau@alis.ca wrote:

>Luke Y. Lu <ylu@mail.utexas.edu> writes:
>>I think this is better: <lang enc="iso-8859-1">....</lang> and <lang
>>enc="iso-whatever">...</lang> etc.
>
>I think this is seriously wrong. Encodings and languages are pretty much
>orthogonal, with a single encoding being able to represent several
>languages and a single language being representable in several encodings.

Well, nothing prevents you from adding <lang lc="en" enc="whatever"> to give
hints to dumb and dumber language robots. My point was just that using
escape sequences to support multiple charsets in a modular fashion is
better than trying to use one single charset to cover totally different
languages. See my other posts for details. Granted, ISO 2022 is not pretty,
but neither is Unicode. This is highly debatable, but a rush to Unicode is
not a wise, logical, or cost-effective path.
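
For readers unfamiliar with the escape-sequence approach, here is a minimal
sketch of charset switching in the ISO 2022 style. It assumes Python and its
built-in iso2022_jp codec, neither of which is mentioned in the post; the
sample string and the constant names are illustrative only.

```python
# Sketch: ISO 2022-style charset switching, shown with Python's
# built-in iso2022_jp codec (an assumption; the post names no tool).

# ASCII letters, three kanji (written as escapes), then ASCII again.
text = "abc\u65e5\u672c\u8a9exyz"

encoded = text.encode("iso2022_jp")

# Escape sequences that designate the active character set.
ESC_TO_JIS_X_0208 = b"\x1b$B"  # switch to the two-byte kanji set
ESC_TO_ASCII = b"\x1b(B"       # switch back to ASCII

# The byte stream stays 7-bit; the charset in force changes mid-stream.
print(encoded)
print("switch to kanji at byte", encoded.index(ESC_TO_JIS_X_0208))
print("switch to ASCII at byte", encoded.index(ESC_TO_ASCII))

# Decoding follows the same escapes and recovers the original text.
assert encoded.decode("iso2022_jp") == text
```

Each charset is a separate module selected by its own escape sequence, which
is the modularity argued for above. The cost is that a decoder has to track
state as it reads the stream.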

>>I think it's not necessary to differentiate
>>languages using the same encoding scheme (e.g. french and german).
>
>There are many reasons why this is highly desirable: glyph disambiguation,
>translation, hyphenation, indexing...
>
>>One use of differentiating particular languages is to facilitate
>>automatic translation.
>
>That's only one usage.
>
>>But I think if a translator can't figure out which language
>>by looking at the raw bytes of a known encoding scheme, it's pretty much
>>useless.
>
>A very debatable opinion. Automatic translation is already very much wanting,
>adding a hard-to-satisfy requirement will only make it more costly and less
>reliable. Should we also add this requirement to every hyphenator, every
>indexer, every rendering engine? I don't think so. A language tag is a
>language tag, not an encoding tag.

I agree. But a language tag is useless if you don't know its encoding scheme,
while you can figure out the language if you know the encoding scheme and
_understand_ the language. Understanding is a prerequisite for
translation, IMHO. Anyway, it's not very relevant to my major point.
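
To illustrate why a language tag alone is not enough, the sketch below
decodes the same four bytes under two different encodings. It assumes Python
and its built-in gb2312 and latin-1 codecs; the byte string is a made-up
example, not taken from the thread.

```python
# Sketch: identical raw bytes read under two different encodings.
# The sample bytes are illustrative, not taken from the thread.
raw = b"\xc4\xe3\xba\xc3"

as_gb2312 = raw.decode("gb2312")   # two Chinese characters
as_latin1 = raw.decode("latin-1")  # four Latin-1 characters

print(repr(as_gb2312), len(as_gb2312))
print(repr(as_latin1), len(as_latin1))

# A tag saying "this is Chinese" does not tell a program which of
# these readings applies; knowing the encoding does.
assert as_gb2312 != as_latin1
```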

__Luke

--
Luke Y. Lu
mailto:ylu@mail.utexas.edu/
http://www.utexas.edu/~lyl/
