In 2 words, CJK Unification. While I *personally* think that the lack
of glyph disambiguation is not a major problem, many people (with
close to religious fervor) see this as a fatal flaw in Unicode. These
people (and some of them are quite influential) will not accept
Unicode without some way of being able to differentiate languages to
aid in glyph selection.
>I fully agree that language (and font, and style, and ...) tags are useful
>and highly desirable at a high level, and support for this should be added
>to HTML (or at whatever level is appropriate).
This is indeed one way to handle this. However, it requires that
*every* parser (and we should *not* restrict ourselves to HTML)
support such markup. Every DTD must have such attributes or tags
added, and every producer of information must use them. The latter
might be interesting because the linguistic differences could cross
element boundaries. If the producer is converting from SJIS for
example, and we want such high level tags generated automatically,
they will have to parse the document and insert start and end tags as
appropriate.
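To make the element boundary point concrete, here is a rough sketch of what such a converter would have to do. The LANG element, its ID attribute, and the function name are hypothetical, invented only for illustration, and the regular expression is a crude stand-in for a real parser (it ignores comments, marked sections, and so on).

```python
import re

# Crude tag matcher; a real converter would need a proper SGML parser.
TAG = re.compile(r"(<[^>]*>)")

def tag_text_runs(markup, lang):
    """Wrap every text run in a hypothetical <LANG ID="..."> element.

    A run that crosses an element boundary has to be split into one
    start/end tag pair per element, so the markup stays well nested.
    """
    out = []
    for part in TAG.split(markup):
        is_text = part and not TAG.fullmatch(part) and part.strip()
        if is_text:
            out.append('<LANG ID="%s">%s</LANG>' % (lang, part))
        else:
            out.append(part)  # existing tags and whitespace pass through
    return "".join(out)

# Simulate a producer converting from SJIS.
sjis_bytes = "<P>日本語<EM>テキスト</EM></P>".encode("shift_jis")
tagged = tag_text_runs(sjis_bytes.decode("shift_jis"), "ja")
```

Note that one continuous run of Japanese text ends up with two tag pairs here, simply because an EM element happens to sit in the middle of it.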
In a recent message I proposed that we use 2 codes and put them into
the STAGC and STAGO classes respectively. One advantage to this is
that one can unambiguously say that these tags are "hints" at any
level within the application, and handle them in an application
specific manner, which might be high level parsing, or removal, or
anything else.
>Doing it at a low level adds complexity and complicates clients
>and servers that use Unicode. There has to be a compelling reason to add
>this complexity. There has to be a problem that it solves.
I disagree. If we handle this at the low level, we have a single point
where all this processing is handled. The parser sitting above this is
not complicated by this at all. One *could* complicate the parser if
one so desires with my recent proposal using 2 codes.
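As a minimal sketch of the "single point" idea, assume the 2 codes are a pair of delimiters around a language name. The code points below are arbitrary private-use values picked for illustration, not part of any standard or of the actual proposal, and split_hints is a made-up name. The filter sits below the parser and hands it plain text; the hints are collected on the side for anyone who wants them.

```python
# Hypothetical delimiter codes (private-use area), chosen only for this sketch.
HINT_OPEN = "\uE000"
HINT_CLOSE = "\uE001"

def split_hints(stream):
    """Separate language hints from text.

    Returns (plain_text, runs), where runs is a list of
    (offset_into_plain_text, language) pairs.
    """
    plain, runs = [], []
    pos = 0      # read position in the tagged stream
    length = 0   # length of the plain text emitted so far
    while pos < len(stream):
        start = stream.find(HINT_OPEN, pos)
        if start == -1:
            plain.append(stream[pos:])  # no more hints, copy the rest
            break
        plain.append(stream[pos:start])
        length += start - pos
        end = stream.find(HINT_CLOSE, start + 1)
        if end == -1:
            break  # unterminated hint: discard it and stop
        runs.append((length, stream[start + 1:end]))
        pos = end + 1
    return "".join(plain), runs

tagged = (HINT_OPEN + "ja" + HINT_CLOSE + "<p>\u76f4</p>" +
          HINT_OPEN + "zh" + HINT_CLOSE + "<p>\u76f4</p>")
text, runs = split_hints(tagged)
```

The parser above this only ever sees text, so no DTD has to change. An application that cares about glyph selection looks at runs; one that does not simply throws it away.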
>Given that work is in progress to add language information at a higher
>level (HTML 3) it seems to me that there would have to be an
>extraordinarily strong reason to add this information at a low level as
>well.
While this *is* the HTML working group, I am not thinking of *only*
HTML. We have an infinite number of possible DTD's, and *requiring*
high level processing potentially complicates each and every one of
them.
Philosophical debates are so much fun...