Language hints in UNICODE private use area

pandries@alis.ca
Thu, 19 Jan 95 15:14:21 EST


David Goldsmith recently commented:

>Section 3.2.4
>This is probably the section I have the most problem with. Unicode
>specifically was designed with the idea that attributes such as
>language, fonts, etc. would be encoded out-of-band, via high level tags
>or even out of the character stream entirely.

I wholeheartedly agree with David. I have great reservations about
using the Unicode private use area to encode language hints, for
four main reasons:

1) As David mentioned, Unicode was explicitly designed not to address
this issue.

2) How do you generalise this idea to encodings that have no code
positions to spare for language hinting? I can write French, Dutch,
English and German, for instance, using ISO-8859-1: do I have to use
Unicode even in a purely European setting just so that I can tag
texts? And what about the fact that most of the text available today
is in ISO Latin-1?

3) It is easy to add, in an upward-compatible fashion, a tag called,
for example, <lang=...>. Browsers that do not understand the tag will
simply ignore it.

4) I have the impression that this forum (html-wg, http-wg) may not be
the proper place to discuss changes to the interpretation of Unicode
characters or codes. I am not convinced that the Unicode Consortium
would readily accept such changes. It might be much easier to create
an HTML tag for this purpose.
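
To make points 2 and 3 concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions: the element name
"lang" and its "code" attribute are hypothetical stand-ins for the
<lang=...> tag suggested above, and Python's html.parser module plays
the role of a browser. One parser knows nothing about the tag and
still shows all the text; the other uses the tag to label each run of
text with a language. The sample is plain ISO-8859-1 text, so no
Unicode is needed to carry the hint.

```python
from html.parser import HTMLParser


class LegacyBrowser(HTMLParser):
    """A browser that does not know the (hypothetical) lang tag.

    It defines no tag handlers at all, so every tag is silently
    skipped and only the character data is kept.
    """

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)


class LangAwareBrowser(HTMLParser):
    """A browser that understands the hypothetical <lang code="..."> tag."""

    def __init__(self):
        super().__init__()
        self.stack = [None]   # None means "no language declared"
        self.runs = []        # list of (language, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "lang":
            self.stack.append(dict(attrs).get("code"))

    def handle_endtag(self, tag):
        if tag == "lang" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.runs.append((self.stack[-1], data))


def render_legacy(markup):
    parser = LegacyBrowser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.text)


def render_lang_aware(markup):
    parser = LangAwareBrowser()
    parser.feed(markup)
    parser.close()
    return parser.runs


# Every character here exists in ISO-8859-1; the tag itself is ASCII.
SAMPLE = ('Hello, <lang code="fr">été</lang> and '
          '<lang code="de">Größe</lang>.')

if __name__ == "__main__":
    latin1_bytes = SAMPLE.encode("iso-8859-1")  # one byte per character
    markup = latin1_bytes.decode("iso-8859-1")
    print(render_legacy(markup))
    print(render_lang_aware(markup))
```

The old browser loses nothing but the language information, which is
exactly the upward compatibility argued for in point 3, and the tagged
text still fits in a one-byte encoding, as point 2 requires.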




Patrick Andries
Alis Technologies Inc.
1+514+738 91 71