Re: HTML-WG digest 105: Unicode

Martin J Duerst (mduerst@ifi.unizh.ch)
Wed, 12 Jul 95 13:14:00 EDT

>We (Accent Software Internation) will be coming out with
>proposals for an interim (Unicode UTF7 which is encoded in
>Latin-1 characters) and additions to HTML-3 (also supportive to UTF-8
>and maybe UCS-2 Unicode).
>
>Please contact me with any ideas.
>
>David Baron

I am pleased to see that a company is working on a Unicode-based
browser. However, I have difficulty understanding parts of the post:

a) Is it really a browser, or a proposal in text form (for what exactly)?
b) We are asked to send in our ideas, but we don't know exactly
what the thing is about.
c) Some technical problems are immediately obvious.

First, how to deal with character encodings is specified in the HTTP
Internet-Draft, and is not directly part of HTML or the work of this
group. As far as HTML is concerned, the wording in the current
draft is very appropriate.
The HTTP document does not, in my opinion, need any
changes or additions; it just needs to be fully implemented and
reasonably used (by reasonably, I mean restricting the number of
character encodings that go over the wire as much as possible).
Additions to HTML-3 are what should be discussed here, but I cannot
see how they relate to character encodings.
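
To illustrate the point that the character encoding is labelled at the
HTTP level rather than inside HTML, here is a minimal Python sketch. It
assumes the encoding arrives as a "charset" parameter on the Content-Type
header and that ISO-8859-1 is the fallback when none is given. The
function name charset_of and the sample header values are made up for
this illustration.

```python
from email.message import Message


def charset_of(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value.

    Falls back to `default` (assumed here to be ISO-8859-1) when the
    header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value it finds.
    return msg.get_content_charset(default)


if __name__ == "__main__":
    print(charset_of("text/html; charset=UTF-8"))
    print(charset_of("text/html"))
```

The HTML document itself needs no knowledge of the encoding in this
scheme; the receiver decodes the bytes first and parses HTML afterwards.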

Second, the choice of character encodings and the wording don't
show, on first impression, much familiarity with the matter.
UTF-7, although an official standard, is something of a stillborn
child. Compared with UTF-8, it has many well-known drawbacks.
And what is meant by "UTF-7 encoded in Latin-1"? You can encode
a set of characters (Latin-1) with an encoding method (UTF-7),
but not vice versa. Or should "Latin-1" simply read "8-bit bytes"?
That would not make sense either. Or should it be the other way round:
"Latin-1 encoded in UTF-7"? There is no use in encoding Latin-1 in the
complex UTF-7 if Latin-1 is the default anyway.
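
As a rough illustration of why encoding Latin-1 text in UTF-7 gains
nothing, the following Python sketch encodes the same string three ways
and compares the byte counts. It relies on Python's built-in "utf-7"
codec; the helper name encoded_sizes and the sample word are chosen only
for this example.

```python
def encoded_sizes(text):
    """Return the encoded byte length of `text` for three encodings."""
    return {
        name: len(text.encode(name))
        for name in ("latin-1", "utf-8", "utf-7")
    }


if __name__ == "__main__":
    sample = "Zürich"  # one non-ASCII Latin-1 character
    for name, size in encoded_sizes(sample).items():
        print(name, size, sample.encode(name))
```

Latin-1 needs one byte per character here, UTF-8 needs two bytes only
for the "ü", and UTF-7 has to switch into a base64 section for that one
character, so its output is the longest of the three while remaining
pure 7-bit ASCII.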

Hope this helps. Let's move ISO 10646/Unicode and I18N forward,
but in a well-thought-out and well-explained manner.

Regards, Martin.