Re: Comments on: "Character Set" Considered Harmful

Gavin Nicol (gtn@ebt.com)
Wed, 26 Apr 95 13:09:39 EDT

>There is perhaps a blurred distinction between character set and
>content-encoding. I can imagine someone inventing a novel means
>of encoding characters from a mix of well known and completely
>novel character sets. To handle this the browser would download
>a lexical analyser over the network along the lines of SUN's
>HotJava. This would be combined with downloadable fonts for the
>novel character sets.

I think we discussed this a long time ago. One of the reasons I
proposed requiring Unicode support was to avoid forcing the client
to download a scanner. I think Terry Allen, or perhaps Rick
Jelliffe, first mentioned the idea of using downloadable fonts to
overload a character code with more than one glyph image. The font
specification could be part of the encoding.

>To make this concrete, consider a novel mathematical notation or
>perhaps novel notations for music or dance. The SGML parser is
>not affected, as the character encoding is handled by a protocol
>layer below the SGML entity manager.

Sure. I think this is of great practical value, though it feels
hackish.

I would certainly prefer *not* having to download a scanner. Code
conversion tables are palatable, but just barely.

I think I said a long time ago that eventually you'll need 16 bits or
more internally to process multilingual documents. Why complicate the
clients (the most changeable part of the equation) with conversion
when servers can handle it equally well?
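
As a rough illustration of the server-side conversion argued for
above (a sketch only: the function name, the example charsets, and
the choice of UTF-8 as the wire encoding are all assumptions, not
part of any proposal in this thread):

```python
from typing import Tuple


def serve_as_unicode(stored_bytes: bytes,
                     stored_charset: str,
                     wire_encoding: str = "utf-8") -> Tuple[bytes, str]:
    """Transcode a document on the server before it is sent.

    stored_charset is whatever encoding the server keeps the file in
    (hypothetical examples: "euc_jp", "iso8859_1"). wire_encoding is
    an assumed Unicode encoding that every client understands.
    """
    # Decode once on the server, using the charset it already knows.
    text = stored_bytes.decode(stored_charset)
    # Re-encode to the single encoding the client is required to support.
    return text.encode(wire_encoding), wire_encoding


# The client never sees EUC-JP and needs no conversion table for it.
body, charset = serve_as_unicode("日本語".encode("euc_jp"), "euc_jp")
```

The point is only that the conversion happens once, at the end that
already knows how the document is stored.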

Anyway, I'll push this later (it can wait). For now, I'd like to
discuss the other HTML issues I raise in my paper.