Re: Frames & WWW

P. M. Hallam-Baker (
Thu, 17 Nov 1994 12:46:19 +0100

In article <> you write:
|> >> Until UNICODE based programming languages become commonplace I doubt that there
|> >> will be much use for the UNICODE variants since most other text needing fancy
|> >> fonts will have other formatting (eg HTML).
|> >
|> >Urf.
|> I agree :-) I live in Japan. The Japanese still have some problems
|> with Unicode, but I think almost everyone here realises that it offers
|> 99% of what most people want in document portability. As such, I would
|> dearly love to see the character encoding in text/html and text/sgml
|> to be either UTF-8 or UTF-7. As it is now, writing a Japanese WWW
|> browser is not trivial.

Actually, if you look back I think you will find that I have been one of the principal
proponents of using UNICODE, for exactly the reasons cited. In fact I have produced
the code for UNICODE support and hope to incorporate it into a browser soon.

The point I was making is that IMHO text/plain is not a high value markup even with
UNICODE, and that given a UTF encoding it is reasonable to address it by octet
rather than by character cell. Actually, on reflection I don't like that idea any more:
it means that the position of an anchor would change depending on whether the file
was sent in UTF or unpacked format, which I don't like at all :-)
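
To illustrate the anchor problem, here is a minimal sketch. It assumes that "UTF" can be modelled by UTF-8 and the "unpacked format" by 16-bit big-endian code units; the sample text and the helper name octet_offset are made up for the example. The character index of the anchor stays the same, while its octet offset depends on how the file was sent.

```python
# Assumptions: UTF-8 stands in for "UTF", UTF-16-BE for the "unpacked format".
# The sample text (cafe with an accent, three Han characters, then "anchor")
# and the helper name are hypothetical.
text = "caf\u00e9 \u65e5\u672c\u8a9e anchor"
target = "anchor"

# Character-cell address: independent of the transfer encoding.
char_index = text.index(target)

def octet_offset(s, idx, encoding):
    """Octet offset of character idx once s is serialised in encoding."""
    return len(s[:idx].encode(encoding))

utf8_offset = octet_offset(text, char_index, "utf-8")
unpacked_offset = octet_offset(text, char_index, "utf-16-be")

print(char_index, utf8_offset, unpacked_offset)
```

The same anchor has one character index but two different octet offsets, which is why an octet-addressed anchor would move between the two transfer formats.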

I doubt that the Japanese would want to use UTF encoding in any case. The principal
reason for supporting UTF is that it is very efficient for Western European text and
OK for Eastern European, or at least as efficient as any scheme not using context
switching. Context switching must be avoided like the plague, since the files then
become linear braindamage which cannot be interpreted except by starting from the
beginning. If people are prepared to forgo character index addressing then Huffman
coding provides much better compression than merely switching character sets.
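
A rough sketch of both points, assuming UTF-8 as the UTF in question and 16-bit big-endian units as the fixed-width alternative. The sample strings and the helper name resync are invented for illustration. The first part compares encoded sizes; the second shows that a stateless encoding can be picked up from an arbitrary octet offset, which a context-switching scheme cannot do without reading from the start.

```python
# Assumptions: UTF-8 vs UTF-16-BE; the sample strings are made up.
samples = {
    "western": "na\u00efve caf\u00e9",              # Latin text with two accents
    "eastern": "\u017elu\u0165ou\u010dk\u00fd",     # Czech word with four accents
    "japanese": "\u65e5\u672c\u8a9e",               # three Han characters
}

# (UTF-8 octets, 16-bit octets) for each sample.
sizes = {
    name: (len(s.encode("utf-8")), len(s.encode("utf-16-be")))
    for name, s in samples.items()
}

def resync(data, offset):
    """Advance offset to the next UTF-8 character boundary.

    Continuation octets have the bit pattern 10xxxxxx, so no earlier
    context is needed to find where the next character starts.
    """
    while offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset += 1
    return offset

data = samples["japanese"].encode("utf-8")
start = resync(data, 1)              # offset 1 is in the middle of a character
tail = data[start:].decode("utf-8")  # decodes cleanly from the boundary
```

With these samples the Western and Eastern European strings come out smaller in UTF-8 than in 16-bit units, while the Japanese string comes out larger, which fits the doubt expressed above about Japanese use of UTF.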

Actually, one point should be made clear. A UNICODE Web document should be interpreted in
the context of the Content-Language specified. Thus if the language is Japanese the
Han characters should be displayed as Japanese, not US inspired mid-Pacific bit-savers.
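
As a sketch of what that could mean for a browser: the table and the function name below are hypothetical, not taken from any existing browser or specification. The idea is only that the Content-Language value, rather than the code points alone, selects the glyph style for the unified Han characters.

```python
# Hypothetical mapping from primary language tag to the glyph style used
# for unified Han characters. The style names are placeholders.
HAN_STYLE_BY_LANGUAGE = {
    "ja": "japanese",
    "zh": "chinese",
    "ko": "korean",
}

def han_style(content_language, default="unspecified"):
    """Pick a Han glyph style from a Content-Language value such as 'ja'."""
    primary = content_language.strip().lower().split("-")[0]
    return HAN_STYLE_BY_LANGUAGE.get(primary, default)

print(han_style("ja"))
```

A document served with Content-Language: ja would then get Japanese glyph forms for the same code points that a Chinese document would render differently.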

Phill Hallam-Baker