Re: Charsets: Problem statement/requirements?

Gavin Nicol (
Thu, 9 Feb 95 09:28:25 EST

>Thanks for stirring the pot, Dan.

Do I get any thanks? I've been stirring it longer ;-)

>Any other charset in which the characters in the RCS (used for markup)
>are in the same places as in the ISO-8859 charsets could work just
>as outlined above, couldn't it?

Yes, though there are cases where it will be *very* easy to do the
wrong thing. One thing I'll do next week when I send out my revised
paper is to outline methods of deployment and support for current
character encodings.
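One way to see how easy the wrong thing is: a parser that scans raw bytes for markup delimiters works only if those delimiter bytes never occur inside other characters. The minimal Python sketch below assumes such a byte-oriented parser. It checks (a) that each delimiter encodes to its single ASCII byte and (b) whether that byte can also appear inside the multi-byte encoding of some non-ASCII character. The delimiter list and the helper names `ascii_compatible` and `delimiter_collisions` are illustrative assumptions, not taken from ISO 8879 or any SGML tool.

```python
# Sketch: does a codec keep SGML markup delimiters safe for a byte-level parser?
# ASSUMPTION: this delimiter set is an illustrative subset of the SGML
# reference concrete syntax, not the full list from ISO 8879.
DELIMITERS = "<>&;\"'=/!-[]%#|"


def ascii_compatible(codec: str) -> bool:
    """True if every delimiter encodes to its single ASCII byte in `codec`."""
    return all(ch.encode(codec) == ch.encode("ascii") for ch in DELIMITERS)


def delimiter_collisions(codec: str) -> set[str]:
    """Delimiters whose byte value also occurs inside the encoding of
    some non-ASCII BMP character in `codec`."""
    delimiter_bytes = {ord(ch): ch for ch in DELIMITERS}
    hits: set[str] = set()
    for cp in range(0x80, 0x10000):
        try:
            encoded = chr(cp).encode(codec)
        except UnicodeEncodeError:
            continue  # not representable in this codec (or a lone surrogate)
        for b in encoded:
            if b in delimiter_bytes:
                hits.add(delimiter_bytes[b])
    return hits


if __name__ == "__main__":
    for codec in ("iso-8859-1", "shift_jis", "big5", "iso2022_jp"):
        print(codec, ascii_compatible(codec), sorted(delimiter_collisions(codec)))
```

For ISO-8859-1 the collision set is empty, since every non-ASCII character is a single byte at or above 0x80. For Shift_JIS and ISO-2022-JP the delimiters still sit at their ASCII positions, yet bytes such as `]` or `<` also turn up inside double-byte Japanese characters, so a naive byte scanner would misread them as markup.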

>However, I agree that as a practical matter Unicode may be a
>reasonable short-term solution, as you outline below. (Gavin,
>here's some public support.)

Thank you! Thank you!

| As an implementor, by now, I'm getting tired of supporting 147
| different fonts and character encodings. I'm starting to believe in

Best leave fonts out of this (well-constructed) argument for now.
The main issue is charsets; distributing fonts is going to be
necessary, but for other reasons as well, so it's a side issue here.

>| I'd sure like it if folks would quit sending ISO-2022-JP, big5, and
>| all these crazy encodings and just use Unicode.
>And if they won't? and insist on making browsers that handle them

My figures outline the method to use here, but I'll go into more
detail next week...

>Any practitioners know the answer?

I forget the exact case, but I remember seeing an example somewhere
where it was not possible to infer the direction. Still, I think it is
certainly possible in the vast majority of cases.
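For the common case, a simple heuristic is to take the direction of the first strongly directional character, similar in spirit to rule P2 of the Unicode Bidirectional Algorithm (UAX #9). The minimal sketch below uses Python's standard `unicodedata` module; `infer_direction` is a hypothetical helper, and returning `None` models the cases where the text alone does not determine the direction.

```python
import unicodedata
from typing import Optional

# Sketch: guess the base direction of a run of text from its first "strong"
# character. Simplified: embeddings, overrides and isolates are ignored.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}  # Hebrew-type and Arabic-type letters


def infer_direction(text: str) -> Optional[str]:
    """Return "ltr", "rtl", or None if no strong character is present."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return None  # only digits, punctuation, spaces, etc.: direction is ambiguous


if __name__ == "__main__":
    print(infer_direction("hello"))                     # ltr
    print(infer_direction("\u05e9\u05dc\u05d5\u05dd"))  # rtl (Hebrew "shalom")
    print(infer_direction("123 + 456"))                 # None
```

The `None` case is exactly where a heuristic like this breaks down: a string of numbers and punctuation carries no directional information of its own, so some external hint would be needed.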

>*Especially* with Unicode, you need to know the language. Again,
>the LANG att will do the job.

I think the CJK thing approaches a religious issue: the Unicode people
say you can get by without it, and the anti-Unicoders say they'd never be
caught using it without some glyph disambiguation mechanism. I
think in 90% of the cases you'd get by without it, but I figure that
having it is probably better than not, simply because you'll be able
to satisfy the vocal critics as well. I tend to argue for that being
part of an encoding because I'm thinking of arbitrary SGML here as

>Nope. You have to go back to using the SGML decl supplied by the
>originator (which can be used to construct the relevant MIME

You are quite right, and I missed this.