Re: Character sets

Albert Lunde (
Wed, 8 Feb 95 18:08:29 EST

At 12:00 PM 2/8/95, Daniel W. Connolly wrote:
>Stating that "ERCS (Unicode) is the best answer" doesn't help those of
>us like myself that don't know what ERCS is and barely know what
>Unicode is. I suspect you have seen neither support nor arguments nor
>alternatives because folks don't fully understand your proposal.

I think all these references have been cited on the list before, but I'll
post them anyway.

RFC1641 "Using Unicode with MIME"
RFC1642 "UTF-7 - A Mail-Safe Transformation Format of Unicode"

are cited in RFC1700 (Assigned Numbers) as the references for some Unicode
character set names.
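
For anyone who has not seen a mail-safe transformation format in practice,
here is a minimal sketch using Python's built-in "utf-7" codec. The sample
string and the variable names are my own, chosen for illustration; the point
is only that Unicode text is carried as 7-bit ASCII and recovered unchanged.

```python
# Illustrative only: the sample text and variable names are assumptions.
text = "Hi Mom \u263a!"          # contains one non-ASCII character (U+263A)

wire = text.encode("utf-7")      # bytes as they might travel through mail
restored = wire.decode("utf-7")  # what the receiving side recovers

# Every byte on the wire is 7-bit, and the round trip is lossless.
is_seven_bit = all(b < 0x80 for b in wire)
print(wire, is_seven_bit, restored == text)
```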

There is a "Unicode Home Page" at:

It has _some_ version of the Unicode standard (I think), a glossary and a FAQ.

The page at:

looks especially interesting as it discusses various encodings of Unicode.

I don't know what ERCS is either. If I were guessing, I'd say Extended
"Reference Concrete Syntax" (a la SGML).

And looking back in December I find:

Dec 12 Gavin wrote:

>There is a proposal going to SGML Open from a fellow in Australia that
>might be of interest to you. The proposal outlines an Extended
>Concrete Syntax that defines a 16-bit CHARSET.
>The core concept is that at the lowest level in the parser, you have a
>"normalizer" which converts from the data storage format into the
>document character set. This is roughly akin to my proposal, but
>generalises it so that it *should* be possible to mix encodings, and
>character sets, and let the normaliser take care of all the nasty

I'm guessing the prior paper Gavin refers to is:

"A truly multilingual WWW" sent to http-wg and html-wg 12/22/94 and archived at:

Last round there were more follow-ups on the http-wg list than on the html-wg list.

    Albert Lunde