Re: Character sets

Gavin Nicol (gtn@ebt.com)
Tue, 7 Feb 95 01:08:17 EST

>I didn't hear a consensus the last time this subject broke out,
>but it seemed that many of the objections raised to Unicode
>as a device for multi-lingual documents were addressed
>by Unicode plus some explicit way to indicate changes in language:
>either a tag or some low-level mechanism.

Sure, but no-one stated willingness to commit to Unicode either.

>(I'm not sure which aspect of the SGML declaration you are
>seeing as the problem.)

The simple problem of the character set declaration affecting how
parsing is performed. We can infer a declaration from the MIME data,
but then we must also define character classes for the character set
at the same time.
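
To make the dependency concrete, here is a rough sketch (Python, purely
illustrative). The helper name_char_table is hypothetical, and "is a
letter" is only a stand-in for whatever classes a real SGML declaration
would assign. The point is just that the same byte lands in a different
class depending on the declared character set, so each set needs its
own table:

```python
def name_char_table(charset):
    """Return the set of byte values that decode to a letter in `charset`.

    Hypothetical helper: a real parser would take its classes from the
    SGML declaration, not from Python's notion of a letter.
    """
    table = set()
    for b in range(256):
        try:
            ch = bytes([b]).decode(charset)
        except UnicodeDecodeError:
            continue  # byte is unassigned in this character set
        if ch.isalpha():
            table.add(b)
    return table

# One table per character set the parser claims to support.
latin1_table = name_char_table("latin-1")
koi8r_table = name_char_table("koi8_r")

# Byte 0xD7 is a multiplication sign in ISO 8859-1 but a Cyrillic
# letter in KOI8-R, so the two tables disagree about it.
d7_in_latin1 = 0xD7 in latin1_table
d7_in_koi8r = 0xD7 in koi8r_table
```

Every new character set means another table like these, which is the
maintenance burden I am complaining about.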

Unicode gives a superset into which everything else can be mapped, so
it doesn't face this problem. Define the character classes once,
optimise your parser for them, and you'll never need to change the
parser to support new character sets.
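
A minimal sketch of what I mean, again in Python and again only
illustrative. The names is_name_char and tokenize are made up, the
"letters and digits" class is an assumption rather than anything SGML
mandates, and the charset argument is assumed to come from the MIME
header. Decoding happens once at the edge; the classifier itself only
ever sees Unicode:

```python
import unicodedata

def is_name_char(ch):
    # Single class definition over Unicode: letters (L*) and numbers (N*).
    # Assumption for illustration; not SGML's actual name-character rules.
    return unicodedata.category(ch)[0] in ("L", "N")

def tokenize(data, charset):
    """Split raw bytes into runs of name characters.

    `charset` is assumed to come from the MIME Content-Type header.
    """
    text = data.decode(charset)  # map the incoming set into Unicode
    tokens, current = [], []
    for ch in text:
        if is_name_char(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens

# The same parser code handles both inputs; only the decode step differs.
latin1_tokens = tokenize(b"caf\xe9 au lait", "latin-1")
koi8r_tokens = tokenize(bytes([0xD7, 0xCF, 0xD4]) + b" 1995", "koi8_r")
```

Supporting another character set then means supplying a mapping into
Unicode, not touching the classifier.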