>(I'm not sure which aspect of the SGML declaration you are
>seeing as the problem.)
Simply that the character set declaration affects how parsing is
performed. We can infer a declaration from the MIME data, but then we
must also define character classes for that character set at the same
time.
Unicode gives a superset into which every other character set can be
mapped, so it doesn't face this problem. Define the character classes
once, optimise your parser for them, and you'll never need to change
the parser to support new character sets.
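A rough sketch of that arrangement in Python, assuming a simplified, hypothetical set of character classes (NAMESTART, NAMECHAR, SEPCHAR, DATACHAR are stand-ins, not SGML's actual reference concrete syntax). The charset from the MIME data is used only to decode bytes into Unicode; the classifier is written once and never changes. It also assumes the MIME charset label is one Python's codecs recognise, which is not true of every label.

```python
# Sketch only: the classes below are hypothetical simplifications,
# not the real SGML character classes.

def classify(ch: str) -> str:
    """Classify one Unicode character. Defined once, for every charset."""
    if ch.isalpha():
        return "NAMESTART"
    if ch.isdigit() or ch in ".-":
        return "NAMECHAR"
    if ch in " \t\r\n":
        return "SEPCHAR"
    return "DATACHAR"


def char_classes(raw: bytes, mime_charset: str) -> list[tuple[str, str]]:
    """Decode using the MIME charset label, then classify in Unicode.

    Assumes mime_charset is a name Python's codecs accept.
    """
    text = raw.decode(mime_charset)  # charset handling stops here
    return [(ch, classify(ch)) for ch in text]


# The same text arriving in two different charsets yields identical classes.
latin1 = char_classes("café".encode("iso-8859-1"), "iso-8859-1")
utf8 = char_classes("café".encode("utf-8"), "utf-8")
print(latin1 == utf8)
```

Supporting a new character set then means adding a decoder, not touching `classify` or anything downstream of it.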