Re: HTML/SGML/charsets

Paul Burchard (burchard@horizon.math.utah.edu)
Mon, 3 Apr 95 08:48:48 EDT

Joe English <joe@trystero.art.com> wonders:
> If MIME agents are allowed to translate message bodies
> from one character set to another

Dan's clever little glossary really is worth reading more carefully
-- it turns out this question does not make sense.

The terminological confusion is that the so-called MIME "charset"
parameter does NOT specify a "character set" (an enumerated list of
chars), but instead specifies what would be better termed a
"character encoding" (a map from a sequence of octets to a sequence
of chars).
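To make the distinction concrete, here is a minimal Python sketch. It assumes that Python's built-in codecs can stand in for MIME charsets, and the helper `decode_octets` is hypothetical. The same single octet 0xA4 decodes to two different characters depending on which charset label is applied. That is what a map from octets to chars does, and an enumerated list of chars cannot express it.

```python
# A minimal sketch: treat a MIME "charset" as a decoding function from
# octets to characters, using Python's built-in codecs as stand-ins.
# decode_octets is a hypothetical helper, not part of any MIME library.

def decode_octets(octets: bytes, charset: str) -> str:
    """Map a sequence of octets to a sequence of characters."""
    return octets.decode(charset)

octets = b"\xa4"
latin1 = decode_octets(octets, "iso-8859-1")   # U+00A4 CURRENCY SIGN
latin9 = decode_octets(octets, "iso-8859-15")  # U+20AC EURO SIGN
print(hex(ord(latin1)), hex(ord(latin9)))
```

The octets are identical in both calls. Only the encoding differs, and so does the resulting character.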

Notice:

* Changing the MIME charset parameter (i.e., the character encoding),
and re-encoding the body to match, has no effect on the "character
repertoire" of the document (i.e., the unordered set of chars
represented in the document), only on the sequence of octets used to
represent it.

* Changing the MIME charset parameter (i.e., the character encoding)
*MAY* have an effect on the character set, but only through a
not-yet-fully-defined mechanism for mapping character encodings to
SGML declarations. There is really no "natural" mapping from
character encodings to character sets (although there is a natural
constraint that the repertoire of the character set should include at
least the repertoire of the encoding).
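Both points can be illustrated with a short Python sketch, again assuming Python's codecs as stand-ins for MIME charsets. The helpers `repertoire`, `transcode`, and `encoding_repertoire` are hypothetical, and `encoding_repertoire` only makes sense for single-byte encodings. The sketch shows two things. First, transcoding a body from ISO-8859-1 to UTF-8 changes the octets but leaves the repertoire alone. Second, a hypothetical character set whose repertoire is exactly that of ISO-8859-1 satisfies the inclusion constraint for ISO-8859-1 but not for ISO-8859-15.

```python
# Sketch: transcoding changes octets but not the repertoire, and the
# repertoire of a single-byte encoding can be checked for inclusion in a
# candidate character set's repertoire. All helpers are hypothetical.
from __future__ import annotations


def repertoire(text: str) -> set[str]:
    """Unordered set of characters appearing in a document."""
    return set(text)


def transcode(octets: bytes, src: str, dst: str) -> bytes:
    """Re-encode octets from charset `src` into charset `dst`."""
    return octets.decode(src).encode(dst)


def encoding_repertoire(charset: str) -> set[str]:
    """Characters reachable by decoding each single octet (single-byte charsets only)."""
    chars: set[str] = set()
    for b in range(256):
        try:
            chars.add(bytes([b]).decode(charset))
        except UnicodeDecodeError:
            pass  # octet not assigned in this charset
    return chars


doc_latin1 = "caf\u00e9".encode("iso-8859-1")               # b'caf\xe9'
doc_utf8 = transcode(doc_latin1, "iso-8859-1", "utf-8")     # b'caf\xc3\xa9'
assert doc_latin1 != doc_utf8                               # octets differ
assert repertoire(doc_latin1.decode("iso-8859-1")) == repertoire(doc_utf8.decode("utf-8"))

# A hypothetical character set whose repertoire is exactly that of ISO 8859-1.
latin1_charset = encoding_repertoire("iso-8859-1")
print(encoding_repertoire("iso-8859-1") <= latin1_charset)   # True
print(encoding_repertoire("iso-8859-15") <= latin1_charset)  # False: EURO SIGN is missing
```

The sketch only tests the inclusion constraint. It does not define any mapping from encodings to SGML declarations.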

In other words, the problem you're worrying about could occur, but
only if (in the future) we were foolish enough to postulate a mapping
from char encodings to SGML declarations that takes certain encodings
to char sets which are not extensions of ISO-8859-1. But it's ours to
screw up or not...

[P.S. Hope I've got it correct myself!]

--------------------------------------------------------------------
Paul Burchard <burchard@math.utah.edu>
``I'm still learning how to count backwards from infinity...''
--------------------------------------------------------------------