Re: HTML/SGML/charsets

Joe English (joe@trystero.art.com)
Fri, 31 Mar 95 16:42:42 EST

eric@spyglass.com (Eric W. Sink) wrote:

> I *think* the perceived technical conflict is:
>
> What if you receive a document which has a BASESET specification of one
> character set, and also:
>
> text/html; charset=somethingelse?

That would be like sending a GIF file labeled as
Content-Type: image/jpeg, because "The SGML declaration ...
is a function of the charset parameter".

The 02 draft contradicted ISO 8879 in section 6.3.2:

The character octet references are not dependent on the character
set encoding of the document. For example, "&#215;" always
represents the ISO-8859-1 multiply sign, even when the document's
declared character set is other than ISO-8859-1.

This has been changed in the 03 draft to comply with SGML --
numeric character references *are* dependent on the
document character set.
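
To illustrate the 03 behaviour, here is a minimal Python sketch. It assumes a single-octet document character set in which the character number is simply the octet value, and it uses Python's codec names as stand-ins for the BASESET; the helper name resolve_numeric_ref is hypothetical and not taken from any draft.

```python
def resolve_numeric_ref(number: int, document_charset: str) -> str:
    """Return the character that &#number; denotes, assuming a
    single-octet document character set (number == octet value)."""
    return bytes([number]).decode(document_charset)


# The same reference, &#215;, read against three document character sets.
for charset in ("iso-8859-1", "iso-8859-7", "iso-8859-5"):
    ch = resolve_numeric_ref(215, charset)
    print(charset, "U+%04X" % ord(ch))
```

Under this reading the reference is the multiply sign only when the document character set is ISO-8859-1; against the Greek or Cyrillic sets the same number names a different character.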

If MIME agents are allowed to translate message bodies from
one character set to another (are they? I don't know),
then this may cause a problem, since all numeric character references
would have to be translated as well, and MIME does not know about
numeric character references.
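
A rough sketch of the extra work such a translation would need, again assuming single-octet character sets and decimal references only. The function name retarget_refs and the regular expression are illustrative assumptions, not part of MIME or of any HTML draft.

```python
import re

# Decimal numeric character references such as &#215;
NUMERIC_REF = re.compile(r"&#([0-9]+);")


def retarget_refs(markup: str, src: str, dst: str) -> str:
    """Rewrite each &#N; so that it names the same character in dst
    that it named in src. Assumes single-octet character sets."""
    def fix(match: "re.Match[str]") -> str:
        number = int(match.group(1))
        # Raises ValueError for numbers above 255, and a Unicode error
        # if the character is undefined in src or has no place in dst.
        char = bytes([number]).decode(src)
        return "&#%d;" % char.encode(dst)[0]

    return NUMERIC_REF.sub(fix, markup)


print(retarget_refs("2 &#215; 3", "iso-8859-5", "koi8-r"))
```

A gateway that converted only the octets of the body and skipped a step like this would leave every reference pointing at the wrong character.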

If MIME agents are *not* allowed to translate between character
sets, then there is no problem.

(As far as I can tell, MIME agents are not allowed to translate message
bodies to a different character set, and there is no conflict.)

--Joe English

joe@trystero.art.com