Re: HTML/SGML/charsets

Bert Bos (bert@let.rug.nl)
Mon, 3 Apr 95 08:53:51 EDT

The issues seem to have been solved quite quickly; I only found the
traces when I came back to work on Monday morning. But please tell me
whether I understand the results correctly:

1. The MIME type text/html refers to a format that is an application
of SGML *as it was in 1986* (if SGML changes, the MIME type doesn't
automatically change with it).

2. The MIME charset parameter must be the same as the charset of the
SGML declaration; results are undefined otherwise.

3. The meaning (and thus parsing) of the contents is fully governed by
SGML (1986). The fact that it is a MIME text/* type doesn't
sanction any changes to the body, not even with regard to CR/LF.

(3) implies that SGML numeric references (&#nnn;) encode an octet that
is mapped to a character by the SGML declaration, not by the MIME
charset (but in practice, they ought to be the same).
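To make the consequence of (3) concrete, here is a minimal sketch in
Python. It assumes a single-byte charset and treats &#nnn; as an octet
that is decoded with whatever charset the SGML declaration names. The
function name resolve_numeric_refs and its parameters are made up for
this illustration; they do not come from any SGML tool.

```python
import re

def resolve_numeric_refs(text: str, sgml_charset: str) -> str:
    """Replace each &#nnn; with the character that octet nnn denotes
    in sgml_charset (assumed to be a single-byte charset)."""
    def repl(match: re.Match) -> str:
        octet = int(match.group(1))
        if octet > 255:
            # Not an octet; leave the reference untouched.
            return match.group(0)
        return bytes([octet]).decode(sgml_charset)
    return re.sub(r"&#(\d{1,3});", repl, text)

# The same reference yields different characters under different
# declared charsets, which is why the two charsets ought to agree.
as_latin1 = resolve_numeric_refs("caf&#233;", "latin-1")
as_cp437 = resolve_numeric_refs("caf&#233;", "cp437")
```

If the MIME charset were used instead of the declared one, the two
calls above show how the same reference could end up as a different
character.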

Ad 2: There is another possible interpretation: view the SGML
declaration as instructions for parsing, and the MIME charset as
instructions for display. In other words: an application sees octets
as delimiters or data according to the SGML declaration, but if it
generates any visible output, it has to use the MIME charset to map
octets to glyphs.

I find this interpretation confusing enough to explicitly disallow it.
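A rough sketch of what that split interpretation would mean in code,
again in Python and again assuming single-byte charsets: delimiters are
located using the charset of the SGML declaration, while data octets
are turned into visible text using the MIME charset. The names
split_markup and display_text are hypothetical, and the tag scanning is
deliberately naive (no comments, no quoted attribute values).

```python
def split_markup(body: bytes, sgml_charset: str):
    """Yield (kind, octets) pairs, locating '<' and '>' as they are
    encoded in sgml_charset."""
    lt = "<".encode(sgml_charset)
    gt = ">".encode(sgml_charset)
    pos = 0
    while pos < len(body):
        start = body.find(lt, pos)
        if start == -1:
            yield ("data", body[pos:])
            return
        if start > pos:
            yield ("data", body[pos:start])
        end = body.find(gt, start)
        if end == -1:
            # Unterminated tag: treat the rest as data.
            yield ("data", body[start:])
            return
        yield ("tag", body[start:end + len(gt)])
        pos = end + len(gt)

def display_text(body: bytes, sgml_charset: str, mime_charset: str) -> str:
    """Parse with sgml_charset, but map data octets to characters
    with mime_charset."""
    return "".join(octets.decode(mime_charset)
                   for kind, octets in split_markup(body, sgml_charset)
                   if kind == "data")

body = "<b>caf\u00e9</b>".encode("latin-1")
same = display_text(body, "latin-1", "latin-1")
split = display_text(body, "latin-1", "cp437")
```

With both charsets equal, the displayed text matches what the author
wrote; with different charsets, the same octets parse identically but
display as something else.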

Bert

-- 
                          Bert Bos                      Alfa-informatica
                 <bert@let.rug.nl>           Rijksuniversiteit Groningen
    <http://www.let.rug.nl/~bert/>     Postbus 716, NL-9700 AS GRONINGEN