I'm not sure I agree with that. If it means "by default", it's OK,
but otherwise it excludes anything but ISO-8859-1, which is
unacceptable.
>3.1 text/html media type
>
> Charset
> The charset parameter (as defined in section 7.1.1 of RFC
> 1521 [4]) may be given to specify the encoding used to represent
> the HTML document as a sequence of octets. The default value is
> out of scope of this specification; but for example, it is
> US-ASCII in the context of MIME mail, and ISO-8859-1 in the
> context of HTTP.
Is it a good thing to have different defaults depending on the mode of
transmission? What if I store an HTML document on disk and forget how I
got it? I think the default for HTML has been ISO-8859-1 since the
beginning, and that the spec should simply say so. It can be
MIME-encoded in mail if necessary.
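
To make the concern concrete, here is a rough Python sketch of what a
transport-dependent default implies for a receiver. The function name
effective_charset and the TRANSPORT_DEFAULTS table are hypothetical,
made up for illustration; only the two default values come from the
draft text quoted above.

```python
from email.message import Message

# Defaults as stated in the quoted draft: US-ASCII for MIME mail,
# ISO-8859-1 for HTTP. The table itself is an illustrative assumption.
TRANSPORT_DEFAULTS = {"mail": "us-ascii", "http": "iso-8859-1"}


def effective_charset(content_type, transport):
    """Return the explicit charset parameter, else the transport default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None
    # when the header carries no charset parameter.
    explicit = msg.get_content_charset()
    if explicit is not None:
        return explicit
    return TRANSPORT_DEFAULTS[transport]


# The same unlabelled octets are read differently depending on how
# they arrived, which is exactly what is lost once the file is on disk.
octets = "caf\u00e9".encode("iso-8859-1")
as_http = octets.decode(effective_charset("text/html", "http"))
try:
    as_mail = octets.decode(effective_charset("text/html", "mail"))
except UnicodeDecodeError:
    as_mail = None  # 0xE9 is not valid US-ASCII
```

With a single default of ISO-8859-1 for HTML, the transport argument
would disappear and both paths would decode the document the same way.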
>3.2 HTML Document Representation
>
> A MIME entity with a content type of "text/html" represents an HTML
> document, consisting of a single text entity. The charset parameter
> (whether implicit or explicit) identifies a character encoding. The
> text entity consists of the characters determined by this character
> encoding and the octets of the body of the MIME entity.
>
> The SGML declaration of the document is a function of the charset
> parameter. If the charset parameter is US-ASCII or ISO-8859-1, the
> SGML declaration in section 13@@ applies. Other charset parameter
^^^^^^^^^^^^^^^^^^^^^^^
> values are reserved for future use.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I can't believe we're coming back to such language. It's too late to
reserve other charset values: they're in wide use already, and have
been for a while.
This sentence needs to go, and be replaced by the former language that
said that for other charsets, the SGML declaration was to be minimally
modified.
> NOTE: A generalized convention for mapping charset parameter values
> to SGML declarations is expected to be specified in a future
> version of this specification.
Good.
BTW, there are two subsections numbered 3.2.1.
>6.1 The ISO Latin 1 Character Repertoire
>
> Conforming HTML user agents are required to support the US-ASCII
> [10] or ISO-8859-1 [11] character encodings, and the @@fullname ISO
> Latin 1 document character set.
Are *minimally* conformant UAs required to support Latin-1, or just
any single charset?
Perhaps charset requirements should be spelled out in section 1.3
(Terminology) for "conforming HTML user agent" and "minimally
conformant...". Surely we don't want a conforming UA to be forced to
support all charsets.
>12.3 SGML Declaration for HTML
>
> This is the SGML Declaration for HyperText Markup Language (HTML)
> as used by the World Wide Web (WWW) application:
...for documents encoded in ISO-8859-1. Documents encoded in other
character sets should use an SGML declaration as close as possible to
this one, in order to preserve SGML conformance.
-- François Yergeau <yergeau@alis.ca>