Ummm, just to clarify, what I said is that this WG shall not
define a standard for charset (as in character encoding) requirements
of HTML and HTML user agents *unless* that standard is ISO-8859-1.
This (as you know) does not apply to the document BASESET discussion,
for which my only requirement is that HTML documents encoded in
ISO-8859-1 are not made illegal, and HTML systems that only understand
ISO-8859-1 are not made illegal.

I personally would prefer no HTML standard character set at all
(leaving the negotiation up to the transfer protocol), but I understand
now that SGML does require a BASESET character repertoire which defines
the meaning of numeric character references, and we are therefore
constrained somewhat by the SGML standard.

I therefore prefer Albert's wording:
"The document character set has been specified as ISO/IEC 10646 to make the
interpretation of numeric references clear in other character encodings and
as a basis for further internationalization, though previous specifications
of HTML were based on ISO-8859-1. Developers of software supporting other
MIME charsets than ISO-8859-1 must interpret numeric references based on
ISO/IEC 10646 as the document character set."

However, Glenn's quote from the SGML Handbook, p. 487, section
15.6 System Declaration:
"A system declaration is the complement of an SGML declaration.
While an SGML declaration identifies the features that a parser
requires in order to deal with a particular document, the system
declaration identifies the set of SGML declarations that a system
can deal with."
would indicate (to me) that an ordinary BASESET of ISO/IEC 10646
requires that conforming parsers be capable of dealing with characters
greater than 255.

Therefore, I would not mind changing the 2.0 declaration to specify
ISO/IEC 10646 as the BASESET, but only if characters > 255 are
marked as UNUSED.
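
To make the effect of that restriction concrete, here is a minimal sketch of
how a parser might resolve numeric character references under such a
declaration. It is illustrative only: the function name, the regular
expression, and the MAX_USED_CODE constant are assumptions made for this
example, not anything taken from the SGML declaration or from an existing
parser. It relies on the fact that the first 256 character numbers of
ISO/IEC 10646 coincide with ISO-8859-1.

```python
import re

# Assumption for this sketch: character numbers above 255 are declared UNUSED.
MAX_USED_CODE = 255

# Decimal numeric character references of the form &#N;
_NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_numeric_refs(text, max_used=MAX_USED_CODE):
    """Replace each &#N; with character number N of the document character set.

    N is always interpreted against ISO/IEC 10646, regardless of the charset
    the document was transferred in. References above max_used are rejected,
    mirroring an UNUSED marking in the declaration.
    """
    def _substitute(match):
        code = int(match.group(1))
        if code > max_used:
            raise ValueError(
                "character number %d is UNUSED in this declaration" % code
            )
        # Python's chr() maps a code point to its ISO/IEC 10646 character.
        return chr(code)

    return _NUMERIC_REF.sub(_substitute, text)
```

With this sketch, resolve_numeric_refs("caf&#233;") yields the same character
that an ISO-8859-1 system represents as the single byte 0xE9, so existing
ISO-8859-1 documents and systems keep working, while a reference such as
&#8364; is rejected rather than silently accepted.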

....Roy T. Fielding Department of ICS, University of California, Irvine USA
<fielding@ics.uci.edu>
<URL:http://www.ics.uci.edu/dir/grad/Software/fielding>