Re: ISO/IEC 10646 as Document Character Set

Roy T. Fielding (fielding@avron.ICS.UCI.EDU)
Wed, 17 May 95 01:05:43 EDT

>>We could say that the default encoding scheme is base line "ISO-2022".
>>Furthermore, we could say that, by default, the following initial state
>>is to be assumed:
>
> Sasagawa pro da neh...
>
> This is a most inventive answer to the problem, but Roy has stated,
> quite strongly, that only ISO-8859-1 will be standard.

Ummm, just to clarify, what I said is that this WG shall not
define a standard for charset (as in character encoding) requirements
of HTML and HTML user agents *unless* that standard is ISO-8859-1.
This (as you know) does not apply to the document BASESET discussion,
for which my only requirement is that HTML documents encoded in
ISO-8859-1 are not made illegal, and HTML systems that only understand
ISO-8859-1 are not made illegal.

I personally would prefer no HTML standard character set at all
(leaving the negotiation up to the transfer protocol), but I understand
now that SGML does require a BASESET character repertoire which defines
the meaning of numeric character references, and we are therefore
constrained somewhat by the SGML standard.

I therefore prefer Albert's wording of

"The document character set has been specified as ISO/IEC 10646 to make the
interpretation of numeric references clear in other character encodings and
as a basis for further internationalization, though previous specifications
of HTML were based on ISO-8859-1. Developers of software supporting other
MIME charsets than ISO-8859-1 must interpret numeric references based on
ISO/IEC 10646 as the document character set."
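
To illustrate what that wording asks of software, here is a minimal sketch
in Python. It assumes decimal references of the form &#NNN; only, and the
helper name resolve_numeric_refs is hypothetical, not taken from any
specification or library. The point is that the number in a reference
names an ISO/IEC 10646 code point regardless of which MIME charset carried
the bytes.

```python
import re

# Hypothetical helper, for illustration only: handles decimal numeric
# character references (&#NNN;) and nothing else.
_NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_numeric_refs(raw: bytes, charset: str) -> str:
    """Decode raw bytes using the transfer charset, then resolve numeric
    references against ISO/IEC 10646 code points, not against the charset."""
    text = raw.decode(charset)
    return _NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), text)


# The same reference arrives in two different charsets.
from_latin1 = resolve_numeric_refs("caf&#233;".encode("iso-8859-1"), "iso-8859-1")
from_greek = resolve_numeric_refs("caf&#233;".encode("iso-8859-7"), "iso-8859-7")

# Both resolve to code point 233 (e with acute accent), even though the
# byte 233 in ISO-8859-7 is the Greek letter iota.
print(from_latin1 == from_greek)
```

Decoding the byte stream and resolving references are kept as separate
steps here so that the charset only ever affects the first one.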

However, Glenn's quote from the SGML Handbook, p. 487, section
15.6 System Declaration:

"A system declaration is the complement of an SGML declaration.
While an SGML declaration identifies the features that a parser
requires in order to deal with a particular document, the system
declaration identifies the set of SGML declarations that a system
can deal with."

would indicate (to me) that an ordinary BASESET of ISO/IEC 10646
requires that conforming parsers be capable of dealing with characters
greater than 255.

Therefore, I would not mind changing the 2.0 declaration to specify
ISO/IEC 10646 as the BASESET, but only if characters > 255 are
marked as UNUSED.
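
A rough sketch in Python of what that restriction would mean for a
checker, assuming decimal references only. The name unused_references and
the constant MAX_USED are hypothetical and only mirror the "> 255 are
UNUSED" condition above; this is not how an SGML parser actually reads a
declaration.

```python
import re
from typing import List

# Assumption for this sketch: code points above 255 are declared UNUSED.
MAX_USED = 255

_NUMERIC_REF = re.compile(r"&#([0-9]+);")


def unused_references(text: str) -> List[int]:
    """Return the numerically referenced code points that fall in the
    UNUSED range, in order of appearance."""
    values = (int(n) for n in _NUMERIC_REF.findall(text))
    return [v for v in values if v > MAX_USED]


# 233 is within the used range; 945 is not.
print(unused_references("caf&#233; and &#945;"))
```

A system that only understands ISO-8859-1 could then reject or flag any
document for which this list is non-empty.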

....Roy T. Fielding Department of ICS, University of California, Irvine USA
<fielding@ics.uci.edu>
<URL:http://www.ics.uci.edu/dir/grad/Software/fielding>