Re: ISO/IEC 10646 as Document Character Set

Albert Lunde (
Sat, 29 Apr 95 10:24:47 EDT

>> >I was under the impression we needed this (10646 as the standard document
>> >character set) for 2.0 in some form to resolve the question of numeric
>> >character references, or else we needed to remove some language about other
>> >charsets and make 2.0 talk about Latin-1 only.
>> Again, isn't 2.0 about current practise, which in the area of
>> character sets is: ISO 8859-1 is all we define behaviour for.
>That's the way I see it. The 2.0 spec describes a set of features
>that are widely deployed. Unicode is not widely deployed in web
>browsers (and certainly wasn't in June '94...)
>The internationalization issues deserve their own document, not
>a "quick sneak" into the 2.0 document.

I'm not objecting to putting internationalization in 2.x.

I can live with this either way... we may need to be careful, though,
with the paragraphs previously put into the 2.0 draft to suggest
an interpretation of support for other MIME charsets, as I think
they caused some comment.

If the 2.0 document says that the document character set is
ISO 8859-1 (regardless of MIME charset), and thus numeric character
references are interpreted with respect to ISO 8859-1, this
position is upward compatible with a future 2.x internationalization
document saying the document character set is Unicode (regardless
of MIME charset).
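
To illustrate the upward-compatibility point, here is a minimal sketch
in Python (the function names are hypothetical and not taken from any
draft). It assumes only that ISO 8859-1 code positions 0-255 coincide
with the first 256 positions of Unicode, so a numeric character
reference resolves to the same character under either document
character set.

```python
import re

# Matches decimal numeric character references such as &#233;
NUMERIC_REF = re.compile(r"&#([0-9]+);")

def resolve_latin1(text):
    """Resolve &#N; with ISO 8859-1 as the document character set."""
    def repl(match):
        n = int(match.group(1))
        if n > 255:
            raise ValueError("no such code position in ISO 8859-1: %d" % n)
        return bytes([n]).decode("latin-1")
    return NUMERIC_REF.sub(repl, text)

def resolve_unicode(text):
    """Resolve &#N; with Unicode as the document character set."""
    return NUMERIC_REF.sub(lambda match: chr(int(match.group(1))), text)

sample = "caf&#233; &#169; 1995"
# Both interpretations agree for every reference in the Latin-1 range.
assert resolve_latin1(sample) == resolve_unicode(sample)
```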

The 2.0 document could also be silent on other MIME charsets.
I think the numeric character references issue was the
main problem that got in the way of the sort of minimal reference
to them we were making in the 2.0 draft.

The problem area came from trying to infer an SGML declaration
from the MIME charset parameter (I think) as well as the numeric
references problem.

Specifying a fixed document character set, whether ISO 8859-1
or Unicode, solves the numeric character references problem.

But if the document character set is ISO 8859-1, I'm not sure
what the SGML interpretation is for a document in a different
MIME charset, containing non-ISO-8859-1 characters.
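
A small sketch of that gap, in Python (the helper name is hypothetical,
and ISO 8859-5 is picked only as an example of another MIME charset):
a character that arrives in an ISO 8859-5 body can have no code
position at all in ISO 8859-1.

```python
def in_document_character_set(ch, charset="latin-1"):
    """Return True if ch has a code position in the given character set."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# One byte taken from a message body labelled charset=ISO-8859-5
cyrillic = bytes([0xE9]).decode("iso8859_5")

# The Cyrillic letter has no ISO 8859-1 code position ...
assert not in_document_character_set(cyrillic)
# ... while e-acute (code position 233 in ISO 8859-1) does.
assert in_document_character_set("\u00e9")
```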

This may be an argument for being silent on other MIME charsets
or narrowing the language somehow, in the 2.0 document.

The other idea floated in the past for 2.0, of inferring an SGML
declaration using the MIME charset as the document character set,
would produce an interpretation (I think) of numeric character
references not consistent with a later redefinition in 2.x
of the document character set as Unicode, so we want (I think)
to make sure we don't suggest that in 2.0 (even if it is
current practice somewhere!).
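
The inconsistency can be sketched in Python as follows (the function
name is hypothetical; ISO 8859-5 stands in for any non-Latin-1
single-byte MIME charset). The same reference, &#233;, names one
character if the document character set is inferred from the MIME
charset and a different one if it is later redefined as Unicode.

```python
import re

# Matches decimal numeric character references such as &#233;
NUMERIC_REF = re.compile(r"&#([0-9]+);")

def resolve(text, document_charset=None):
    """Resolve &#N; against a single-byte charset, or Unicode if None.

    Only handles N in 0-255 when a single-byte charset is given.
    """
    def repl(match):
        n = int(match.group(1))
        if document_charset is None:
            return chr(n)
        return bytes([n]).decode(document_charset)
    return NUMERIC_REF.sub(repl, text)

ref = "&#233;"
as_unicode = resolve(ref)                # document character set: Unicode
as_latin1 = resolve(ref, "latin-1")      # fixed at ISO 8859-1
as_inferred = resolve(ref, "iso8859_5")  # inferred from the MIME charset

assert as_latin1 == as_unicode    # fixed ISO 8859-1 stays compatible
assert as_inferred != as_unicode  # the inferred reading would change
```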

    Albert Lunde