Re: ISO/IEC 10646 as Document Character Set

Dan Connolly (connolly@w3.org)
Thu, 4 May 95 13:10:25 EDT

Glenn Adams writes:
>
> Date: Thu, 4 May 95 11:00:34 EDT
> From: connolly@w3.org (Dan Connolly)
>
> I am against putting it [10646 as doc charset] in the 2.0 document at
> this point.
>
> Would it be possible for you to put out an electronic vote on this?

It's possible, but I don't think it's valuable. Eric? I think
we need the chair in on this one.

> If the
> majority of concerned parties have no objection to making this change, then
> I think we should make the change now.

This is not a democracy :-) We go by technical arguments, not by
shouting. I don't see why we need to put ISO10646 as the document
character set in HTML 2.0. Everybody can do everything they need to
do -- and reliably -- even if the 2.0 RFC only specifies ISO-8859-1.

The current draft allows user agents to support other document
character sets, and even suggests that they support ISO10646:

http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_2.html#SEC8
|HTML Lexical Syntax
|
| ... A minimally conforming HTML user agent must support the SGML
| declaration in section SGML Declaration for HTML, which specifies ISO
| Latin 1 (@@full name) as the document character set; it may support
| other SGML declarations, in particular, SGML declarations with other
| document character sets.

http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_3.html#SEC17
|HTML Document Representation
|
| HTML user agents must support the ISO-8859-1 character encoding
| scheme, and hence the US-ASCII character encoding scheme. (9)

http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_foot.html#FOOT9
|(9)
|
|HTML user agents are encouraged to support ISO10646 as a document
|character set, and Unicode-1-1-UTF-8 and Unicode-1-1-UCS-2 as
|character encoding schemes. Other encodings schemes such as
|ISO-2022-JP may be supported as well.
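
As a rough illustration of the terms in the quoted text (not part of the
draft): the document character set assigns numbers to characters, and an
encoding scheme turns those numbers into bytes. The Python sketch below
assumes Python's "latin-1" codec for ISO-8859-1 and uses "utf-16-be" as a
stand-in for UCS-2, which only matches it for characters in the Basic
Multilingual Plane. The helper name is made up for the example.

```python
# Illustrative only; none of these names come from the HTML 2.0 draft.
# Assumption: Python's "latin-1" codec implements ISO-8859-1, and
# "utf-16-be" approximates UCS-2 for characters in the BMP.


def same_under_ascii_and_latin1(data: bytes) -> bool:
    """True if the bytes decode identically as US-ASCII and ISO-8859-1."""
    try:
        return data.decode("ascii") == data.decode("latin-1")
    except UnicodeDecodeError:
        # A byte >= 0x80 is valid ISO-8859-1 but not US-ASCII.
        return False


# Each ISO-8859-1 byte value maps to the ISO 10646 position with the same number.
latin1_matches_10646 = all(
    ord(bytes([b]).decode("latin-1")) == b for b in range(256)
)

# One character sequence, three encoding schemes, three byte sequences.
text = "caf\u00e9"
encodings = {name: text.encode(name) for name in ("latin-1", "utf-8", "utf-16-be")}
```

A user agent that handles ISO-8859-1 therefore handles US-ASCII input
for free, but the same text arrives as different bytes under UTF-8 or
UCS-2, which is why the encoding scheme has to be known separately.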

What do we gain by putting ISO10646 in there? I think we lose: folks
may expect browsers to support all of ISO10646 if it's in the spec.
That would not improve consumer confidence. Putting ISO10646 in the
spec without discussing conformance is a losing proposition. ISO10646
carries a lot of issues, in fact: fonts, encodings, and all sorts of
stuff that's new to many parts of the community. With ISO-8859-1, the
X Window System charted the waters to some extent.

> [I already gave you the minimum
> changes needed in the SGML declaration.]

OK. Quick: install those sgmls patches all over the world so I don't
have to answer the mail about "why doesn't sgmls work any more? My
documents used to validate, and with the new DTD, they're broken."
Deploying technology takes time.

HTML 2.0 is a well-known quantity in a fairly large community. We do a
disservice by changing it at this point. We just need to clean up
the wording and publish it.

What we _can_ do quickly is issue an Internet-Draft about how to use
Unicode to support multilingual applications on the web. Gavin's paper
is a good start.

Dan