Re: Revised language on: ISO/IEC 10646 as Document Character Set

lee@sq.com
Sun, 7 May 95 14:23:18 EDT

Dan wrote:

> |To support non-western writing systems, a larger character repertoire
> |will be specified in a future version of HTML. The document character
> |set will be ISO10646, or some subset that agrees with ISO10646; in
> |particular, all numeric character references must use code positions
> |assigned by ISO10646.

I think this is very good and very sensible, and I support it.
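
To make the last clause of Dan's wording concrete, here is a rough sketch
(mine, not anything from the draft) of what "numeric character references
use code positions assigned by ISO 10646" would mean for a parser. It is
written in Python, the function name is made up for illustration, and it
relies on Python's chr() taking a Unicode code point, which coincides with
the 10646 code position. It handles only decimal references of the form
&#NNN; and nothing else.

```python
import re

# Hypothetical helper, for illustration only: resolve decimal numeric
# character references by treating the number as a 10646 code position.
def resolve_numeric_refs(text):
    def repl(match):
        code = int(match.group(1))
        # Leave the reference untouched if the number is out of range.
        return chr(code) if code <= 0x10FFFF else match.group(0)
    return re.sub(r"&#([0-9]+);", repl, text)

# 233 is LATIN SMALL LETTER E WITH ACUTE; 1488 is HEBREW LETTER ALEF.
print(resolve_numeric_refs("caf&#233; &#1488;"))
```

The point is only that the number in the reference is independent of
whatever encoding the document happens to be transmitted in.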

I also agree with Dan when he said that we don't have much operational
experience with using 10646 over HTTP, whether in HTML or otherwise!

It seems to me that
(1) we are trying to reflect current practice, and also to provide some
guidance for the near future, so the wording here (and elsewhere), as others
have said, is appropriate.

(2) for something to go into an IETF RFC, there must be
[1] multiple independent working interoperating implementations
[2] rough consensus on the mailing list

I believe that we have achieved rough consensus that ISO 10646 is the
way to go. We are lucky that this list is much less political than some
of the other MIME-related lists in that respect.

To satisfy item [1], however, we need
(a) working 10646-based browsers
(b) HTTP servers that can deal with this, even if that means nothing more
than simply sending out binary data with whatever content-type and
encoding marking the browsers need
(c) the ability to create and edit 10646-based documents
(d) interoperability between the browsers and servers
(e) interoperability between documents and browsers
(f) the ability for the 10646 tools to handle existing 8859-1 documents
(g) all of the above on multiple platforms

Is that a reasonable summary?
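
Item (f) is probably the easiest of these to picture, because the first
256 code positions of 10646 coincide with 8859-1. A minimal sketch in
Python, with function names that are my own invention, and assuming UTF-8
as the 10646 encoding purely for the sake of the example:

```python
# Hypothetical helpers, for illustration only.

def latin1_to_code_positions(data):
    # Each 8859-1 byte value is already the corresponding 10646 code position.
    return list(data)

def latin1_to_text(data):
    # Decode existing 8859-1 bytes into a string of 10646 characters.
    return data.decode("iso-8859-1")

doc = b"<P>caf\xe9</P>"            # an existing 8859-1 document
text = latin1_to_text(doc)

# The decoded characters have the same code positions as the original bytes.
assert [ord(c) for c in text] == latin1_to_code_positions(doc)

# Re-encode in one possible 10646 encoding (UTF-8 is an assumption here).
utf8 = text.encode("utf-8")
print(utf8)
```

A real tool would of course also have to cope with documents whose
encoding is not labelled at all, which this sketch ignores.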

I do not believe that we have any widespread implementations of 10646-based
browsers yet. Glenn, Amanda, Larry, others... are there even any test-bed
10646 browsers? Which platforms do they run on?

I suppose one could use the Plan 9 tools to create HTML documents; are
they sufficient? If not, what is out there, commercially or otherwise,
that can be used to create & edit HTML files using mixed writing scripts --
at the very least, able to do 8859-1 (so you can mark up tags & edit existing
documents) and two other `scripts', e.g. Hebrew, Greek, Farsi or whatever?
(I put `scripts' in quotes because I am deliberately not using the word in
any of its technical senses.)

I think my list (a)-(g) can be satisfied fairly soon, but it is not satisfied
today. Statements in HTML 2.0 should therefore not require 10646-encoded
data, but should do no more than suggest ISO 10646 as a strategic direction.
In the meantime, implementors of browsers and other WWW software need to
produce working sample code that demonstrates items (a)-(g).

Lee