Re: SGML/MIME charsets, ERCS/Unicode [was: New DTD (final version?) ]

Albert Lunde (Albert-Lunde@nwu.edu)
Tue, 14 Feb 95 21:08:08 EST

(Apologies for the delayed response; I've been unable to send mail for a
few days.)

I'd like to clarify that in suggesting SGML markup to indicate language, I
was *not* proposing we try to switch character sets in mid-document
(because prior discussion had indicated that makes the SGML treatment a lot
more complicated), but rather trying to address some problems raised in the
prior discussion of Unicode, by a mechanism that would have some
application (though not as generally) in other character sets.

I see from what Dave Raggett says that this is coming in the HTML 3.0
proposal (sorry I missed this).

This suggests that, if we can (1) register a Unicode encoding for use with
HTTP's charset parameter (following somewhat in the footsteps of
RFC1641/RFC1642) and (2) find a way to define/infer the appropriate SGML
stuff for HTML in Unicode, we would soon have a way (if not the most
general possible way) to send multi-lingual documents on the WWW.
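
As a rough illustration of point (1), here is a minimal sketch of a document
body being sent in a Unicode encoding and labelled through the charset
parameter. The function name make_response and the choice of "utf-8" as the
label are assumptions made for the example only; no particular encoding or
registered charset name has been settled on in this discussion.

```python
# Hypothetical sketch: label an HTML body with a charset parameter.
# "utf-8" is an assumed encoding/label, not one agreed on in this thread.
def make_response(text, charset="utf-8"):
    # Encode the document text into bytes using the chosen encoding.
    body = text.encode(charset)
    # Advertise the encoding via the Content-Type charset parameter.
    headers = {
        "Content-Type": "text/html; charset=" + charset,
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = make_response("<p>路客</p>")
```

A receiver that recognises the label can decode the bytes with the named
encoding; one that does not can at least tell that the body is not
ISO-8859-1.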

Regarding the charter and mission of the working group, I didn't intend to
overstep its bounds. I had the impression from prior discussion on the html
and http working groups that character sets and multilingual documents were
on the list of things to address after specifying current practice; but
reading back through the archives and minutes, I see that I may have
mistaken what was intended.

The followups to prior discussion on the two lists didn't lead me to expect
a response as strongly worded as Roy Fielding's comments. On a second
reading, I noted that Roy said:
"Under no circumstances will the http-wg ever require that Web clients
and/or browsers use a specific character set other than ISO-8859-1."

The word "require" may be important here. Gavin can speak for himself; I'm
not so much trying to _require_ the use of Unicode as to ask questions to
explore what's needed to _allow_ the use of Unicode for multi-lingual
documents. I am looking for simple, "non-violent" changes to the spec.

I think someone suggested back in December that Unicode be used as a
preferred transport code, (which would be pretty much outside Roy's
constraints), but I'm not sure this is implied by more recent proposals.
Even if we used Unicode as a formal tool to define an SGML declaration for
other character sets, this need not imply that browsers not supporting
Unicode would have to know anything much about the Unicode character code.

So this line of reasoning may indicate Roy and I are in heated agreement,
and Dave Raggett's remarks suggest that the changes needed to the spec may
be smaller than it seemed.

I did not intend, in any case, to give offense.

I would appreciate an update re Gavin's proposals and ERCS in the light of
this discussion, if he thinks there is still something to say.

Luke 路客 <ylu@ccwf.cc.utexas.edu> raises some issues about the
limitations of Unicode for Asian languages with respect to creation of new
characters.

These issues sound difficult to address here: it sounds like no widely used
encoding solves this (not just Unicode), and I'm not sure a sufficiently
open-ended encoding to address this could be fitted into the framework of
SGML (I don't know).

It seems that allowing the use of Unicode (or other existing character sets
beyond ISO-8859-X, for that matter) is still a step in the right direction
in treating Asian languages.

---
    Albert Lunde                      Albert-Lunde@nwu.edu