Re: Objections to draft-ietf-html-spec-01.txt

Terry Allen (terry@ora.com)
Tue, 21 Mar 95 19:18:08 EST

| From: Albert-Lunde@nwu.edu (Albert Lunde)
| Subject: Re: Objections to draft-ietf-html-spec-01.txt
|
| At 12:21 PM 3/21/95, Larry Masinter wrote:
| >As the author of those words, I'll admit that they're unnecessary and
| >more prone to introduce confusion than insight. I think all that's
| >necessary is to strike the "rather than relying on any SGML mechanism
| >for doing so."
| >"It is envisioned that HTML will use the charset parameter to allow
| >support for non-Latin characters such as Greek, Arabic, Hebrew and
| >Japanese."
| >However, Terry's further explanation that "HTML cannot conform to the
| >SGML standard, ISO 8879, if the charset encodings are specified by
| >some other means" is another point that I _thought_ we'd gone over,
| >but I'll have to review the archives to find the points again.
| At 1:12 PM 3/21/95, James D Mason wrote:
| >I also second Terry's position. If we're trying to make HTML a better
| >application of SGML and so ease the lives of those of us who use HTML as an
| >output form into which to render documents done in other SGML applications, we
| >should use only the mechanism specified in the standard.
|
| I think what we are running into is a conflict between our attempts to
| conform with MIME and our attempts to conform with SGML.

There need be no conflict. MIME must be in the service of SGML; we
must say how to construct an SGML decl from MIME info, even though
browser developers may short-circuit the mechanism we suggest.
However,

| I think Larry introduced the language in question in an attempt to make the
| HTML 2.0 spec work for other single character sets than ISO Latin-1, using
| a MIME-like character set parameter. (This was in part a way to
| postpone/avoid getting into general multilingual issues by indicating a
| mechanism for the simple MIME-like cases.)

Regardless of motivation, we solved the charset encoding problem for 2.0
by arbitrarily limiting ourselves to 8859-1 and providing an HTML SGML
declaration. Larry's language, amended, might be suitable for 2.1, in
which we should aim to achieve the goals cited by both of you, but it
is unneeded for 2.0, conflicts with our other remarks about charsets
therein (and our SGML standards language, which was my original point),
and prejudges the outcome of further discussion. It may prejudge well,
but it's inappropriate in the 2.0 spec.

| I still think this is a direction to go. I don't think putting the
| character set declaration in the body of a document makes sense in the
| context of current versions of HTML and HTML (and the 2.0 spec needs to
| stay close to current practice, which doesn't put much of the SGML "stuff"
| in the document.)

You need only 1) point to an SGML decl, a matter SGML Open is working on
(though no method of doing it is prescribed in ISO 8879!), or 2) rely on
standardized instructions for constructing an SGML decl given the
charset parameter. But you don't have a parseable entity until you can
say what the SGML decl is.
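
To make option 2 concrete, here is a minimal sketch in Python of going
from a MIME Content-Type value to the SGML decl that governs the entity.
The table DECL_FOR_CHARSET, the identifier "html-latin1.decl", and both
function names are hypothetical, invented for illustration. Defaulting
to ISO 8859-1 when no charset parameter is given is likewise an
assumption, modelled on the 2.0 restriction to 8859-1.

```python
from email.message import Message

# Hypothetical registry: MIME charset name -> identifier of an SGML decl.
# The identifier string is made up; only the idea of a lookup is intended.
DECL_FOR_CHARSET = {
    "iso-8859-1": "html-latin1.decl",
}


def charset_of(content_type, default="iso-8859-1"):
    """Return the lowercased charset parameter of a Content-Type value.

    Falls back to `default` when the parameter is absent (an assumption).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)


def sgml_decl_for(content_type):
    """Pick the SGML decl for an entity, or fail if none is defined."""
    charset = charset_of(content_type)
    try:
        return DECL_FOR_CHARSET[charset]
    except KeyError:
        # Without a decl there is no parseable entity, so refuse to guess.
        raise LookupError(f"no SGML decl known for charset {charset!r}")
```

The lookup fails loudly for an unlisted charset instead of falling back
to Latin 1, since an entity parsed under the wrong decl is not the
entity that was sent.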

| On the other hand, it suggests that to satisfy SGML mavens we at least
| need to specify a mechanism/algorithm to derive an SGML declaration for
| other character sets than ISO-Latin-1.

Absolutely, and that mechanism will handle ISO Latin 1 too.

| ERCS may be a way to do this. (Define character classes for Unicode and
| project downward to subsets.)

Yes. It may not be the only way to do this.
..

| In any case this question raises some of the characterset/multilingual
| issues again: we don't have to solve them all, but it seems good to look at
| the implications.
| We could also abandon an attempt to define what the charset parameter
| really means in the HTML 2.0 spec and indicate that clients should not

Character set encoding is something we cannot fail to specify, and soon. But
that work is simply not part of the 2.0 spec, where we have created a legal
fiction in order to get the DTD part of the spec done and out the door.

| choke on it (though this is really an HTTP issue). But this would make it
| rather urgent to deal with for 2.1, at least in the simple MIME-like case.

Don't know what that last means. We should at least describe how MIME
charset parameters should be given for existing implementations; it would
appear that we can also do that and more via ERCS.

Would the ERCS advocates outline the effects on HTML of using one of the
ERCS SGML decls? For example, if I use an ERCS SGML decl that specifies Chinese
characters in NAMING, would that enable me to create tags like
<!@#$%^> where !@#$%^ is in some Chinese charset encoding?
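
To show what the question turns on, here is a rough sketch in Python of
a name check driven by the NAMING section of an SGML decl. It assumes
the general SGML rule that a name begins with a name start character
and continues with name characters, and that NAMING can add characters
to either class. The function is_name and its parameters are
hypothetical; the defaults (no extra start characters, "." and "-" as
extra name characters, a length limit of 72) are my reading of the 2.0
decl. Whether any ERCS decl actually adds Chinese characters this way
is exactly what I am asking.

```python
import string

# Classes every decl provides: letters start a name; digits may follow.
BASE_START = set(string.ascii_letters)
BASE_CHAR = BASE_START | set(string.digits)


def is_name(candidate, extra_start="", extra_char=".-", namelen=72):
    """Check a tag name against a (simplified) NAMING section.

    extra_start: characters the decl adds as name start characters.
    extra_char:  characters the decl adds as other name characters.
    namelen:     maximum name length allowed by the decl.
    """
    start = BASE_START | set(extra_start)
    rest = BASE_CHAR | start | set(extra_char)
    if not candidate or len(candidate) > namelen:
        return False
    return candidate[0] in start and all(c in rest for c in candidate[1:])


# With no additions a Chinese tag name is rejected; it is accepted once
# its characters are declared as name start characters.
print(is_name("中文"))
print(is_name("中文", extra_start="中文"))
```

This ignores case folding (NAMECASE) and the pairing of lower- and
upper-case additions, both of which a real decl would have to address.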

-- 
Terry Allen  (terry@ora.com)   O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
			       Sebastopol, Calif., 95472
occasional column at:  http://gnn.com/meta/imedia/webworks/allen/

A Davenport Group sponsor. For information on the Davenport Group see ftp://ftp.ora.com/pub/davenport/README.html or http://www.ora.com/davenport/README.html