HTML/SGML/charsets

Terry Allen (terry@ora.com)
Fri, 31 Mar 95 10:44:50 EST

While I appreciate Roy's common sense and recent titanic efforts,
I have to disagree about how the spec describes the status of HTML
wrt SGML re charsets. I rave on about this matter because it was
handled correctly in the previous draft, and Roy has changed the
language to say just the reverse of what it said before.

| > The old language was:
| > 2.5 Understanding HTML and SGML
| > HTML is an application of ISO Standard 8879:1986 -
| > Standard Generalized Markup Language (SGML). SGML is a
| > system for defining structured document types, and
| > markup languages to represent instances of those
| > document types. The SGML declaration for HTML is given
| > in Section 5.1. It is implicit among HTML user agents.
| >
| > If the HTML specification and SGML standard conflict,
| > the SGML standard is definitive.
| >
| > and that is the only approach I can support. HTML is defined as
| > an application of SGML; we cannot ignore the SGML standard when we
| > choose. It's that second para, saying that the SGML standard is
| > definitive, that is still needed. The wording about how some HTML
| > apps can't ignore HTML is kinda odd considering almost all of them do.
|
| On the contrary -- we can and do ignore the SGML standard with a great
| deal of regularity and for many good reasons. That is life! No user agent
| is required to be an SGML application, and those applications are quite
| capable of ignoring the SGML standard (sometimes in an unfortunate way).

The use of SGML to encode HTML docs is an SGML app, and must be
conformant or it is meaningless.

| > What's at issue here is how browsers are to do error recovery;
| > let's not say we're defining an SGML app and then saying the SGML
| > standard isn't normative for SGML apps.
| We are not defining *just* an SGML app. We are defining a media type
| that is both SGML-conformant and a reasonable proximity to what people
| were calling "text/html" back in June of last year. That is why we
| are in an IETF WG instead of an SGML Open group.

No, SGML Open is not a standards body. Apples and oranges. And
we specifically dealt with character set issues by deciding that
we wouldn't, for 2.0, and that we'd limit ourselves to 8859-1.
None of this equivocation is necessary.
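
As a rough illustration of why fixing on 8859-1 for 2.0 removes
ambiguity, here is a minimal sketch (Python, purely illustrative and
not part of any draft; the helper name decode_as and the choice of
KOI8-R as the contrasting charset are my own assumptions). It shows
one and the same byte decoding to different characters under two
charsets:

```python
# Illustrative only: one byte, two charsets, two different characters.
# decode_as is a hypothetical helper, not anything from the HTML draft.

def decode_as(data: bytes, charset: str) -> str:
    """Decode raw bytes under the named charset."""
    return data.decode(charset)

raw = bytes([0xE9])  # a single byte with the high bit set

as_latin1 = decode_as(raw, "iso-8859-1")  # e with acute accent in 8859-1
as_koi8r = decode_as(raw, "koi8-r")       # a Cyrillic letter in KOI8-R

# The two results differ, so a receiver must know the charset in advance.
print(repr(as_latin1), repr(as_koi8r))
```

If every 2.0 document is 8859-1, the receiver never has to guess.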

| User agents can (and in some cases, should) bend the rules of SGML
| in order to provide maximally robust interface to the user. Quite frankly,
| this is an area that Internet people have had more experience with than
| SGML people, and I think SGML folks should learn from it just like we
| have learned the benefits of formally-structured documents.

If we do not define a conformant DTD, or if we set up a situation in
which SGML tools will give a different result from HTML UAs when
processing *valid* HTML (not talking about error recovery here),
we will have failed to produce a valid HTML spec and will deserve
all the calumny that will eventually come our way.

When you speak of "achieving a maximally robust interface to the user"
*that* sounds like error recovery to me.

| On the Internet, shit happens on a regular basis -- a standard which
| is not capable of coping with that (and in a consistent manner) is not
| worthy of becoming an Internet standard.

I suggest breaking out all the UA stuff into a separate document. It
seems only to be getting in the way of defining an SGML conformant
interpretation of an incoming HTML doc, which we still haven't done.

-- 
Terry Allen  (terry@ora.com)   O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
                               Sebastopol, Calif., 95472
occasional column at:  http://gnn.com/meta/imedia/webworks/allen/

A Davenport Group sponsor. For information on the Davenport Group see ftp://ftp.ora.com/pub/davenport/README.html or http://www.ora.com/davenport/README.html