SGML Declaration vs. System Declaration

Dan Connolly (connolly@w3.org)
Fri, 5 May 95 16:25:56 EDT

Glenn Adams writes:
>
> I think that the RFC should call out:
>
> 1. an assumed SGML declaration
> 2. a System declaration which must be minimally supported

But the current draft doesn't even require that an HTML user agent be
a conforming SGML system. It doesn't say they have to employ a
conforming SGML parser either.

I really didn't want to get into conformance of user agents in this
document at all, to be truthful.

When I wrote my first HTML specification, it was a specification of
the language, i.e. the set of sequences of characters which were to be
called conforming HTML documents. No link semantics; no browser
behaviour; no "rendering hints". Nada. Just a grammar that specifies a
set of strings.

I argued diligently (back in '92, I think) that to get the thing
published, this was the scope we should attack first. A browser spec
was a different ball game. I lost. I've been in a position to say
"See! If you'd taken my advice, we'd be done by now" any number of
times. Here we are again.

So we've taken on the scope of specifying HTML user agents, which look
a lot like SGML systems but, in practice, aren't. I agree that
current practice shouldn't entirely dictate the standard, and that
in fact HTML user agents should be conforming SGML systems with an
SGML system declaration in their documentation.

But this is quite a leap from the scope of the 2.0 document.

Are we prepared to take this on?

For example, take marked sections. Is a document with marked sections
a conforming HTML document? The current spec doesn't say explicitly,
so in effect, it refers folks to SGML, which says yes. (We could put
in an application convention that prohibits marked sections without
conflicting with SGML: SGML says that an application must specify that
all its documents are conforming SGML documents, but not the
converse.)

OK. So... is an HTML user agent required to process marked sections?
If a conforming HTML user agent is required to reliably process all
conforming HTML documents, then the answer is yes again.

I spent some time designing an HTML validator that checks the
application conventions present in most existing HTML user agents (no
marked sections, no <>, no </>, no internal declaration subset...) and
I convinced myself it's possible, and that in fact the lex/yacc code
that implements it should be an appendix to the HTML spec.

But in the interest of time, I dropped the whole idea.
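
As a rough illustration of the kind of check described above, here is
a minimal sketch in Python. It is not the lex/yacc code mentioned,
which is not reproduced here. The function name, rule names, and
regular expressions are all assumptions made for illustration. The
patterns only approximate SGML syntax: they ignore context such as
comments, CDATA content, and quoted literals.

```python
import re

# Hypothetical rule table: each entry pairs a label with a regex that
# roughly approximates one construct the application conventions forbid.
CHECKS = [
    ("marked section", re.compile(r"<!\[")),
    ("empty start-tag <>", re.compile(r"<>")),
    ("empty end-tag </>", re.compile(r"</>")),
    # A "[" before the closing ">" of the DOCTYPE declaration is taken
    # to open an internal declaration subset.
    ("internal declaration subset",
     re.compile(r"<!DOCTYPE[^>\[]*\[", re.IGNORECASE)),
]


def find_violations(text):
    """Return sorted (line number, rule label) pairs for each match."""
    hits = []
    for label, pattern in CHECKS:
        for match in pattern.finditer(text):
            # Line numbers are 1-based: count newlines before the match.
            line = text.count("\n", 0, match.start()) + 1
            hits.append((line, label))
    return sorted(hits)
```

Calling find_violations() on a document's text returns an empty list
when none of the four constructs appear. A real validator would need a
proper tokenizer to avoid false positives inside comments and literals.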

I have made commitments that the HTML 2.0 document will be done by May
31. I would sincerely like to keep them. I'm willing to renegotiate
those commitments, if this working group wants to change the scope of
the HTML 2.0 spec from:

* a DTD that specifies a set of valid documents
* some rendering hints, with lots of "shoulds" and almost
  no "musts", with no rhyme or reason behind them
* half of an HTML user agent spec, with a few musts,
  a few crossings over into the HTTP spec, a few
  parts about ISMAP and x-www-urlencoded missing,
  a bunch of "shoulds" and maybe a "must" here and there
into a complete specification of the language and user agents
(hopefully in separate documents).

My requirements are met by the current scope, in that it specifies the
language. There are many projects for which this is all you need:
html-to-xxx and xxx-to-html translators, for example.

I don't like the fact that the other stuff is in there halfway (and I
hate maintaining it). I'd prefer to take it out altogether. I think
finishing it is a sufficiently large job that it should be a separate
document. But either way, we're talking about a lot of time.

My current plan is to conduct a detailed review of the document at its
current scope over the next week or two, and send it up the IETF
standards river after that. I'm willing to change that plan, but I
want everyone to know what we're getting into before I do.

Dan