Re: More syntax details in HTML 2.0?

Joe English (joe@trystero.art.com)
Wed, 14 Jun 95 12:50:19 EDT

Daniel W. Connolly <connolly@beach.w3.org> wrote:
> I asked a few members of the web consortium staff, and I got very
> strong feedback from folks with generally solid technical backgrounds
> who were somewhat new to the HTML spec that the spec is incomplete: it
> defers too much to the SGML spec, especially on lexical issues:
> [...]
> The bottom line is: you can't pick up the HTML spec and
> write an HTML parser without becoming an SGML priest.

I always thought that the whole point of expressing HTML
in terms of SGML is that you wouldn't *have* to write
an HTML parser.

There are hundreds of RFCs that contain BNF or EBNF
grammars, yet none that I've seen have a description
of how to build a parser from a context-free grammar.

Also, perhaps more relevantly, look at any RFC that defines
an SNMP MIB -- pages and pages of ASN.1, which is just as
incomprehensible to the uninitiated as a DTD. Yet there's
no ASN.1 RFC either. [ Or maybe there is -- if so, please
tell me where; I've been looking for one! ]

The HTML RFC is not the right place for a definition
of SGML syntax.

> But I think it has finally sunk in: the stuff about "tokenization"
> needs to be expanded to be as detailed as a lex specification.
>
> So, barring objections from this working group, I'm going to make
> another revision to address this issue.

Add one more objection. Any definition of SGML in this
RFC is sure to take much longer than a week to get right,
and will almost certainly be incomplete.

If this issue becomes a real show-stopper with the consortium
staff, the IETF, or the RFC editor, there is another option.
There's a highly knowledgeable SGML expert who has been working
on an SGML RFC in his spare time; last I heard it was still far
from complete, but we could ask him real nicely to submit what
he's got so far as an Internet-Draft, then cite it as a
work-in-progress.

--Joe English

joe@art.com