Re: rethinking the HTML DTD.

Dan Connolly <connolly@pixel.convex.com>
Message-id: <9207142225.AA07409@pixel.convex.com>
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Cc: www-talk@nxoc01.cern.ch
Subject: Re: rethinking the HTML DTD. 
In-reply-to: Your message of "Wed, 15 Jul 92 00:03:56 +0700."
             <9207142203.AA02008@nxoc01.cern.ch>
Date: Tue, 14 Jul 92 17:25:56 CDT
From: Dan Connolly <connolly@pixel.convex.com>

Ok, so we really do want to use SGML. Good. I agree. I just
wanted to hear from the WWW community.

>
>You say HTML is not SGML.  It is true that the HTML generated by the NeXT editor
>is not good. (example, lack of quotes around attributes which need them.)
>However, the current parser will parse real SGML. 
>
The biggest problem with HTML files is that they have only 1 of the 3
basic parts of an SGML document: the SGML declaration, the prologue,
and the instance. HTML documents only have the instance. It's legal
to omit the SGML declaration -- there's a default. But you've got
to have a prologue, or you end up with a non-standard way of inferring
the prologue (for example, every WWW client infers the DTD described
in "http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html".)
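To make the three parts concrete, here is a rough sketch (my own,
hypothetical, regex-based, not a real SGML parser) that reports which
parts a file actually contains. It ignores comments, marked sections and
internal DTD subsets, so treat it as a quick survey tool only.

```python
import re

# Heuristic patterns only: a real SGML parser would also handle comments,
# marked sections and internal subsets ("[ ... ]") in the DOCTYPE.
SGML_DECL = re.compile(r"^\s*<!SGML\b", re.IGNORECASE)
DOCTYPE_DECL = re.compile(r"<!DOCTYPE\s+(\w+)[^>]*>", re.IGNORECASE)

def sgml_parts(text: str) -> dict:
    """Report which of the three SGML document parts appear in `text`."""
    has_decl = bool(SGML_DECL.match(text))
    doctype = DOCTYPE_DECL.search(text)
    # Everything after the DOCTYPE declaration (or the whole text, if
    # there is no prologue) is treated as the document instance.
    instance = text[doctype.end():] if doctype else text
    return {
        "sgml_declaration": has_decl,
        "prologue": doctype is not None,
        "doctype_name": doctype.group(1) if doctype else None,
        "instance": instance.strip() != "",
    }
```

A typical HTML file today would come back with "prologue" false and
only "instance" true.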

So if we're committed to SGML, let's start putting something like

<!DOCTYPE HTML SYSTEM "http://info.cern.ch/hypertext/WWW/MarkUp/html.dtd">

at the front of every HTML file (we don't have to store it in the
file -- servers that distribute HTML could generate it on the fly.)
And let's put _some_ kind of DTD there.
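Generating it on the fly could be as simple as the following sketch. The
function name is hypothetical; the DTD URL is the one proposed above.
Files that already carry their own DOCTYPE (or an SGML declaration, which
must come first) are passed through untouched.

```python
# Hypothetical server-side helper; the DTD URL is the one proposed above.
HTML_DOCTYPE = (
    '<!DOCTYPE HTML SYSTEM '
    '"http://info.cern.ch/hypertext/WWW/MarkUp/html.dtd">\n'
)

def with_doctype(body: str) -> str:
    """Prepend the HTML DOCTYPE unless the stored file already has one."""
    head = body.lstrip().upper()
    # Leave files alone if they already start with a DOCTYPE or with
    # their own SGML declaration (which has to precede the DOCTYPE).
    if head.startswith(("<!DOCTYPE", "<!SGML")):
        return body
    return HTML_DOCTYPE + body
```

Because the check is idempotent, a server can apply it to every HTML
response without worrying about documents that were already fixed up.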

>In the future, the web will include more complex DTDs, and dynamically
>loaded DTDs, and people will want to use the same parser for it.
>
Interesting! There are plans to support more than one DTD!
This makes SGML a clear winner.

>So I feel RTF would be a backward step. It is true that the current
>W3 software is at a point level with RTF rather than general SGML.
>But why tie ourselves to that point?
>

I guess that's what I wanted to hear: that the goals of WWW and the
features of SGML really _do_ have a lot in common, but the current
implementation doesn't support many of them.

Just to make sure I've beaten this horse to death: let's begin to
formalize HTML and validate existing HTML documents before the
distance between HTML and SGML gets too big.
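As a first, cheap validation pass, one could flag the unquoted attribute
values Tim mentioned. Under the reference concrete syntax an attribute
value may be left unquoted only if it consists of name characters
(letters, digits, "." and "-"), so HREF=http://... needs quotes. This is
a hypothetical regex sketch, not a substitute for running a real SGML
parser against the DTD.

```python
import re

TAG = re.compile(r"<([A-Za-z][^>]*)>")            # start tags only
QUOTED = re.compile(r"\"[^\"]*\"|'[^']*'")        # quoted literals
UNQUOTED_ATTR = re.compile(r"(\w+)\s*=\s*([^\s\"'>]+)")
# Reference concrete syntax: unquoted values may use only name characters.
NAME_CHARS = re.compile(r"[A-Za-z0-9.\-]+")

def unquoted_attr_problems(html: str) -> list[tuple[str, str]]:
    """Return (attribute, value) pairs whose values must be quoted."""
    problems = []
    for tag in TAG.finditer(html):
        # Blank out quoted values first so text inside them is not
        # mistaken for further NAME=value pairs.
        content = QUOTED.sub('""', tag.group(1))
        for name, value in UNQUOTED_ATTR.findall(content):
            if not NAME_CHARS.fullmatch(value):
                problems.append((name.upper(), value))
    return problems
```

For example, '<A HREF=http://info.cern.ch/>' is reported, while
'<A NAME=z1>' and '<A HREF="ok.html">' pass.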

Dan

p.s. I'm working on a DTD that reflects the structure of most existing
word-processor documents: a sequence of paragraphs (maybe broken
into flows, sections, or whatever). I'll have RTF and MIF translators
for the DTD when it's ready. Maybe HTML2 can use some of the features --
the low-level character-set related features, anyway.