Re: HTML DTD

timbl (Tim Berners-Lee)
Date: Fri, 26 Jun 92 15:35:05 MET DST
From: timbl (Tim Berners-Lee)
Message-id: <9206261335.AA04694@nxoc01.cern.ch>
To: connolly@pixel.convex.com, timbl@nxoc01.cern.ch
Subject: Re: HTML DTD
Cc: www-talk@nxoc01.cern.ch
Dan, you say

<<
I suppose you could come up with a DTD that describes something
close to the current HTML, but I'm not sure of the value of it.
HTML allows tags to be pretty much sprinkled wherever you feel
like putting them. Any DTD that allows that much leeway just
looks like this:

        <!ENTITY % alltags "TITLE|H1|H2|H3|MENU|OL|UL">
        <!ELEMENT %alltags (%alltags)*>

i.e. every element is just a repeatable or-group of all the elements.
Then the SGML parser can't do any minimization cuz nothing's required. >>

Yes, current SGML currently is just a linear sequence of
elements. (Sorry, current HTML -- I'm typing this in serially
and can't edit!).  There is a reason for this:  it is very
convenient for HTML to map onto a series of styles -- for two
reasons.

Firstly, a lot of rich text objects can hold styles but can't hold
structure.  You can deduce structure from the styles -- like
Word deducing outlining from Heading styles, and WWW deducing
a list <UL> from a lot of <LI> paragraphs. But you can't go
very far.  If you want to make a HT editor out of such a
text object, you have to regenerate the elements from the
styles.
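
To illustrate deducing structure from styles, here is a minimal
sketch in Python (not part of the original message). It assumes the
text object stores nothing but a linear run of (style, text)
paragraphs; the sample data and the name deduce_lists are
hypothetical. Each run of consecutive LI paragraphs is wrapped in a
UL element.

```python
from itertools import groupby

# Hypothetical input: a linear run of (style, text) paragraphs, as a
# style-only rich text object might store them.
paragraphs = [
    ("H1", "Fruit"),
    ("LI", "apples"),
    ("LI", "pears"),
    ("P", "That is all."),
]


def deduce_lists(paras):
    """Wrap each run of consecutive LI paragraphs in a UL node."""
    tree = []
    # groupby only merges neighbours with the same style, so two LI
    # runs separated by another paragraph become two separate lists.
    for style, run in groupby(paras, key=lambda p: p[0]):
        run = list(run)
        if style == "LI":
            tree.append(("UL", [text for _, text in run]))
        else:
            tree.extend(run)
    return tree
```

Calling deduce_lists(paragraphs) keeps the H1 and P paragraphs as
they are and replaces the two LI paragraphs with a single UL node.
This one-level grouping is about as far as the deduction goes: a
flat run of styles cannot say where a nested list would begin or end.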

Secondly, it may be that the wysiwyg editors have a linear style
structure because that is intuitive to people. I don't know
a lot of people who use author/editor (which maintains
structure). Maybe real people actually think in terms of styles
and fix the document to look right, then they are happy to have the
structure deduced.

So if we went for a nestable HTML which would be cleaner for
those who appreciate recursion, we would have to have a hypertext
editor which made the structure visible.  I don't have experience
enough to know whether real information providers (group secretaries,
for example) would be into generating nested elements -- maybe
the styles are useful to keep as the current `user interface metaphor'
of word processors.

(It also makes making the editor easier!)

Or maybe we should have two levels of DTD -- one basically linear
and mandatory (and precompiled for fast access) and one more
sophisticated for larger documents.

Of course, when you are writing hypertext the large documents are
normally broken down into small bits to make traversing them quick.
So whereas each hypertext node may contain only H1 and H2 headings,
when a book is generated a la the_www_book.ps you get 5 levels
of heading from the whole tree.
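
A rough sketch of how that could work, in Python (not part of the
original message). The rule used here is an assumption, not a
description of how the_www_book.ps was actually produced: a heading's
level in the book is taken to be the node's depth in the tree plus
the heading's local level. The node layout and the name
book_headings are hypothetical.

```python
# Hypothetical node layout: local headings as (level, text) pairs,
# plus an optional list of child nodes.
def book_headings(node, depth=0):
    """Yield (book_level, text) for a node and every node below it."""
    for level, text in node["headings"]:
        # Assumed rule: book level = depth in the tree + local level.
        yield (depth + level, text)
    for child in node.get("children", []):
        yield from book_headings(child, depth + 1)


# Every node uses only H1 and H2 locally.
web = {
    "headings": [(1, "The Book"), (2, "Preface")],
    "children": [
        {
            "headings": [(1, "Chapter")],
            "children": [
                {
                    "headings": [(1, "Section"), (2, "Subsection")],
                    "children": [
                        {"headings": [(1, "Topic"), (2, "Detail")]},
                    ],
                },
            ],
        },
    ],
}

levels = [level for level, _ in book_headings(web)]
```

With nodes nested three links below the top node, the H2 in the
deepest node comes out at level 5, although no single node uses
anything deeper than H2.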

So that is why the HTML structure is so simple. I am open to
a more sophisticated alternative.

Tim