Hot Metal and HTML

Tim Berners-Lee <timbl@oclc.org>
Date: Wed, 15 Jun 94 05:20:54 EDT
Message-id: <9406150919.AA02162@www3.cern.ch>
Reply-To: html-ig@oclc.org
Originator: html-ig@oclc.org
Sender: html-ig@oclc.org
Precedence: bulk
From: Tim Berners-Lee <timbl@oclc.org>
To: Multiple recipients of list <html-ig@oclc.org>
Subject: Hot Metal and HTML
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Implementation Group
Retry posting to the group.

re Peter's message.
> Tim writes:
>
> > I hear that Hot Metal won't read HTML.
>
> I hadn't heard that.

Dan hadn't either. ;-)  But he doesn't write HTML either. ;-))
Dan adds what HTML regards as superfluous <p>s before text.
Always. So HotMetal can read it. But he hasn't used Hot Metal:
you can see because there are no HTML-illegal </DT>s etc
which HotMetal writes.

> > In HTML <P> has been defined and used as a separator rather than a
> > container. There are many who would have preferred that it had always
> > been a container, but there it is: it isn't.
>
> But I think it is: both the current and previous copies of the DTD at
> http://info.cern.ch/hypertext/WWW/MarkUp/html.dtd, which most users will
> treat as the canonical reference, say <!ELEMENT P - o (some content)>
> which is not the same thing as defining it EMPTY.
http://info.cern.ch/hypertext/WWW/MarkUp/HTML.dtd
(note capital HTML -- Dan changed the filename for some reason too)
is now the original while we sort this out.  

Sorry, my mistake for folding in Dan's proposals without testing them all.


> It has been acknowledged that defining it as a separator was a simple
> mistake, and the best thing is to put that right now, not perpetuate it.


You can't "put it right" retrospectively. If you do,
no one will ever believe you again.  Specs are basically write-only
within very narrow margins.  Dan rightly maintains that the original
HTML spec was not properly supported with a DTD. He made one for it though.
And it has been around for a long time.  It is not well enough defined
perhaps, but it certainly is well enough defined that everyone knows
that "<h1>foo</h1>bar" is valid.

> > The essential thing which defines the boundary on changes to the HTML DTD
> > is network interoperability.  That is a sine qua non.  [...]
>
> Almost. But are we talking browser-level interop (which we already have) or
> parser-level (which we don't, quite yet)? We are in a state of flux, and I
> think you are completely correct to say we need interop, but the problem in
> the past has been that the public version of the DTD either didn't work (now
> long since fixed) or was otherwise flawed (P, LI empty, etc).


I'm sorry Peter, but your use of "flawed" here is suspect.  You mean that
you didn't like it. However, it did describe what people were doing,
and doing quite successfully.  I see it would have been better to
have <p> as container.  But there it is. 


> (I'm as much to blame as anyone: I've just bodged my HTML+ because I think
> the idea of making LI contain only P instead of %text; is overly
> restrictive, and I'm playing around with alternatives.) Interop: freeze
> HTML1/2 asap.
>
> > Dan has tried to show how we can move to containers for P, DT, and
> > still call it HTML.  Someone somewhere is wrong, as HoT MetaL
> > WON'T READ HTML and WON'T WRITE HTML.
>
> What does Yuri say?
>
> > * If the scheme doesn't work in theory, we scrap the containers for HTML,
> > and call everything with P as containers HTML+ which is incompatible,
> > and define text/htmlplus.  That's fine... a single jump to get things clean.
>
> But the CERN dtd defined P as - O last year...

I have just checked the DTD which was taken out when I put in Dan's
a few days ago, and which is now back in, and it reads:
<!ELEMENT P	- O EMPTY -- separates paragraphs -->
<!ELEMENT HR    - O EMPTY -- horizontal rule -->

You may be thinking of htmlplus DTDs -- but they were for discussion
and their relationship with HTML was up for definition. And it all hangs
on this issue.


Let me summarise the situation as I see it.

There are thousands of people and scripts and programs on the net
churning out  "<h1>foo</h1>bar".  They use <p> as a separator
because that is how it was defined in the HTML spec, first
descriptively in English, and later with the DTD.

The closing tags </P> etc generated by HotMetal are not valid
HTML -- they will fail SGML validation.  They only work because
HTML parsers are specified as having a defined behaviour to ignore
undefined tags.  So an SGML program is taking advantage of
behaviour which SGML does not condone.  That would be OK by me IF
IT WORKED 100%.
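
To illustrate the tolerant behaviour described above, here is a minimal
sketch of a reader that skips tags it has no definition for. It assumes a
small, hypothetical tag table (CONTAINERS and EMPTY are not taken from any
real DTD) and uses Python's standard html.parser only as a tokenizer; the
names TolerantReader and read are invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical subset of elements that are containers (have end tags).
CONTAINERS = {"h1", "ul", "dl", "a"}
# Elements declared EMPTY in the separator-style DTD: start tag only.
EMPTY = {"p", "hr"}


class TolerantReader(HTMLParser):
    """Records defined tags and text; silently skips undefined tags."""

    def __init__(self):
        super().__init__()
        self.events = []   # what the reader acts on
        self.ignored = []  # tags it had no definition for

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS or tag in EMPTY:
            self.events.append(("start", tag))
        else:
            self.ignored.append("<%s>" % tag)

    def handle_endtag(self, tag):
        # An EMPTY element has no end tag, so </p> is undefined here.
        if tag in CONTAINERS:
            self.events.append(("end", tag))
        else:
            self.ignored.append("</%s>" % tag)

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


def read(markup):
    reader = TolerantReader()
    reader.feed(markup)
    reader.close()
    return reader.events, reader.ignored
```

Fed "<h1>foo</h1><p>bar</p>", this reader acts on the <p> as a separator
and drops the </p>, which is why such output displays correctly even though
a validating SGML parser using the EMPTY declaration would reject it.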

"<h1>foo</h1>bar" is valid HTML and nothing can change it.

Now, there was a plan that the HTML DTD could be changed such that
it would *still* be valid, but that it would be interpreted
DTD-wise as equivalent to  "<h1>foo</h1><p>bar</p>".
When I switched the working spec to Dan's version, it was
under the misunderstanding that Dan had verified a lot of real
HTML from the field against it and it parsed.

This plan failed, as the SGML tag implication algorithm is not
strong enough (-Dan).  That is, it can deduce closing
tags but not opening tags. So the trick will work for <LI>
and <DT> and <dd> because they all have opening tags, but
it won't work for <p>.
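
The asymmetry can be seen in a toy model. The sketch below is not an SGML
parser: it assumes flat markup without attributes, and the tables
CLOSED_BY and LIST_ENDS and the function imply_end_tags are hypothetical
names made up for the example. It supplies missing end tags whenever a
later tag shows that an element must have finished, but it has nothing to
go on when text appears with no start tag in front of it.

```python
import re

# Toy model: elements whose end tag may be omitted, mapped to the start
# tags that show the previous element has finished.
CLOSED_BY = {
    "li": {"li"},
    "dt": {"dt", "dd"},
    "dd": {"dt", "dd"},
    "p": {"p"},
}
# End tags of enclosing lists also finish a pending item.
LIST_ENDS = {"ul", "ol", "dl"}

# Matches <name>, </name>, or a run of text. Attributes are not handled.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")


def imply_end_tags(markup):
    """Return markup with omitted end tags for LI, DT, DD and P filled in."""
    out = []
    open_item = None  # the element currently waiting for its end tag
    for match in TOKEN.finditer(markup):
        closing, name, text = match.groups()
        if text is not None:
            out.append(text)  # bare text: no start tag can be deduced
            continue
        name = name.lower()
        if closing:
            if open_item == name:
                open_item = None
            elif open_item and name in LIST_ENDS:
                out.append("</%s>" % open_item)
                open_item = None
            out.append("</%s>" % name)
            continue
        if open_item and name in CLOSED_BY[open_item]:
            out.append("</%s>" % open_item)
            open_item = None
        out.append("<%s>" % name)
        if name in CLOSED_BY:
            open_item = name
    if open_item:
        out.append("</%s>" % open_item)
    return "".join(out)
```

With this model "<ul><li>a<li>b</ul>" gains both </li> tags, and
"<h1>foo</h1><p>bar" gains a closing </p>, but "<h1>foo</h1>bar" comes back
unchanged: there is no opening <p> from which to start a paragraph.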

This means that either
1. <p> is kept as a separator, maybe with <pp> as a para style container, or
2. We mandate that HTML parsers have a higher level of tolerance
   than SGML parsers, in particular they can infer opening tags; or
3. Text is allowed outside paragraphs as well as inside, as
   Dave Raggett has suggested for html+; or
4. The new spec is called HTML+ or HTML2 but not text/html.

These are as I see it the four options open to us as we plot the course of
WWW history. 


   

> ///Peter

Tim