Re: Hot Metal and HTML

pflynn@oclc.org (Peter Flynn)
Date: Wed, 15 Jun 94 08:07:35 EDT
Message-id: <9406151125.AA05148@curia.ucc.ie>
Reply-To: html-ig@oclc.org
Originator: html-ig@oclc.org
Sender: html-ig@oclc.org
Precedence: bulk
From: pflynn@oclc.org (Peter Flynn)
To: Multiple recipients of list <html-ig@oclc.org>
Subject: Re: Hot Metal and HTML
X-Comment: HTML Implementation Group
> > > I hear that Hot Metal won't read HTML.
> 
> > I hadn't heard that.
> 
> Dan hadn't either. ;-)  But he doesn't write HTML either. ;-))

Miaou!

> http://info.cern.ch/hypertext/WWW/MarkUp/HTML.dtd
> (note capital HTML -- Dan changed the filename for some reason too)
> is now the original while we sort this out.  
> 
> Sorry, my mistake for folding in Dan's proposals without testing them all.

My main concern is that while the primary impetus for change is to
improve WWW, we ought to have regard to the Real World Outside[tm]
and to adhere to conventions where possible to make conversion, 
transition etc easier...this implies using <P> and <LI> as containers.
Just my CHF 0.02...

> You can't "put it right" retrospectively.

Why not? I don't see what's wrong with saying "Sorry, we did a Q&D job
first time round to get the principle working, now we're going to do
the full job".

> If you do,
> no one will ever believe you again.  Specs are basically write-only
> within very narrow margins.  Dan rightly maintains that the original
> HTML spec was not properly supported with a DTD. He made one for it though.
> And it has been around for a long time.  It is not well enough defined
> perhaps, but it certainly is well enough defined that everyone knows
> that "<h1>foo</h1>bar" is valid.

No. Everyone knows that the fragment you quote will display in a browser.
That's not the same as being valid HTML.

> I'm sorry Peter, but your use of "flawed" here is suspect.  You mean that
> you didn't like it. However, it did describe what people were doing,
> and doing quite successfully.  I see it would have been better to
> have <p> as container.  But there it is. 

Yes, "flawed" is perhaps OTT, sorry. Sure I didn't like it: it's lousy SGML
and just makes life difficult (a) now for people wanting to keep their HTML
docs in a manner consistent with other SGML applications and (b) in the
future for people who want to use more rigorous SGML software. But as you
said, there's no reason why we shouldn't use containers in HTML{+|2|3|4}:
I hope we do...my worry is the eventual conversion of the legacy, which is
growing hourly and which is probably ninety-something % non-conformant.

> I have just checked the dtd which went out when I put in Dan's
> a few days ago, and is now in, and it reads:
> <!ELEMENT P     - O EMPTY -- separates paragraphs -->
> <!ELEMENT HR    - O EMPTY -- horizontal rule -->

You're talking here about http://info.cern.ch/hypertext/WWW/MarkUp/HTML.dtd.html
not the ...HTML.dtd I looked at yesterday before I wrote my last msg. No fair
switching the files :-)

This seems to be the problem: multiple copies each claiming to be the DTD.

> There are thousands of people and scripts and programs on the net
> churning out "<h1>foo</h1>bar".  They use <p> as a separator
> because that is how it was defined in the HTML spec, first
> descriptively in English, and later with the DTD.

Right.

> The closing tags </P> etc generated by HotMetal are not valid
> HTML -- they will fail SGML validation.  They only work because
> HTML parsers are specified as having a defined behaviour to ignore
> undefined tags.  So an SGML program is taking advantage of
> behaviour which SGML does not condone.  That would be OK by me IF
> IT WORKED 100%.
> 
> "<h1>foo</h1>bar" is valid HTML and nothing can change it.

Valid HTML 1, yes.
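
For anyone who hasn't seen the "ignore undefined tags" behaviour in action,
here is a rough Python sketch of it. It is a toy, not any real browser's
parser: the KNOWN_TAGS set, the tag pattern and the name tolerant_tokens
are all made up for illustration. The point is only that a stray </P>
vanishes silently instead of raising an error, which is why the HoTMetaL
output displays even though an SGML validator would reject it.

```python
import re

# Assumption: a tiny, hypothetical set of tags this toy "browser" knows.
KNOWN_TAGS = {"<h1>", "</h1>", "<p>"}

# Simplified tag pattern: no attributes, no comments, no entities.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*>")

def tolerant_tokens(markup):
    """Split markup into tokens, silently dropping tags that are not known."""
    tokens = []
    pos = 0
    for m in TAG_RE.finditer(markup):
        if m.start() > pos:
            tokens.append(markup[pos:m.start()])  # text before this tag
        tag = m.group(0).lower()
        if tag in KNOWN_TAGS:
            tokens.append(tag)
        # Unknown tags fall through here: ignored, no error reported.
        pos = m.end()
    if pos < len(markup):
        tokens.append(markup[pos:])  # trailing text after the last tag
    return tokens

print(tolerant_tokens("<H1>foo</H1><P>bar</P>"))
```

With and without the closing </P> the token stream comes out the same,
so the renderer never notices the difference.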

> Now, there was a plan that the HTML DTD could be changed such that
> it would *still* be valid, but that it would be interpreted
> DTD-wise as equivalent to  "<h1>foo</h1><p>bar</p>".
> When I switched the working spec to Dan's version, it was
> under the misunderstanding that Dan had verified a lot of real
> HTML from the field against it and it parsed.

I'm not sure where this came from: I don't know of any sensible way
that you could make that work.

> This plan failed, as the SGML tag implication algorithm is not
> strong enough (-Dan).  That is, it can deduce closing
> tags but not opening tags. So the trick will work for <LI>
> and <DT> and <dd> because they all have opening tags, but
> it won't work for <p>.

Yep.
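
To make the asymmetry concrete, here is a rough Python sketch. It is a
toy model of end-tag omission, not a real SGML parser: the name
infer_end_tags and the OMISSIBLE_END set are invented for illustration,
and it only handles one level of omissible element at a time. A new start
tag (or an enclosing end tag) tells you where the previous element must
have ended, but nothing in the stream tells you where an untagged
paragraph began.

```python
# Assumption: only these element names may have their end tag omitted.
OMISSIBLE_END = {"LI", "DT", "DD", "P"}

def infer_end_tags(tokens):
    """Insert the end tags that can be deduced from later tags."""
    out = []
    open_item = None  # open element whose end tag has not been seen yet
    for tok in tokens:
        if not tok.startswith("<"):
            out.append(tok)  # plain text: nothing to infer
            continue
        name = tok.strip("</>").upper()
        if tok.startswith("</"):
            if open_item is not None and name != open_item:
                # An enclosing element is closing: close the open item first.
                out.append("</" + open_item + ">")
            open_item = None
        elif name in OMISSIBLE_END:
            if open_item is not None:
                # A new item starts: the previous one must have ended here.
                out.append("</" + open_item + ">")
            open_item = name
        out.append(tok)
    if open_item is not None:
        out.append("</" + open_item + ">")  # end of input closes the last item
    return out

print(infer_end_tags(["<UL>", "<LI>", "one", "<LI>", "two", "</UL>"]))
print(infer_end_tags(["<H1>", "foo", "</H1>", "bar", "<P>", "baz"]))
```

In the second call the "bar" after the heading stays outside any <P>:
there is no start tag for the sketch to hang an inference on, which is
exactly the case that breaks for separator-style paragraphs.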

> This means that either
> 1. <p> is kept as a separator, maybe with <pp> as a para style container, or

Possibly, but preferably not, IMHO.

> 2. We mandate that HTML parsers have a higher level of tolerance
>    than SGML parsers, in particular they can infer opening tags; or

No: HTML is either SGML or it's not. 

> 3. Text is allowed outside paragraphs as well as inside, as
>    Dave Raggett has suggested for html+; or

Yuck.

> 4. The new spec is called HTML+ or HTML2 but not text/html.

Fine: this is what I thought we were developing: text/html2

I'm not arguing with anything you said: I just think we should get away
(and get the users away) from practices which I believe are not conducive
to the success of WWW and to the general move in the direction of SGML
becoming more widely-used. We keep the separator in HTML(1) but as that
is superseded, HTML(n) [n>1] uses a container.
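
As a very rough idea of what converting the legacy might involve, here is
a Python sketch for the simplest possible case. It assumes the fragment
contains nothing but text and <p> separators (no headings, lists or
anchors), and separators_to_containers is a made-up name; real documents
would need a proper parser, which is precisely my worry.

```python
import re

def separators_to_containers(fragment):
    """Wrap each run of text between <p> separators in <p>...</p>."""
    # Assumption: <p> appears only as a bare separator, in either case.
    chunks = re.split(r"<p>", fragment, flags=re.IGNORECASE)
    # Skip empty runs, e.g. a trailing separator or two in a row.
    return "".join("<p>" + c.strip() + "</p>" for c in chunks if c.strip())

print(separators_to_containers("bar<p>baz<P>qux"))
```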

///Peter