Re: Hot Metal and HTML

"Daniel W. Connolly" <connolly@oclc.org>

Date: Mon, 13 Jun 94 15:36:37 EDT
Message-id: <9406131932.AA10812@ulua.hal.com>
Reply-To: html-ig@oclc.org
Originator: html-ig@oclc.org
Sender: html-ig@oclc.org
Precedence: bulk
From: "Daniel W. Connolly" <connolly@oclc.org>
To: Multiple recipients of list <html-ig@oclc.org>
Subject: Re: Hot Metal and HTML 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Implementation Group

In message <9406131735.AA03342@ws02-00>, Stu Weibel writes:
>
>
>Dan,
>
>I hear that Hot Metal won't read HTML.

From whom? What evidence did they give? I have succeeded in getting
HoTMetaL to read HTML documents. So now you've heard that HoTMetaL
_will_ read HTML.

That aside, let us be aware that HoTMetaL uses a DTD that looks a lot
more like Dave Raggett's HTML+ DTD than the DTD that I'm developing.
So don't confuse the rules that HoTMetaL supports with the rules that
are written in the HTML 2.0 spec at this point.

HoTMetaL won't read much of what's on the net, but I don't find that
surprising at all. Most of what's on the net has never been validated
w.r.t. ANY DTD.

The only problem I had was the NEXTID tag. When I took it out, my
documents worked fine. I am in the habit of putting P tags at the
beginning of paragraphs, and folks will probably have to start doing
that before this version of HoTMetaL will grok their docs. But if
HoTMetaL used the current HTML 2.0 DTD, this would not be necessary.

>I want to reiterate what I feel are
>the boundary conditions on anything being called HTML.
>
>In HTML <P> has been defined and used as a separator rather than a container.
>There are many who would have preferred that it had always been a container,
>but there it is: it isn't.

Could you give some evidence? I have about 94 test cases of various
sizes and shapes that are consistent with the notion that P is a
container. I suspect (though I haven't verified it in a while) that
they are also consistent with the notion that P is a separator. You
see, the two ideas largely don't conflict. It's a question of how you
want to look at it.
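
One way to see that the two readings largely agree is a minimal sketch
in Python. It is only an illustration: the regular expressions and the
function names below are assumptions made for the example, not how any
browser or SGML parser works. For documents that put a P tag at the
start of every paragraph, both views yield the same paragraphs; they
part ways only on text that no P tag introduces.

```python
import re

def paragraphs_as_separator(text):
    """Separator view: <p> is a break between chunks of text."""
    chunks = re.split(r"<p>", text, flags=re.IGNORECASE)
    return [c.strip() for c in chunks if c.strip()]

def paragraphs_as_container(text):
    """Container view: <p> opens an element that runs to </p>,
    to the next <p>, or to the end of the text."""
    pattern = re.compile(r"<p>(.*?)(?:</p>|(?=<p>)|\Z)",
                         re.IGNORECASE | re.DOTALL)
    return [m.group(1).strip() for m in pattern.finditer(text)
            if m.group(1).strip()]

# P at the start of every paragraph: both views agree.
tagged = "<p>First paragraph.\n<p>Second paragraph.\n"
print(paragraphs_as_separator(tagged) == paragraphs_as_container(tagged))

# Leading text with no P tag: only the separator view counts it.
untagged = "First paragraph.\n<p>Second paragraph.\n"
print(paragraphs_as_separator(untagged))
print(paragraphs_as_container(untagged))
```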

See attached messages for more on P... (cuz the @#$@ html-ig archive
isn't online yet!)

>The essential thing which defines the boundary on changes to the HTML DTD
>is network interoperability.  That is a sine qua non.  That is, if a
>document is valid HTML according to spec n, and it is sent across the
>net as text/html, and it is parsed by a parser of any version m of HTML
>where m>n, it must parse OK.

Well said. But in practice, there is no HTML spec n, so we're really
at ground zero. There's the IETF draft spec version of the DTD, but
nobody uses it, so I can't see that it really counts for much.

>Now I am happy to see, in order to help us make changes we want to make,
>that the specification of HTML can go further than the SGML DTD. For example,
>text/html already specifies that the document should match the DTD *and* also
>it should not contain any new declarations of entities etc etc.

Yes... now up to here we have maintained the invariant that an HTML
document shall be a conforming SGML document.

>Similarly we can insist on the behaviour of browsers meeting tags they
>don't recognise.  We need to do things like this, because SGML has been
>designed as a top-down once-only design and not for stepwise refinement.

This is arguable... the 2.0 spec will "suggest" that browsers tolerate
certain errors for the sake of short-term interoperability with
experimental systems. It will not "insist" on anything beyond what an
SGML parser can do.

>In fact, it is easy to define when a program (sgmls aside) should
>infer a <p>.  So I'd like to know whether we are limited by SGML or
>sgmls.

First -- there's a red herring here: as far as I know, sgmls agrees
with SGML on the rules for tag inference.

Second -- on the substantive issue that SGML tag inference is not as
powerful as what most folks are used to (LR(1) parsers), I suggest
that we choose to trade conformance with this bit of stupidity for
all the value that an "SGML seal of approval" can bring.

If we say that HTML documents conform to a certain DTD "except that
you have to infer a few more tags..." then we might as well quit
talking about SGML and hack up a lex and yacc description of HTML.

>Dan has tried to show how we can move to containers for P, DT, and
>still call it HTML.  Someone somewhere is wrong, as HoT MetaL
>WON'T READ HTML and WON'T WRITE HTML.

Whoa! Your conclusion here is based on some highly faulty logic.

First: HoTMetaL's DTD uses Dave Raggett's definition of P, DT, etc.,
not mine.

Second: even so, HoTMetaL _will_ read and write HTML.

>Either Hot metal is not implying tags as it ought, or Dan's scheme doesn't
>work in theory.  Which is it?

I'd say Hot Metal needs a new DTD. That's all.

>* If the scheme doesn't work in theory, we scrap the containers for HTML,
>and call everything with P as containers HTML+ which is incompatible,
>and define text/htmlplus.  That's fine... a single jump to get things clean.
>
>* If the scheme does work in theory, then HotMetal can be fixed to imply tags
>on input.  In fact it would be better for interworking for it to
>minimise them away on output too: otherwise it relies on the rule of
>browsers ignoring unexpected tags.

The only place that it doesn't work in theory is if you want to
say that:

	<H1>hdr</h1>		(1)
	para
	<H1>hdr</h1>

is the same as

	<H1>hdr</h1>		(2)
	<p>para</p>
	<H1>hdr</h1>

I haven't figured out a way to do this inside the realm of SGML
conformance. You can, however, say that

	<h1>hdr</h1>		(3)
	<p>para
	<h1>hdr</h1>

is the same as (2).

My strategy is that in HTML 2.0, form (1) is legal but discouraged,
since in the future, perhaps only forms (2) and (3) will be legal.
Perhaps there are other strategies.
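
The gap between forms (1) and (3) can be sketched in a few lines of
Python. This is a toy illustration, not an SGML parser: the function
name infer_p_end_tags and the set ENDS_P (the start tags assumed to
end an open P) are made up for the example. It supplies the omitted
</p> of form (3), giving form (2), but it never invents an opening
<p>, so form (1) comes through unchanged.

```python
import re

TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
# Assumption for this sketch: start tags that end an open P.
ENDS_P = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}

def _close_p(chunk):
    """Put </p> after the text, ahead of any trailing whitespace."""
    body = chunk.rstrip()
    return body + "</p>" + chunk[len(body):]

def infer_p_end_tags(text):
    """Insert omitted </p> end tags; never add a <p> start tag."""
    out, pos, in_p = [], 0, False
    for m in TAG.finditer(text):
        closing, name = m.group(1) == "/", m.group(2).lower()
        chunk = text[pos:m.start()]
        if in_p and not closing and name in ENDS_P:
            # A new block starts while a P is open: end the P first.
            chunk, in_p = _close_p(chunk), False
        out.append(chunk)
        if name == "p":
            in_p = not closing
        out.append(m.group(0))
        pos = m.end()
    tail = text[pos:]
    out.append(_close_p(tail) if in_p else tail)
    return "".join(out)

form1 = "<H1>hdr</h1>\npara\n<H1>hdr</h1>\n"
form2 = "<H1>hdr</h1>\n<p>para</p>\n<H1>hdr</h1>\n"
form3 = "<H1>hdr</h1>\n<p>para\n<H1>hdr</h1>\n"

print(infer_p_end_tags(form3) == form2)   # form (3) becomes form (2)
print(infer_p_end_tags(form1) == form1)   # form (1) is left alone
```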

>Which is it?  I am not having HTML go the way of RTF and postscript,
>that one never knows what will interwork with what.  If you screw
>it up once with HTML then no one will ever trust it again.

Amen.

More About P...

Forwarded: Fri, 10 Jun 1994 11:12:34 -0500
Forwarded: "html-ig@oclc.org "
To: alanb@ncsa.uiuc.edu (Alan Braverman)
Subject: Re: That old <p> tag again 
In-reply-to: Your message of "Mon, 30 May 1994 09:40:57 CDT."
             <9405301440.AA05033@void.ncsa.uiuc.edu> 
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <25175.771114850.1@ulua>
Date: Wed, 08 Jun 1994 17:34:11 -0500
From: "Daniel W. Connolly" <connolly@ulua>

In message <9405301440.AA05033@void.ncsa.uiuc.edu>, Alan Braverman writes:
>  We are not too thrilled with the redefinition of the <p> tag as a
>container, and I saved a few of your arguments from www-talk for support.
>Problem is, you seem to have switched sides in the debate.  Mind if I ask
>why?

P as a separator was a hack from day 1. I'm sure that timbl was
looking at documents used in systems that supported markup
minimization when he got the idea for the P tag.

In the code he wrote, it worked like the \par control in RTF.
You could say that is a valid design precedent, and P should
remain a separator.

But there is value in having a paragraph container element.
It provides a much more straightforward
way to say "this paragraph should be blue"
in stylesheets, or to say "find FOO and BAR in the same
paragraph".
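
As a rough illustration of the second query, here is a minimal Python
sketch. It assumes every paragraph is written as an explicit
<p>...</p> container; the function name paragraphs_with_all is
hypothetical, and a regular expression stands in for a real parser.

```python
import re

def paragraphs_with_all(html, words):
    """Return the text of each <p>...</p> that contains every word
    in `words` (case-insensitive substring match)."""
    paras = re.findall(r"<p>(.*?)</p>", html,
                       flags=re.IGNORECASE | re.DOTALL)
    return [p.strip() for p in paras
            if all(w.lower() in p.lower() for w in words)]

doc = ("<p>FOO on its own</p>\n"
       "<p>FOO and BAR together</p>\n"
       "<p>BAR on its own</p>\n")
print(paragraphs_with_all(doc, ["FOO", "BAR"]))
```

With P as a mere separator, the same query first has to decide where
each paragraph starts and ends.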

I argued that the name P was already out there, and
that it's not nice to change the meaning of names
that are already out there.
I suggested that if we're going to change the semantics, we
should change the name -- migrate to a PP tag, or some such.

But nobody liked that idea. So here we are...

Dan

Forwarded: Fri, 10 Jun 1994 11:15:53 -0500
Forwarded: "html-ig@oclc.org "
Date: Thu, 9 Jun 94 13:20:26 CDT
From: alanb@ncsa.uiuc.edu (Alan Braverman)
Message-Id: <9406091820.AA26047@void.ncsa.uiuc.edu>
To: "Daniel W. Connolly" <connolly@hal.com>
Subject: Re: That old <p> tag again 
In-Reply-To: <9406091815.AA07229@ulua.hal.com>
References: <9406091808.AA25468@void.ncsa.uiuc.edu>
	<9406091815.AA07229@ulua.hal.com>

Daniel W. Connolly writes:
> In message <9406091808.AA25468@void.ncsa.uiuc.edu>, Alan Braverman writes:
> >
> >If NCSA went with the <PP> tag, would you support it, or have you given in
> >completely?
> 
> I still think it's a good idea in principle, but you're talking about
> MAJOR headaches for document providers, I'd expect. It's a question of
> how we make the transition.
> 
> I'm willing to raise the issue again. Do I have your permission to
> quote you to www-html etc., or would you like to send out the message?

Well, don't quote any of my haphazard e-mail, but you have the support of
the NCSA developers.  No one here likes the idea of redefining the <P> tag;
we would rather go with a new tag, as you suggested to www-talk (either
<PP> or <PARA>).  You can quote that if you want to :-)

Ciao,
Alan

--
Alan Braverman
Software Development Group
National Center for Supercomputing Applications
alanb@ncsa.uiuc.edu