Agree: text outside P [Was: Hot Metal and HTML ]

"Daniel W. Connolly" <connolly@oclc.org>
Date: Wed, 15 Jun 94 12:57:01 EDT
Message-id: <9406151654.AA00683@ulua.hal.com>
Reply-To: html-ig@oclc.org
From: "Daniel W. Connolly" <connolly@oclc.org>
To: Multiple recipients of list <html-ig@oclc.org>
Subject: Agree: text outside P [Was: Hot Metal and HTML ]
X-Comment: HTML Implementation Group

In message <9406150919.AA02162@www3.cern.ch>, Tim Berners-Lee writes:
>
>"<h1>foo</h1>bar" is valid HTML and nothing can change it.
>
>Now, there was a plan that the HTML DTD could be changed such that
>it would *still* be valid, but that it would be interpreted
>DTD-wise as equivalent to "<h1>foo</h1><p>bar</p>".
>When I switched the working spec to Dan's version, it was
>under the misunderstanding that Dan had verified a lot of real
>HTML from the field against it and it parsed.

Whoa! Please get your facts straight, Tim. I _have_ verified a lot of
real HTML from the field, and it _does_ parse.

I _never_ said that "<h1>foo</h1>bar" was equivalent to
"<h1>foo</h1><p>bar</p>" to the SGML parser. It may be equivalent to
the application. I tried several techniques to teach the SGML parser
to treat the two alike, but each one conflicted with current
practice, so I scrapped them.

>This plan failed, as the SGML tag implication algorithm is not
>strong enough (-Dan).  That is, it can deduce closing
>tags but not opening tags. So the trick will work for <LI>
>and <DT> and <dd> because they all have opening tags, but
>it won't work for <p>.
>
>This means that either
>1. <p> is kept as a separator, maybe with <pp> as a para style container, or
>2. We mandate that HTML parsers have a higher level of tolerance
>   than SGML parsers, in particular they can infer opening tags; or
>2. Text is allowed outside paragraphs as well as inside, as
>   Dave Raggett has suggested for html+; or
>3. The new spec is called HTML+ or HTML2 but not text/html.
>
>These are as I see it the four options open to us as we plot the course of
>WWW history. 

I have been using option 2' for some time now, not because I thought
it was The Right Thing To Do, but because I posted my views to www-talk,
and the majority at the time seemed to favor P as a container.

It's clear that a paragraph container is easier for html-reading
applications to deal with than a paragraph separator. I've written
several HTML->XXX converters, and using paragraph separators complicates
the code significantly.

Nick Williams also argues this point in his "Experiences With A
WYSIWYG Editor for HTML".

From: http://web.cs.city.ac.uk/homes/njw/htmltext/www94.html :

	Also annoying is the manner in which representing a tagged
	section of text is different for some environments. Most
	tagged text has very strict begin/end pairs of tags annotating
	the specific text, but list items use a completely different
	approach: that of marking the start of the section and not
	using an end tag. And some items (images and paragraph breaks)
	are not environments over text at all, but are singleton tags
	indicating "magic". None of these is difficult to overcome,
	however collectively they increase the complexity involved in
	making an engine to both read and write correct code, knowing
	the difference between all the different types.
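
The container-versus-separator point can be illustrated with a small
sketch. This is not one of the converters mentioned above; it is a
hypothetical example built on Python's standard html.parser module, and
the class and function names are invented for illustration. The
container version only has to react to <p> and </p>. The separator
version has to remember where headings start and end, and has to flush
the last paragraph at end of input because nothing marks its end.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def squeeze(parts):
    """Join text fragments and collapse runs of whitespace."""
    return " ".join("".join(parts).split())


class ContainerConverter(HTMLParser):
    """Assumes every paragraph is written as <p>...</p>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = None  # None means "not inside a paragraph"

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            self.paragraphs.append(squeeze(self._buf))
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


class SeparatorConverter(HTMLParser):
    """Treats <p> as a break between runs of body text."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = []
        self._in_heading = False

    def _flush(self):
        text = squeeze(self._buf)
        if text:  # skip empty runs, e.g. a <p> right after a heading
            self.paragraphs.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()
        elif tag in HEADINGS:
            self._flush()  # a heading implicitly ends the paragraph
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self._in_heading:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # the last paragraph has no terminator


def paragraphs(converter_cls, markup):
    """Run one converter over a markup string and return its paragraphs."""
    conv = converter_cls()
    conv.feed(markup)
    conv.close()
    return conv.paragraphs
```

Even in this toy form, the separator version needs extra state and an
end-of-input hook, and it would need more special cases for lists,
<pre>, and the other elements that implicitly end a paragraph.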

In message <9406150919.AA02162@www3.cern.ch>, Tim Berners-Lee writes:
>
>"<h1>foo</h1>bar" is valid HTML and nothing can change it.

While this is certainly legal as of the 2.0 spec, I think it will be
more valuable in the long run to phase out this idiom. Supporting this
idiom complicates things unnecessarily.

In the interest of future WWW development, I'd like to phase out the
paragraph-separator idiom in favor of the paragraph-container idiom.

One way to do that is to say that the P element is a container, but
for now, text is allowed outside paragraphs. We include language in
the spec that warns that in future specifications, it will not be
allowed.  And we include a switch in the DTD that allows folks to
check to see that they haven't used this deprecated idiom.
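
As a rough illustration of what such a check would report, here is a
sketch in Python. It is not the DTD switch itself and involves no SGML
parser; it is a hypothetical stand-in using the standard html.parser
module, with invented names and a deliberately short list of element
types. It flags character data that sits directly in the document body,
outside any open element.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# Assumption: a short, illustrative list of block-level elements.
BLOCKS = HEADINGS | {"p", "ul", "ol", "dl", "pre", "address", "blockquote"}
# Wrappers that do not count as text containers.
TRANSPARENT = {"html", "body"}
# Elements that never have content or an end tag.
EMPTY = {"img", "hr", "br", "isindex", "base", "link", "meta", "nextid"}


class StrayTextChecker(HTMLParser):
    """Collects text found outside every open element."""

    def __init__(self):
        super().__init__()
        self.stray = []
        self._open = []  # stack of currently open element names

    def handle_starttag(self, tag, attrs):
        if tag in TRANSPARENT or tag in EMPTY:
            return
        # A new block element implicitly ends an unclosed paragraph.
        if tag in BLOCKS and self._open and self._open[-1] == "p":
            self._open.pop()
        self._open.append(tag)

    def handle_endtag(self, tag):
        # An end tag also closes anything left open inside it,
        # such as <li> items that were never explicitly closed.
        if tag in self._open:
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self._open:
            self.stray.append(text)


def stray_text(markup):
    """Return the runs of text that use the deprecated idiom."""
    checker = StrayTextChecker()
    checker.feed(markup)
    checker.close()
    return checker.stray
```

With this sketch, stray_text("<h1>foo</h1>bar") reports "bar", while
"<h1>foo</h1><p>bar</p>" produces no reports. A real check would be
driven by the DTD's content models rather than a hand-written list.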

There is one outstanding problem: If you write:

	<ul>
	<li><p>item 1
	<li><p>item 2
	</ul>

it won't "look right" on current browsers -- Mosaic, for example, sticks
a blank line between the bullets and the text. We could call that
a bug in Mosaic, but I'm not sure that's a good idea.

On the other hand, using the PP proposal, we could write:

	<ul>
	<li><pp>item 1
	<li><pp>item 2
	</ul>

and everything would work quite nicely.

Dan