How prescriptive can/should we be? [Was: DL content model]

Daniel W. Connolly (connolly@hal.com)
Wed, 16 Nov 94 16:21:29 EST

First, let me dispense with this part:

In message <Pine.BSI.3.91.941116110337.22604H-100000@get.wired.com>,
Brian Behlendorf writes:
>So I initially thought that (DT+|DD+)+ would be proper - but then someone
>brought up the case of dynamically generated lists and that empty <DL>'s
>would be good in that case.

I strongly disagree. If the list is being generated by software, then
that software can arrange to write out _nothing_ rather than <dl></dl>.
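
As a minimal sketch of that point, in Python: the function name
render_dl and the list of (term, definition) pairs are hypothetical,
chosen only for illustration. The generator simply checks for the empty
case before it writes anything.

```python
from html import escape

def render_dl(items):
    """Render a list of (term, definition) pairs as a <dl>.

    Returns the empty string when there are no items, so an empty
    <dl></dl> is never written out.
    """
    if not items:
        # No entries: emit nothing at all rather than an empty list.
        return ""
    lines = ["<dl>"]
    for term, definition in items:
        lines.append(f"<dt>{escape(term)}")
        lines.append(f"<dd>{escape(definition)}")
    lines.append("</dl>")
    return "\n".join(lines)
```

With this shape, render_dl([]) yields an empty string, so a content
model that requires at least one DT or DD is never violated by
generated output.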

Now... on to the meaty part of the discussion:

>On Wed, 16 Nov 1994, Peter Flynn wrote:
>> Bert wrote:
>> > I haven't seen (DT+|DD+)* yet, wouldn't that express everything people
>> > asked for: one or more terms with one or more definitions?
>>
>> One would have thought so, but the reality is that DL has been used for
>> other purposes, so if we are trying to make the DTD reflect practice as
>> well as intent, we need to handle cases where there are zero or more DTs
>> and DDs.
>>
>> I know of many files where DD has been used on its own to make some text
>> indent.
>
>..which is contrary to the intent of DL, of course. Content providers can't
>exactly be faulted for misusing semantic tags for their presentational
>effects since they don't otherwise have control over presentation -
>my own site violates just about every rule in the book (I've had a lot of
>fun with the validation service, and Arena's "Bad HTML" warnings :)
>However, the fact that people are misusing HTML now shouldn't be an
>excuse to officially "allow" that.

Ok... we're trying to specify current practice, but we have on many
occasions decided that some practices are bogus and won't be part of
the standard.
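
To make the difference between the content models quoted above
concrete, here is a rough sketch in Python that encodes each model as a
regular expression over the sequence of child elements. The one-letter
encoding and the accepts() helper are assumptions made for
illustration, and the third model, (DT+, DD+)+, was not proposed in the
thread: it is only a guess at how "one or more terms with one or more
definitions" might be spelled if that pairing were actually enforced.

```python
import re

# One letter per child element: "t" for DT, "d" for DD.
# Anything else becomes "x", which none of the models accept.
ENCODING = {"DT": "t", "DD": "d"}

MODELS = {
    "(DT+|DD+)+": re.compile(r"(?:t+|d+)+"),
    "(DT+|DD+)*": re.compile(r"(?:t+|d+)*"),
    # Hypothetical stricter model, not taken from the thread.
    "(DT+, DD+)+": re.compile(r"(?:t+d+)+"),
}

def accepts(model, children):
    """Return True if the sequence of child element names fits the model."""
    encoded = "".join(ENCODING.get(name.upper(), "x") for name in children)
    return MODELS[model].fullmatch(encoded) is not None
```

Under this encoding, the first two models differ only on the empty
list: the + form rejects it and the * form accepts it. Both accept a
lone DD, so neither one by itself requires terms to be paired with
definitions.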

As I think carefully about this, I see three conjectures to consider:

1. "If a document is HTML 2.0 compliant, then it will display properly
on contemporary browsers."

This is largely true (except for browser bugs like comment parsing,
and obscure SGML features like marked sections), and very much a goal
of this specification.

2. "If a document displays properly on contemporary browsers, then it
is HTML 2.0 compliant."

This is obviously false, and not really a goal of the specification.
User agents will probably always support non-standard crap. But each
will do so in its own way. In fact, I challenge anyone to come up with
a tractable definition of "displays properly on contemporary browsers."

On the other hand, we don't want to give the impression that
information providers will have to completely overhaul their stuff to
become 2.0 compliant. This brings us to:

3. "If a document displays correctly on contemporary browsers, then
there exists some HTML 2.0 compliant document that displays the same
information."

I think this is an implicit goal of the 2.0 specification effort. It
says that if you've put info on the web, it is possible to put a 2.0
compliant representation of that info on the web, perhaps with a little
tweaking, and perhaps with a little loss of presentational control.

The phrase "displays the same information" is fluffy-speak, not really
subject to formal interpretation.

The gist of this third conjecture is that "HTML 2.0 doesn't take away
any features. You may have to spell those features a little
differently, but your consumers will not see much difference."

The one outstanding issue w.r.t. this third conjecture is <IMG> inside
<PRE>. I'm surprised there wasn't more fuss about it.

In my testing, I ran across many abuses of <PRE>. A common idiom was:

<PRE>
<img src="figure1.gif">
<b>caption for figure 1</b>
</PRE>

Since this can be re-coded as:

<p><img src="figure1.gif"><br>
<b>caption for figure 1</b>

I'm not losing any sleep over it.

But <PRE> is also used to build forms and do all sorts of other crazy
things where folks may be using <IMG> inside <PRE> for an effect that
_cannot_ be recoded to conform to the current DTD.
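
One way to gauge how widespread this is would be to scan existing
documents for <IMG> inside <PRE>. The following is a rough sketch in
Python using the standard library's html.parser; the class and function
names are hypothetical. Note that this is a forgiving tag-soup parser,
not an SGML validator, so it only reports where the pattern occurs and
says nothing about DTD conformance.

```python
from html.parser import HTMLParser

class ImgInPreFinder(HTMLParser):
    """Collect the src of every <img> that appears inside a <pre>."""

    def __init__(self):
        super().__init__()
        self.pre_depth = 0   # how many <pre> elements are currently open
        self.hits = []       # src attribute of each offending <img>

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "pre":
            self.pre_depth += 1
        elif tag == "img" and self.pre_depth > 0:
            self.hits.append(dict(attrs).get("src"))

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1

def images_inside_pre(html_text):
    """Return the src values of all <img> tags found inside <pre>."""
    finder = ImgInPreFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits
```

Run against the figure-and-caption idiom shown earlier, this reports
figure1.gif for the <PRE> version and nothing for the <p> ... <br>
recoding.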

Given the relatively small amount of flak that I have received
regarding this issue, I'm tempted to try to "slip this one by" and
thereby discourage folks from abusing <PRE> in the future.

In fact, in an earlier draft of the HTML DTD, <b>, <i>, and <tt> were
allowed _only_ inside PRE, and <em>, <strong> etc. were allowed _only_
outside PRE.

The idea behind PRE was that it's preformatted: the information
provider is specifying the exact layout of the characters. Inside PRE,
you might as well give the exact font too. Outside PRE, this is a
no-no.

But <b> and <i> are the rule, not the exception, so that idea got
scrapped.

So two questions:
1. Do folks agree with my assessment of the above 3 conjectures,
and
2. Should I change the DTD to allow IMG inside PRE?

Dan