DL content model: (DT|DD)+?

Daniel W. Connolly (connolly@hal.com)
Mon, 14 Nov 94 16:11:19 EST

Dan> * change to DL content model from (DT*, DD?)+ to (DT | DD)*.
Dan> OK, I guess.

ehood> Both basically define the same set of combinations. The second is
ehood> better because it reflects common practice, and it is easier to
ehood> understand.

On second thought, what meaning do we ascribe to an empty DL? Why not
(DT|DD)+? I find that using + instead of * catches typos now and then.
I'll have to do some testing though...
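
As a rough way to do that testing, here is a minimal sketch that treats each content model as a regular expression over the sequence of child elements and compares them by brute force. The one-letter encoding (T for DT, D for DD) and the helper names `accepts` and `sequences` are assumptions made for illustration; a real check would run documents through an SGML parser.

```python
import re
from itertools import product

# Encode each child element as one letter: T = DT, D = DD.
MODELS = {
    "(DT*, DD?)+": r"(?:T*D?)+",
    "(DT | DD)*": r"(?:T|D)*",
    "(DT | DD)+": r"(?:T|D)+",
}


def accepts(model, children):
    """True if the encoded child sequence satisfies the named content model."""
    return re.fullmatch(MODELS[model], children) is not None


def sequences(max_len):
    """Yield every DT/DD sequence of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for combo in product("TD", repeat=n):
            yield "".join(combo)


def differences(model_a, model_b, max_len=6):
    """Sequences on which the two models disagree, up to max_len children."""
    return [s for s in sequences(max_len)
            if accepts(model_a, s) != accepts(model_b, s)]


if __name__ == "__main__":
    print(differences("(DT*, DD?)+", "(DT | DD)*"))
    print(differences("(DT | DD)*", "(DT | DD)+"))
```

Up to the tested length, the first two models agree on every sequence, and the + variant differs from the * variant only on the empty DL.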

Here's what the peanut gallery had to say...

lee> (DT*, DD?)+ is almost certainly an illegal content model, because it's
lee> ambiguous (in the SGML sense). The changed version doesn't express the
lee> intent, but does match the same set of patterns, in a legal way.

"Almost certainly" ... spoken with such confidence! And from a
SoftQuad guy! Certainly you have tools with which you can investigate
these issues with precision, no? In fact, the content model is not
ambiguous in the SGML sense, as Joe points out:

Joe> (DT*, DD?)+ is actually unambiguous. Content models
Joe> can only be ambiguous if the same GI appears more
Joe> than once in the model, which is not the case here.
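
The condition Joe states is easy to check mechanically. The following is only a sketch of that one condition (does any GI occur more than once?), not a full ambiguity test as the SGML standard defines it; the function name `repeated_gis` and the second example model are hypothetical.

```python
import re
from collections import Counter


def repeated_gis(content_model):
    """Return the GIs that occur more than once in a content model string."""
    # Pull out name tokens; connectors and occurrence indicators are ignored.
    names = re.findall(r"[A-Za-z][A-Za-z0-9.]*", content_model)
    counts = Counter(name.upper() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    # No GI repeats, so by the rule quoted above this cannot be ambiguous.
    print(repeated_gis("(DT*, DD?)+"))
    # A made-up model in which DT repeats, so ambiguity is at least possible.
    print(repeated_gis("((DT, DD) | DT)+"))
```

An empty result means the model passes Joe's test; a non-empty result only says that a closer look is needed.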

DaveR> This doesn't match what was discussed at the Working group meeting
DaveR> in Chicago. I thought we had agreed on:
DaveR>
DaveR> <!ELEMENT DL - - (DT*, DD)+>
DaveR>
DaveR> This reflects the usual semantics of definition lists with a number
DaveR> of synonymous terms for each definition, as well as accounting for
DaveR> current usage on the Web. Anyway, this is what I am proposing for
DaveR> HTML 3.0.

DanielG> I clearly prefer Dave's definition. Much more conformant
DanielG> to the "common" DTDs (other than HTML, of course...) style.

Peter> I thought it was going to be (DT, DD+)+
Peter> eg, DT,DD must occur in pairs at least, but DD can occur more than once for
Peter> each DT. Or is that too intolerant of people using it to do indenting for
Peter> them?

Wkr> Doesn't this allow the following?
Wkr>
Wkr> <DL>
Wkr> <DD>Definition without preceding term</DD>
Wkr> </DL>
Wkr>
Wkr> And if it does, wouldn't
Wkr>
Wkr> <!ELEMENT DL - - (DT+, DD)+>
Wkr>
Wkr> be more sensible? Or am I missing something?

What you're missing is that there is a LOT of legacy data out there that
uses DL just to indent stuff. I think most data is consistent with
(DT*, DD)+ but there's a lot of stuff out there with no DTs at all.
We wouldn't want to give implementors the impression that they can count
on seeing a DT.

This is a clear case of "it should be ..., but the legacy data looks
like ..." Shall we be completely descriptive, and go with (DT|DD)*,
or attempt to prescribe some structure?
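
To make the trade-off concrete, here is a small sketch that checks a few typical DL shapes against the models proposed so far. As an assumption for illustration, children are encoded as single letters (T for DT, D for DD), each model is approximated by a regular expression, and the sample shapes are hand-picked rather than drawn from any survey of real documents.

```python
import re

# T = DT, D = DD. Each content model is approximated by a regex.
MODELS = {
    "(DT | DD)*": r"(?:T|D)*",
    "(DT*, DD)+": r"(?:T*D)+",
    "(DT+, DD)+": r"(?:T+D)+",
    "(DT, DD+)+": r"(?:TD+)+",
}

# Hand-picked shapes of DL content, for illustration only.
SAMPLES = {
    "empty DL": "",
    "DD only (indenting)": "D",
    "DT, DD": "TD",
    "DT, DT, DD": "TTD",
    "DT, DD, DD": "TDD",
    "DT only": "T",
}


def accepted_by(children):
    """List the models that accept the encoded child sequence."""
    return [name for name, pattern in MODELS.items()
            if re.fullmatch(pattern, children)]


if __name__ == "__main__":
    for label, children in SAMPLES.items():
        print(f"{label:22} {', '.join(accepted_by(children))}")
```

Only (DT | DD)* and (DT*, DD)+ accept a DL that contains nothing but a DD, which is the indenting case found in legacy data.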

Peter> Ah...I knew there was a reason. In that case we must also allow this
Peter> for MENU and DIR, and make DL - - (DT,DD+)* (ie they can be absent,
Peter> but if they occur they must occur in pairs with DD being allowed to repeat).

I have found no legacy data with empty MENU and DIR elements. And I
don't want folks to _start_ generating that kind of meaningless
markup. Evidence to the contrary is welcome.

Dan