Re: Comments on HTML 2.0 DTD

Daniel W. Connolly (connolly@hal.com)
Thu, 8 Sep 94 19:21:59 EDT

In message <9409051010.AA28169@dragget.hpl.hp.com>, Dave Raggett writes:
>Dan,
>
>Did you get to see the comments I mailed and faxed to Murray Maloney
>for the Toronto meeting?

I don't think so...

>---------------------------------------------------------------------------
>
>I note that the A tag is defined as:
>
> <!ELEMENT A - - %A.content -(A)>
>
>The explicit exclusion of nested anchors means that the DTD could be
>simplified by dropping the distinction between %text and %htext, i.e.
>
>Level 0 <!ENTITY % text "#PCDATA|BR|A">
>Level 1 <!ENTITY % text "#PCDATA|IMG|BR|A|%phrase|%font">
>
>and globally replacing %htext by %text

Sounds reasonable.

>The prescriptive content model for the A tag is %text+ which is fine.
>However, the default content type for HTML 2.0 is currently defined as:
>
> (%heading|%block|%text)+
>
>There are enough instances of anchors enclosing headers that the loose DTD
>should support that, but I have yet to find examples of anchors enclosing
>lists or blockquotes etc. HTML 2.0 has to draw the line somewhere in
>formalising current practise, so can we agree on:
>
> <!ENTITY % A.content "(%heading|(%text)+)">
>
>This restricts the content to a single heading or the accepted %text string.
>

Hmmm... let me try this against my test suite...
OK. I'll buy it.
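
For illustration only, the proposed A.content rule can be sketched as a check over the names of an anchor's children. This is a rough Python sketch, not how an SGML parser validates a content model. The names HEADINGS, TEXT_LEVEL and anchor_content_ok are made up, the TEXT_LEVEL set is an assumed expansion of %text (the %phrase and %font members are guesses), and treating empty content as valid assumes #PCDATA may match zero characters.

```python
# Hypothetical sketch of <!ENTITY % A.content "(%heading|(%text)+)">.
HEADINGS = {"H1", "H2", "H3", "H4", "H5", "H6"}

# Assumed expansion of %text, minus A because nested anchors are excluded.
TEXT_LEVEL = {
    "#PCDATA", "IMG", "BR",
    "EM", "STRONG", "CODE", "SAMP", "KBD", "VAR", "CITE",  # assumed %phrase
    "TT", "B", "I",                                        # assumed %font
}


def anchor_content_ok(children):
    """children: list of child names, using "#PCDATA" for character data."""
    names = [c.upper() for c in children]
    # Either exactly one heading ...
    if len(names) == 1 and names[0] in HEADINGS:
        return True
    # ... or nothing but text-level items (empty content is accepted here).
    return all(n in TEXT_LEVEL for n in names)
```

Under this sketch, ["H2"] and ["#PCDATA", "EM"] pass, while ["H1", "H2"], ["UL"] and a nested ["A"] do not.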

>Currently the HTML 3 parser can robustly determine document structure without
>having to look beyond the next token.

So you're still doing this extra-SGML parsing, huh? Frown. Sorrow.
The HTML 2.0 specification has the big "An SGML Application conforming
to ISO 8879" banner across the title page. Are we willing to drop
that for HTML 3.0? I hope not.

>Some defaults which should be made explicit:
>
> IMG -> ALIGN (top|middle|bottom) bottom
>
> INPUT -> TYPE %InputType TEXT

Makes sense... I didn't do this because this means that the parser
should report these attributes to the application, and the libWWW
implementation doesn't do this for its clients. Oh well... we shouldn't
let that influence us too much.
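
As a rough picture of what "reporting defaults to the application" could look like, here is a small Python sketch. It only assumes the two defaults proposed above. The DEFAULTS table and with_defaults function are hypothetical and are not part of libWWW or any real parser.

```python
# Hypothetical defaults table holding only the two defaults proposed above.
DEFAULTS = {
    "IMG": {"ALIGN": "bottom"},
    "INPUT": {"TYPE": "TEXT"},
}


def with_defaults(element, attrs):
    """Return attrs with declared defaults filled in; explicit values win."""
    merged = dict(DEFAULTS.get(element.upper(), {}))
    # Attribute names are folded to upper case before merging.
    merged.update({name.upper(): value for name, value in attrs.items()})
    return merged
```

With this, an IMG that carries no ALIGN would reach the application with ALIGN set to bottom, and an explicit ALIGN=top would be left alone.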

>BTW are browsers expected to treat the input type values as case sensitive
>or as case insensitive?

Sorrow... Frown. Mosaic treats them as case sensitive. The current DTD
specifies that they are SGML NAMEs, and hence case-insensitive. Can we
call this a Mosaic bug, or should we cripple the DTD? There's
precedent in that we're calling Mosaic's attribute parsing and comment
parsing erroneous.
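
A browser that wanted to follow the DTD here could fold the value before comparing it. The Python sketch below is only an illustration: normalize_input_type is a made-up name, and the INPUT_TYPES set is an assumed reading of %InputType, which may not match the current draft exactly.

```python
# Assumed members of %InputType; check against the actual DTD before relying on it.
INPUT_TYPES = {
    "TEXT", "PASSWORD", "CHECKBOX", "RADIO",
    "SUBMIT", "RESET", "IMAGE", "HIDDEN",
}


def normalize_input_type(value):
    """Fold a TYPE value to upper case; return None if it is not a known type."""
    folded = value.strip().upper()
    return folded if folded in INPUT_TYPES else None
```

So "checkbox", "Checkbox" and "CHECKBOX" would all be treated as the same type, and an unknown value comes back as None for the caller to handle.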

>BLOCKQUOTE is defined as being nestable. Is this intended?
>
>A quick experiment reveals that X Mosaic doesn't further indent nested quotes
>so we need to be explicit about what we want here.

I liked to think that you can copy-and-paste any part of an HTML document
into a BLOCKQUOTE of another HTML document, but you raise a good point.

>The URN attribute (universal resource _name_ not number!) for links can be
>used for cache hit testing when the cache contains a document with the same
>URN but a different URL. We should therefore include the URN attribute with
>the IMG tag, as there are great opportunities for sharing common graphics,
>e.g. to build up a de facto library of graphics shared across the whole web.
>This would particularly help people struggling with 14k4 dial up lines.

Good idea.
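
The cache test Dave describes might look something like the Python sketch below. It is a toy under stated assumptions: the Cache class, its method names, and the example URLs and URN are all invented for illustration, and a real cache would also have to deal with expiry and validation.

```python
class Cache:
    """Toy cache that can be hit by URL or, failing that, by URN."""

    def __init__(self):
        self._by_url = {}
        self._by_urn = {}

    def store(self, url, body, urn=None):
        self._by_url[url] = body
        if urn is not None:
            self._by_urn[urn] = body

    def lookup(self, url, urn=None):
        # Exact URL match first.
        if url in self._by_url:
            return self._by_url[url]
        # Otherwise a copy with the same URN, fetched from a different URL.
        if urn is not None:
            return self._by_urn.get(urn)
        return None


cache = Cache()
cache.store("http://a.example/logo.gif", b"GIF89a", urn="urn:x-example:logo")
shared = cache.lookup("http://b.example/img/logo.gif", urn="urn:x-example:logo")
```

Here the second lookup uses a different URL but the same URN, so the image already in the cache is reused instead of being fetched again over the slow line.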

>Can we agree to allow ISINDEX in the document body as this is by now
>quite common practice. I would have liked to also include an HREF attribute
>to redirect queries to other URLs but this isn't recognised by Mosaic yet!

I believe ISINDEX is allowed in the body in the Level 2 DTD. i.e. if
you do forms, you must allow ISINDEX anywhere. We discussed this in
Toronto, and now I don't recall the outcome. (I must have been
in the minority... :-)

Ah! I'm late for the game! Gotta go...

Dan