Re: META

Terry Allen (terry@ora.com)
Tue, 4 Jul 95 21:22:18 EDT

Alex Hopmann:
..
[Terry:]
| >| What this message means to me is if an unknown HEAD tag with content is
| >| encountered, the unknown tag will be ignored, and the content will force
| >| an assumed beginning of the body because </HEAD> is optional. Needless
| >| to say, this is bad.
| >| Eric
| >
| >That does not follow. It's only when you hit something you know should
| >be in BODY that you can be sure HEAD is finished (in a conforming doc).
| >"abc" is something (PCDATA) already accounted for in the DTD.
|
| Let me try to revisit the example that Eric Bina gave:
|
| 1. <!-- select doctype above... -->
| 2. <HEAD>
| 3. <TITLE><!-- your title here --></TITLE>
| 4.
| 5. <METADATA>abc</METADATA>
| 6. <!-- your HTML test data -->
| 7. </BODY>
| 8.

Much better example, although this is not a conforming HTML 2.0 document,
which is the realm to which I thought Eric, perhaps unintentionally, had
limited himself. I hadn't wanted to complicate the case. However, this case
still follows the rule I gave above. METADATA (a tag not in the DTD)
is not something you know goes in BODY.

| Now let's pretend for a minute that I'm a parser that doesn't understand
| <METADATA>. So I have no idea if <METADATA> is a HEAD tag or a BODY tag. If
| <METADATA> is in the BODY I would ignore the METADATA tag (not understanding
| it) and just display abc normally. But if METADATA is in HEAD, I would just
| ignore the whole thing.

If you really want to know what goes in HEAD and BODY, you need to
parse per ISO 8879. But staying with the line of discussion so far,
you can use the rule I gave: when you hit METADATA you haven't yet hit
anything you know should be in BODY, hence you don't yet close HEAD.
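
As a rough illustration of that rule, here is a minimal Python sketch. It
is not an ISO 8879 parser. The names head_end, TOKEN, and KNOWN_BODY_TAGS
are hypothetical, and the tag list is only an assumed, partial subset of
the elements HTML 2.0 allows in BODY. The sketch also assumes that
character data alone never closes HEAD, since it may be the content of an
unknown HEAD element.

```python
import re

# Assumption: a partial list of HTML 2.0 elements that belong in BODY.
KNOWN_BODY_TAGS = {
    "BODY", "H1", "H2", "H3", "H4", "H5", "H6", "P", "UL", "OL", "DL",
    "PRE", "BLOCKQUOTE", "FORM", "HR", "ADDRESS", "A", "IMG", "BR",
    "EM", "STRONG", "B", "I", "TT", "CODE",
}

# Comments are matched first so that tags inside them are skipped;
# otherwise match a start tag or an end tag and capture its name.
TOKEN = re.compile(r"<!--.*?-->|<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>", re.DOTALL)


def head_end(doc):
    """Return the offset at which HEAD is considered closed, or None."""
    for m in TOKEN.finditer(doc):
        name = m.group(2)
        if name is None:  # the match was a comment
            continue
        name = name.upper()
        if m.group(1) == "/" and name == "HEAD":
            return m.end()  # explicit </HEAD>
        if name in KNOWN_BODY_TAGS:
            return m.start()  # known BODY material: HEAD ended before it
        # Unknown tags (e.g. METADATA) and known HEAD tags fall through,
        # so HEAD stays open.
    return None


doc = """<HEAD>
<TITLE><!-- your title here --></TITLE>
<METADATA>abc</METADATA>
<P>first paragraph
"""
pos = head_end(doc)
print(doc[pos:pos + 3])
```

Here METADATA does not close HEAD; the P start tag does, because it is the
first thing the parser knows belongs in BODY.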

Now this rule may cause problems for those who want to introduce a
new, unknown tag as the first element in BODY, without explicitly
closing HEAD. The contents of such a tag would be hidden, which is
not what is desired. Users who want to create such (nonconformant)
docs should do one of the following: close HEAD explicitly, or
interpose a BODY tag from the DTD. E.g.,

Sir ... or Madam

...

| This is I think where SGML does not work always for the WWW. SGML assumes
| that it always has a correct DTD, whereas we need to build an application
| that can be liberal with what we receive (And hopefully strict with what we
| send out...) Disclaimer: I am not saying SGML is bad or anything of the
| sort. I just think we need to keep in mind that we need to only use those
| constructs that will actually work without the exact DTD that was used to
| create the document being available.

I take your point, and I agree that error recovery is important to build
into browsers, though it is not contemplated in 8879. Error recovery
is possible when you can guess what the author's intent probably was,
and err on the conservative side (as by not displaying what may have
been meant to be hidden). Note that as HTML gets more complex it may be
more difficult to determine authorial intent ... and that there are
minor errors and then again errors so gross as to be unrecoverable.

Nor am I challenging the (for SGML) offbeat rule that unknown tags are
ignored while their contents are rendered; that's a useful form of
error recovery. But I think this is a case where error recovery is
possible given a simple rule that parser developers can build into
their code whether they're doing SGML parsing or procedural
interpretation.

So once again, the rule is that you close HEAD when you hit something
you know (the DTD says) may belong to BODY.

A further case for error recovery heuristics would be something like

Captain

some text meta content

but I can offer no advice on so nonconformant a document, as I can't
guess what the author intended.

Regards,
--
Terry Allen (terry@ora.com)