Eric Bina
Thu, 6 Jul 95 13:35:35 EDT

Joe English says:
> It would suffice to ignore the content of unrecognized
> elements (i.e., those not in HTML 2.0) until it can be
> determined that the <BODY> element has started.
> Untagged character data should imply the beginning of the <BODY>.
> For tagged data, either the element is a known HEAD element
> (in which case the content should be ignored, e.g., <TITLE>),
> it is a known BODY element (in which case </HEAD><BODY> should be
> inferred and the content displayed), or it is a new element
> (in which case the browser should assume that it belongs
> to the <HEAD> unless some prior tag or character data
> has implied </HEAD><BODY>.)
> It's safe to assume that the <HEAD> element will not
> include #PCDATA in its content model in any future
> revision, and that any character data in the <HEAD>
> will be enclosed in another element.
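
The quoted rule can be sketched roughly as follows. This is only an illustration under stated assumptions, not anyone's actual browser code: the token shape (`("start", name)`, `("end", name)`, `("text", data)`), the function name `visible_text`, and the abbreviated element sets are all hypothetical. In particular, the sketch assumes every unknown element is a container with a matching end tag.

```python
# Hypothetical, abbreviated element sets; HTML 2.0 defines more than these.
EMPTY_HEAD = {"base", "isindex", "link", "meta", "nextid"}
BODY_ELEMENTS = {"p", "h1", "h2", "ul", "ol", "li", "a", "pre", "em", "strong"}


def visible_text(tokens):
    """Return the character data a browser would display under the rule.

    tokens is an iterable of ("start", name), ("end", name) or
    ("text", data) tuples; this shape is an assumption of the sketch.
    """
    in_body = False
    skipping = None  # name of the HEAD-level element being ignored
    out = []
    for kind, value in tokens:
        name = value.lower() if kind != "text" else value
        if in_body:
            # Once BODY is implied, all character data is displayed.
            if kind == "text":
                out.append(value)
            continue
        if skipping is not None:
            # Ignore everything up to the matching end tag.
            if kind == "end" and name == skipping:
                skipping = None
            continue
        if kind == "text":
            # Untagged (non-blank) character data implies </HEAD><BODY>.
            if value.strip():
                in_body = True
                out.append(value)
        elif kind == "start":
            if name == "body" or name in BODY_ELEMENTS:
                in_body = True  # known BODY element: infer </HEAD><BODY>
            elif name in ("html", "head") or name in EMPTY_HEAD:
                pass  # structural tag or known empty HEAD element
            else:
                # <TITLE> or an unknown element: assumed to be a container.
                skipping = name
    return "".join(out)
```

Fed the tokens of Example 3 below (`<title>`, `<style>`, then `<body>` and `<newel>`), this returns only the `blah` inside `<newel>`.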

... a couple of good examples removed for brevity

> Example 3:
> <!doctype html PUBLIC "-//IETF//DTD HTML Experimental//">
> <html>
> <head>
> <title>blah</title>
> <style>blah...</style>
> <!-- <STYLE> unknown; don't infer </HEAD><BODY>; ignore content -->
> </head>
> <body>
> <newel>blah</newel>
> <!-- <NEWEL> unknown, but <BODY> has been seen; include content -->

If I understand this proposal, it fails for new HEAD tags that have no
content (empty elements with no end tag).

In the above example you assume that even though the browser doesn't know
<STYLE>, it knows to match it to </STYLE> and ignore content. Suppose
you have:

<!doctype html PUBLIC "-//IETF//DTD HTML Experimental//">
<!-- <newtag> unknown; how do you know how far ahead to look for
</newtag> before giving up and inferring </HEAD><BODY>
to make "blah..." the start of the body of the document? -->
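
To make the lookahead question concrete, here is a small sketch. It is not a reconstruction of the example above: the token lists, the function name `skip_unknown`, and the `limit` parameter are all hypothetical, chosen only to show that neither an unbounded scan nor a fixed lookahead limit gives the right answer in every case.

```python
def skip_unknown(tokens, start, limit=None):
    """Scan past an unknown element opened at tokens[start].

    Returns (index just after its end tag or None, tokens examined).
    limit=None scans to the end of input; otherwise at most `limit`
    tokens after the start tag are examined.
    """
    name = tokens[start][1]
    if limit is None:
        stop = len(tokens)
    else:
        stop = min(len(tokens), start + 1 + limit)
    examined = 0
    for i in range(start + 1, stop):
        examined += 1
        if tokens[i] == ("end", name):
            return i + 1, examined
    return None, examined


# Hypothetical case 1: <newtag> is an empty HEAD element, so no end tag exists.
empty_doc = [
    ("start", "newtag"),
    ("text", "blah..."),
    ("start", "p"),
    ("text", "more body text"),
]

# Hypothetical case 2: <newtag> is a container whose content should be ignored.
container_doc = [
    ("start", "newtag"),
    ("text", "blah..."),
    ("end", "newtag"),
    ("text", "body text"),
]

print(skip_unknown(empty_doc, 0))               # (None, 3)
print(skip_unknown(container_doc, 0))           # (3, 2)
print(skip_unknown(container_doc, 0, limit=1))  # (None, 1)
```

Without a limit, the empty-element case reads the entire document before the browser learns that "blah..." was body text all along. With a limit of one token, the container case is misread as empty and its content would be displayed. The browser cannot tell the two cases apart from the start tag alone.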