You mean besides "parse as per SGML"?
> At what point should we assume the closure of the <HEAD>? We can't
> do it at the first thing that the browser thinks shouldn't be there, because
> in that case, a browser that didn't understand <STYLE>data</STYLE> would
> assume closure of the HEAD as soon as it saw "data". I suppose we can assume
> it at the <BODY>, but we still leave the opportunity for a doc with <HEAD>,
> not </HEAD> and no <BODY> which would just disappear into limbo inside the
> browser. It sure worries me.
Blech. OK. You're talking about how the "if you don't recognize the
tag, throw it out" convention interacts with "if you're in HEAD, don't
show data". Creative kludging.
Here goes nothing:
HTML -> <html>?, HEAD, BODY, </html>?
HEAD -> <head>?, head-content, </head>?
head-content -> TITLE | META | LINK | ...
| UNKNOWN-HEAD
TITLE -> <title>, data chars, </title>
META -> <meta>
UNKNOWN-HEAD -> <xxx>, unknown-head-content, </yyy>
unknown-head-content -> data chars
| head-content
BODY -> <body>?, body-content, </body>?
body-content -> data chars | H1 | H2 | UL | OL | P | ...
| UNKNOWN-BODY
UNKNOWN-BODY -> <xxx> | </yyy>
I believe that's an LR(1) grammar, i.e. it's implementable. It
requires that you keep track of how many unknown tags you're inside
when you're in the head, but not their names.
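Here's a rough sketch of what "count the unknown tags, not their names" might look like. It assumes the document has already been tokenized into ("start", name), ("end", name) and ("data", text) tuples, and the tag sets below are made-up stand-ins for the "..." in the grammar; split_head_body is a hypothetical name, not anybody's actual implementation.

```python
# Assumed token format: ("start", name), ("end", name) or ("data", text).
KNOWN_HEAD = {"title", "meta", "link"}              # stand-in for TITLE | META | LINK | ...
EMPTY_HEAD = {"meta", "link"}                       # head tags that take no end tag
KNOWN_BODY = {"body", "h1", "h2", "ul", "ol", "p"}  # stand-in for H1 | H2 | UL | OL | P | ...
KNOWN = KNOWN_HEAD | KNOWN_BODY | {"html", "head"}


def split_head_body(tokens):
    """Return (title text, hidden head text, body text) for a token stream."""
    title, hidden, body = [], [], []
    in_head, in_title, depth = True, False, 0
    for kind, value in tokens:
        name = value.lower() if kind in ("start", "end") else ""
        if in_head:
            if kind == "data":
                if in_title:
                    title.append(value)
                    continue
                if depth > 0:
                    hidden.append(value)  # unknown-head-content: never displayed
                    continue
                if not value.strip():
                    continue              # whitespace between head tags
                in_head = False           # stray data at depth 0 implicitly closes HEAD
            elif kind == "start" and name in ("html", "head"):
                continue
            elif kind == "end" and name == "head":
                in_head = False
                continue
            elif name == "title":
                in_title = kind == "start"
                continue
            elif name in EMPTY_HEAD:
                continue
            elif name not in KNOWN:
                # Only the nesting depth is tracked, not the names, so any
                # unrecognized end tag closes one level.
                if kind == "start":
                    depth += 1
                elif depth > 0:
                    depth -= 1
                continue
            else:
                in_head = False           # <body> or a known body tag closes HEAD
        # BODY: show data; unrecognized tags are simply thrown out.
        if kind == "data":
            body.append(value)
    return "".join(title), "".join(hidden), "".join(body)


doc = [("start", "HEAD"), ("start", "TITLE"), ("data", "T"), ("end", "TITLE"),
       ("start", "STYLE"), ("data", "data"), ("end", "STYLE"),
       ("data", "Hello")]                 # no </HEAD>, no <BODY>
print(split_head_body(doc))  # ('T', 'data', 'Hello')
```

Note that this handles the <HEAD>-but-no-</HEAD>-or-<BODY> case only when the unknown tags are balanced: an unknown start tag that never gets an end tag leaves depth above zero, and everything after it still vanishes into limbo.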
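Hmmm... it also implies we can't add any new empty tags to the
head. A browser that doesn't know the new tag is empty will treat it
as the start of unknown-head-content and wait for an end tag that never
comes, so it becomes ambiguous whether the data chars that follow are in
the head or the body. Bad news for future innovations.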
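To make the ambiguity concrete, here's a toy depth counter run twice over the same tokens. FOO is a hypothetical new empty head tag, and shown_after_head_tags is an illustrative name; the token tuples are the same assumed format as above.

```python
def shown_after_head_tags(tokens, empty_tags):
    """Return the data a depth-counting reader would treat as body text."""
    depth, shown = 0, []
    for kind, value in tokens:
        name = value.lower() if kind in ("start", "end") else ""
        if kind == "start" and name not in empty_tags:
            depth += 1           # might be a container: wait for some end tag
        elif kind == "end" and depth > 0:
            depth -= 1           # any end tag closes one level; names not tracked
        elif kind == "data" and depth == 0:
            shown.append(value)  # outside every open tag: must be body text
    return "".join(shown)


doc = [("start", "FOO"), ("data", "Hello")]
print(repr(shown_after_head_tags(doc, empty_tags={"foo"})))  # 'Hello' (knows FOO is empty)
print(repr(shown_after_head_tags(doc, empty_tags=set())))    # '' (old browser hides it)
```

Same document, two different answers about where "Hello" lives.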
This is a pretty good argument for saying that all data characters
except those inside TITLE are in the body. Corollary: all style info
has to go in attributes or linked documents.
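Under that rule the parser no longer needs a depth counter at all; it only has to notice TITLE. A minimal sketch, again over the assumed token tuples (split_title_body is a hypothetical name), which also shows the cost: style text inside an unknown STYLE tag lands in the body.

```python
def split_title_body(tokens):
    """Alternative rule: every data character outside TITLE belongs to the body."""
    title, body, in_title = [], [], False
    for kind, value in tokens:
        name = value.lower() if kind in ("start", "end") else ""
        if name == "title":
            in_title = kind == "start"
        elif kind == "data":
            (title if in_title else body).append(value)
    return "".join(title), "".join(body)


doc = [("start", "HEAD"), ("start", "TITLE"), ("data", "T"), ("end", "TITLE"),
       ("start", "STYLE"), ("data", "h1 {color: red}"), ("end", "STYLE"),
       ("start", "BODY"), ("data", "Hello")]
print(split_title_body(doc))  # ('T', 'h1 {color: red}Hello')
```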
Blech.
Oh well.
Dan