Re: HTML 2.0 LAST CALL: Numeric character refs

Daniel W. Connolly (connolly@beach.w3.org)
Sat, 3 Jun 95 11:23:37 EDT

In message <9506030739.ZM2596@dmg.west.ora.com>, "Terry Allen" writes:
>There is great advantage in being able to use SGML conformant tools
>to process HTML.

Yea verily!

> However, our experience to date shows that error
>handling is invoked to make end runs around the agreed-upon DTD
>and sdecl (unknown elements, unknown atts, NCRs), all of which
>impose non-SGML requirements on HTML processing systems, and
>break tools that are ready to hand.

Sing it, brother. I've heard this tale of woe many times. I (we?) have
heard reports from engineering organizations that started writing an
HTML parser by using the SGML and HTML specs, and then did some
real-world testing and added all the error-handling cases. The error
handling reportedly took twice as much work as the original
implementation.

>I suggest that all material on error handling should
>go not in the standards-track HTML 2.0 spec RFC but in an informational
>RFC devoted to the issue.

I'm afraid this would have the opposite effect from what you (and I)
seek: the current circumstances arise from the fact that there is
effectively no HTML spec -- users go to their browsers for the last
word (and to some "HTML How-To" documents for the first word), and
implementors go to the Mosaic 2.4 source code, or they somehow reverse
engineer the behaviour of Netscape. Hence there is a lot of HTML out
there that can't be described in a way that's consistent with SGML at
all, let alone SGML as we'd like to use it.

If the HTML 2.0 specification did not include informative notes
telling implementors what to look out for, a few of them would
code to the spec and find it so out of touch with reality that
they would disrecommend it to their peers. A few authors would
read the spec and wonder why it doesn't match the intuition
they've built up using Mosaic etc.

The result would be that even fewer folks would read and use the spec,
and more broken HTML would be created and supported. Browsers would
not tend to flag errors as such. SGML-based authoring tools would
become less and less reliable in reading HTML docs...

I hope to see the day when it is more cost-effective to code to the
spec than to reverse-engineer the behaviour of various browsers --
when it is more cost-effective to SGML-validate a document than
to test it on all the browsers you expect your readers to use.

In short, the goal is to make the HTML 2.0 spec the pivot point for
interoperability, and ensure that enhancements to HTML are consistent
with SGML. Toward that end, as much as I'm tired of adding these
"gotcha reports" to the spec, I oppose Terry's suggestion that
informative error handling notes be removed from the HTML 2.0 spec.

Daniel W. Connolly "We believe in the interconnectedness of all things"
Research Technical Staff, MIT/W3C
<connolly@w3.org> http://www.w3.org/hypertext/WWW/People/Connolly