Re: HTML 2.0 LAST CALL: Numeric character refs

Terry Allen (terry@ora.com)
Sat, 3 Jun 95 12:19:16 EDT

Dan:
| In message <9506030739.ZM2596@dmg.west.ora.com>, "Terry Allen" writes:
| >There is great advantage in being able to use SGML conformant tools
| >to process HTML.

Actually, that's the real Terry Allen, not "Terry Allen".

| Yea verily!
|
| > However, our experience to date shows that error
| >handling is invoked to make end runs around the agreed-upon DTD
| >and sdecl (unknown elements, unknown atts, NCRs), all of which
| >impose non-SGML requirements on HTML processing systems, and
| >break tools that are ready to hand.
|
| Sing it, brother. I've heard this tale of woe many times. I (we?) have
| heard reports from engineering organizations that started writing an
| HTML parser by using the SGML and HTML specs, and then did some
| real-world testing and added all the error-handling cases. The error
| handling work was reportedly twice as much work as the original
| implementation.

Why does that justify breaking the tools? Especially in this case,
when we have only to wait for the next version to handle this matter,
and when what is being proposed isn't current practice (even now),
which is the touchstone for this version?

| >I suggest that all material on error handling should
| >go not in the standards-track HTML 2.0 spec RFC but in an informational
| >RFC devoted to the issue.
|
| I'm afraid this would have the opposite effect from what you (and I)
| seek: the current circumstances arise from the fact that there is
| effectively no HTML spec -- users go to their browsers for the last
| word (and to some "HTML How-To" documents for the first word), and
| implementors go to the Mosaic 2.4 source code, or they somehow reverse
| engineer the behaviour of Netscape. Hence there is a lot of HTML out

So what? We're describing an SGML application here. That people
don't want to consult the DTD is no reason to subvert it through
"error handling."

| there that can't be described in a way that's consistent with SGML at
| all, let alone SGML as we'd like to use it.

Yes, these are invalid docs. We have no obligation to describe them.

| If the HTML 2.0 specification did not include informative notes
| telling implementors what to look out for, a few of them would
| code to the spec and find it so out of touch with reality that
| they would disrecommend it to their peers. A few authors would

These "error handling" passages in what is supposed to be a spec
for a conforming SGML application are themselves out of touch
with the reality of SGML processing, and are more than enough
to disrecommend it. Implementors who want to read only the
prose and not the DTD and sdecl (which is what I infer you mean by
"coding to spec") have only themselves to blame.

| read the spec and wonder why it doesn't match the intuition
| they've built up using Mosaic etc.

I don't see what intuition has to do with it. Aren't these
folks supposed to be professional programmers who can read
a spec? even an ISO standard?

| The result would be that even fewer folks would read and use the spec,
| and more broken HTML would be created and supported. Browsers would
| not tend to flag errors as such. SGML-based authoring tools would
| become less and less reliable in reading HTML docs...

Anything can be described as "broken HTML;" we're specifying
conformant SGML here. If you want to standardize "okay broken HTML,"
which is the effect of adding more "error handling" clauses in
the spec, you're doing something entirely different.

| I hope to see the day when it is more cost-effective to code to the
| spec than to reverse-engineer the behaviour of various browsers --
| when it is more cost-effective to SGML-validate a document than
| to test it on all the browsers you expect your readers to use.

Then stop creating trap doors underneath the DTD and sdecl, stop
creating excuses for implementors not to conform. Insert language
describing processing instructions--the SGML-conformant way to
get forward compatibility.

| In short, the goal is to make the HTML 2.0 spec the pivot point for
| interoperability, and ensure that enhancements to HTML are consistent
| with SGML. Toward that end, as much as I'm tired of adding these
| "gotcha reports" in the spec, I oppose Terry's suggestion that
| informative error handling notes be removed from the HTML 2.0 spec.

I fail to see what the gotcha is. Is it a gotcha that you can't
do what the spec doesn't say you can? We agreed that 2.0 was to
deal only with 8859-1. That's a limitation, sure, but a gotcha?
Is it going to be a gotcha later on when (if) we adopt 10646
as the document charset and people try to use NCRs mapped to
some other charset to represent Chinese characters? Or is that
just going to be wrong?
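
To make the point concrete, here is a minimal sketch of what "resolving NCRs against the document charset" could look like. It assumes the document character set is ISO 8859-1 with code positions 0 through 255, and it ignores any control positions that an SGML declaration may mark as unused. The names `check_ncrs` and `max_code` are hypothetical, chosen for illustration only; this is not taken from any parser or from the spec text.

```python
import re

# Matches decimal numeric character references such as &#233;
NCR = re.compile(r"&#([0-9]+);")

def check_ncrs(text, max_code=255):
    """Return the NCR values in `text` that lie outside the assumed
    document character set (code positions 0..max_code)."""
    out_of_range = []
    for match in NCR.finditer(text):
        code = int(match.group(1))
        if code > max_code:
            out_of_range.append(code)
    return out_of_range

# Under the 8859-1 assumption, &#233; is fine and &#54321; is reported.
print(check_ncrs("caf&#233; and &#54321;"))
```

Raising `max_code` (for example to cover a 10646 repertoire) changes which references are reported, which is the sense in which a reference is simply right or wrong relative to the declared document charset.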

I'd like to make an appeal to process. We talked over this issue
at length. Language in the spec was changed repeatedly, until
discussion died down. Now at the last minute one person requests
a new feature, you add it in when you could easily say "too late,"
no one else supports it, several people object, and you refuse to budge.

Now just what is the point of us spending months discussing these
matters if our discussion carries no weight? How do you expect to make
any further progress this way? How do you expect the resulting
spec to recommend itself to the world if the work of the WG
gets thrown to the winds at the last minute?

A final note: in a later message you say
>The June 2 verbiage is motivated by
>the fact that several widely deployed browsers behave that way.

Which ones? Both Mosaic and Netscape display &#54321; as "1".
This is a *new feature,* not current practice.
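
One explanation consistent with that "1" is that the browsers keep only the low 8 bits of the reference value. That is an assumption about their internals, not something established in this thread, but the arithmetic fits: 54321 = 212 * 256 + 49, and code position 49 is the digit "1". The helper name `low_byte_char` below is hypothetical.

```python
def low_byte_char(code):
    """Map a reference value to a character by keeping only its low
    8 bits (an assumed model of the observed browser behaviour)."""
    return chr(code & 0xFF)

# 54321 is 0xD431; the low byte 0x31 is 49, the code position of "1".
print(low_byte_char(54321))
```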

-- 
Terry Allen  (terry@ora.com)   O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
			       Sebastopol, Calif., 95472
occasional column at:  http://gnn.com/meta/imedia/webworks/allen/

A Davenport Group sponsor. For information on the Davenport Group see ftp://ftp.ora.com/pub/davenport/README.html or http://www.ora.com/davenport/README.html