There is no section 3.2.1 in html-spec.txt. I suppose it's:
Undeclared Markup Error Handling
To facilitate experimentation and interoperability between
implementations of various versions of HTML, the installed
base of HTML user agents supports a superset of the HTML 2.0
language by reducing it to HTML 2.0: markup in the form of
a start-tag or end-tag whose generic identifier is not
declared is mapped to nothing during tokenization.
Undeclared attributes are treated similarly. The entire
attribute specification of an unknown attribute (i.e., the
unknown attribute and its value, if any) should be ignored.
On the other hand, references to undeclared entities should
be treated as data characters.
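
To illustrate the recovery rules quoted above, here is a minimal Python sketch. It is not taken from the spec or from any real user agent. The tables DECLARED_ELEMENTS and DECLARED_ENTITIES are a hypothetical, deliberately tiny stand-in for the HTML 2.0 DTD, and the regular expressions are a simplification of real SGML tokenization (for example, they do not handle a ">" inside a quoted attribute value).

```python
import re

# Hypothetical stand-in for the DTD: element name -> declared attribute names.
DECLARED_ELEMENTS = {"p": {"align"}, "a": {"href", "name"}, "em": set()}
# Hypothetical stand-in for the declared general entities.
DECLARED_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

# Simplified patterns; real SGML tokenization is more involved.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)([^<>]*)>")
ATTR_RE = re.compile(
    r"""([A-Za-z][A-Za-z0-9.-]*)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?"""
)
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def _recover_tag(match):
    slash, name, attr_text = match.group(1), match.group(2).lower(), match.group(3)
    if name not in DECLARED_ELEMENTS:
        return ""  # undeclared generic identifier: mapped to nothing
    if slash:
        return f"</{name}>"
    kept = []
    for attr in ATTR_RE.finditer(attr_text):
        attr_name, value = attr.group(1).lower(), attr.group(2)
        if attr_name in DECLARED_ELEMENTS[name]:
            kept.append(attr_name if value is None else f"{attr_name}={value}")
        # otherwise the whole attribute specification (name and value) is ignored
    return "<" + " ".join([name] + kept) + ">"


def _recover_entity(match):
    # Declared entities are expanded; undeclared references stay as data characters.
    return DECLARED_ENTITIES.get(match.group(1), match.group(0))


def recover(text):
    """Reduce markup to the declared subset, following the quoted paragraph."""
    without_unknown_markup = TAG_RE.sub(_recover_tag, text)
    return ENTITY_RE.sub(_recover_entity, without_unknown_markup)


sample = '<p><blink>Hi</blink> <a href="x.html" glow="1">there</a> &foo; &amp;</p>'
print(recover(sample))
```

Tags are handled before entity references so that a "<" produced by expanding &lt; is never re-read as markup. Note that the sketch does nothing with numeric character references.
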
This says nothing about numeric charrefs, nor should it. As we have
already discussed, numeric charrefs that are not in the document
character set are simply invalid, and with ordinary SGML tools that
respect the document character set, they're not found in the output
of the parse.
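
To make the point about the document character set concrete, here is a small Python sketch that flags numeric charrefs falling outside it. The code positions in in_document_charset are my reading of the HTML 2.0 SGML declaration (ISO 8859-1 based) and should be treated as an assumption to check against the declaration itself. The function names are hypothetical. The sketch only reports invalid references and takes no position on what a user agent should do with them.

```python
import re

CHARREF_RE = re.compile(r"&#([0-9]+);")


def in_document_charset(code):
    # Assumed positions in use: tab, line feed, carriage return,
    # 32-126, and 160-255. Verify against the SGML declaration.
    return code in (9, 10, 13) or 32 <= code <= 126 or 160 <= code <= 255


def invalid_charrefs(text):
    """Return the numeric character references that are not in the document character set."""
    return [
        m.group(0)
        for m in CHARREF_RE.finditer(text)
        if not in_document_charset(int(m.group(1)))
    ]


print(invalid_charrefs("caf&#233; &#130; &#8364; &#9;"))
```
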
| On the other hand, references to undeclared entities
| + and numeric character references which cannot be resolved
| + (e.g., are out of range)
| should be treated as data characters.
|
| The +ed lines are added words, no deletions.
And they are not what we want to say here. The language about
numeric charrefs has been carefully crafted. It will be
revised in the next version of HTML that appears after an
internationalization proposal is agreed upon (Gavin, time to
get a move on). At that point we can discuss what "out of
range" might mean. I strongly urge we stay with the
present language here, much as I feel your pain.
--
Terry Allen (terry@ora.com)   O'Reilly & Associates, Inc.
Editor, Digital Media Group   101 Morris St.
                              Sebastopol, Calif., 95472
occasional column at: http://gnn.com/meta/imedia/webworks/allen/

A Davenport Group sponsor. For information on the Davenport
Group see ftp://ftp.ora.com/pub/davenport/README.html
or http://www.ora.com/davenport/README.html