Re: HTML 2.0 LAST CALL: Numeric character refs

Terry Allen (terry@ora.com)
Thu, 1 Jun 95 14:41:22 EDT

Dave Morris writes:
| On Tue, 9 May 1995, Dan Connolly wrote:
| > dwm@shell.portal.com writes:
| > > Yes, that would satisfy me! I want the standard to be clear that it
| > > isn't acceptable to lose content either explicitly or as LarryM
| > > illustrated by turning it into a useless character string. With the
| > > above or even simply leaving 
| > Let's get focused here: which parts of the HTML 2.0 spec do you
| > want changed? What wording would you suggest?
|
| In reference to 5/51 postscript#2, page 1333, section 3.2.1, change
| the last sentence of the first paragraph to read:

There is no section 3.2.1 in html-spec.txt. I suppose it's:

Undeclared Markup Error Handling
To facilitate experimentation and interoperability between
implementations of various versions of HTML, the installed
base of HTML user agents supports a superset of the HTML 2.0
language by reducing it to HTML 2.0: markup in the form of
a start-tag or end-tag whose generic identifier is not
declared is mapped to nothing during tokenization. Undeclared
attributes are treated similarly. The entire
attribute specification of an unknown attribute (i.e., the
unknown attribute and its value, if any) should be ignored.
On the other hand, references to undeclared entities should
be treated as data characters.

This says nothing about numeric charrefs, nor should it. As we have
already discussed, numeric charrefs that are not in the document
character set are simply invalid, and with ordinary SGML tools that
respect the document character set, they're not found in the output
of the parse.
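
To make the distinction concrete, here is a rough Python sketch of the
recovery rules in the paragraph quoted above, with a separate check for
numeric charrefs. It is an illustration under stated assumptions, not
spec text: the declared-name tables are tiny hypothetical stand-ins for
the DTD, the function names are made up, and the set of code positions
taken to be in the document character set (tab, LF, CR, 32-126, 160-255)
is my reading of the SGML declaration and should be checked against it.
The charref function only reports references; it takes no position on
what a user agent should do with them.

```python
import re

# Hypothetical stand-ins for what the DTD declares (illustration only).
DECLARED_ELEMENTS = {"html", "head", "title", "body", "p", "a", "em"}
DECLARED_ATTRIBUTES = {"a": {"href", "name"}}
DECLARED_ENTITIES = {"amp", "lt", "gt", "quot"}

TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)([^<>]*)>")
ATTR = re.compile(
    r"""([A-Za-z][A-Za-z0-9.-]*)(\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?"""
)
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")
CHARREF = re.compile(r"&#([0-9]+);")


def _recover_tag(match):
    closing, name, rest = match.group(1), match.group(2).lower(), match.group(3)
    if name not in DECLARED_ELEMENTS:
        return ""  # undeclared start-tag or end-tag is mapped to nothing
    if closing:
        return "</" + name + ">"
    allowed = DECLARED_ATTRIBUTES.get(name, set())
    # Drop the whole specification (name and value) of unknown attributes.
    kept = [a.group(0) for a in ATTR.finditer(rest) if a.group(1).lower() in allowed]
    return "<" + " ".join([name] + kept) + ">"


def _recover_entity(match):
    if match.group(1) in DECLARED_ENTITIES:
        return match.group(0)
    # Undeclared entity reference: keep its characters as data by
    # escaping the ampersand in the serialized output.
    return "&amp;" + match.group(0)[1:]


def reduce_to_declared(markup):
    """Apply the three recovery rules from the quoted paragraph."""
    return ENTITY.sub(_recover_entity, TAG.sub(_recover_tag, markup))


def in_document_charset(code):
    # Assumption: positions treated as belonging to the document character set.
    return code in (9, 10, 13) or 32 <= code <= 126 or 160 <= code <= 255


def invalid_charrefs(markup):
    """List numeric character references outside the assumed character set."""
    return [m.group(0) for m in CHARREF.finditer(markup)
            if not in_document_charset(int(m.group(1)))]


sample = '<p>Hi <blink>there</blink> <a href="x.html" bogus=1>link</a> &foo; &#8212;</p>'
print(reduce_to_declared(sample))
print(invalid_charrefs(sample))
```

Note that reduce_to_declared never touches &#8212;: the recovery
paragraph covers undeclared tags, attributes, and entities only, and the
numeric charref is flagged by a different check altogether.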

| On the other hand, references to undeclared entities
| + and numeric character references which cannot be resolved
| + (e.g., are out of range)
| should be treated as data characters.
|
| The +ed lines are added words, no deletions.

And are not what we want to say here. The language about
numeric charrefs has been carefully crafted. It will be
revised in the next version of HTML that appears after an
internationalization proposal is agreed upon (Gavin, time to
get a move on). At that point we can discuss what "out of
range" might mean. I strongly urge we stay with the
present language here, much as I feel your pain.

-- 
Terry Allen  (terry@ora.com)   O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
			       Sebastopol, Calif., 95472
occasional column at:  http://gnn.com/meta/imedia/webworks/allen/

A Davenport Group sponsor. For information on the Davenport Group see ftp://ftp.ora.com/pub/davenport/README.html or http://www.ora.com/davenport/README.html