I confess I didn't fully understand what was intended by the following
until yesterday afternoon. I believe the functionality it describes
is not conformant with ISO 8879.
| Character octet references are represented in an HTML document as
| SGML entities whose name is number sign (#) followed by a numeral
| from 32-126 and 161-255. The HTML DTD includes a numeric character
| for each of the printing characters of the ISO-8859-1 encoding, so
| that one may reference them by number if it is inconvenient to
| enter them directly.
|
| The character octet references are not dependent on the character
| set encoding of the document. For example, "&#215;" always
| represents the ISO-8859-1 multiply sign, even when the document's
| declared character set is other than ISO-8859-1.
It seems to me that this means that even if I declare the document
character set encoding to be ISO 8859-6 (English/Arabic, I think),
in which 215 means something else (say, the letter mim), an HTML
app is supposed to interpret that &#215; as referring to a character
in another character set (which is available through the named
character reference &times; anyway). However, an SGML system will
understand that &#215; as "mim". From the SGML Handbook, section
9.5, production 64, ll. 10--13:
| A replacement character is treated as though it were entered
| directly except that the replacement for a numeric character
| reference is always treated as data in the context in which the
| replacement occurs.
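
The two readings can be put side by side in a minimal Python sketch. It
assumes Python's standard "latin-1" and "iso8859_6" codecs stand in for
the character sets in question; the function name resolve_char_ref is
hypothetical and is not part of any SGML or HTML tool. Note that in the
iso8859_6 codec, position 215 is actually the Arabic letter tah rather
than mim, but the point is the same: the two readings disagree.

```python
import unicodedata


def resolve_char_ref(number: int, doc_charset: str) -> dict:
    """Resolve a numeric character reference under both readings.

    'html_draft': the number always indexes ISO-8859-1, as the quoted
                  draft text says.
    'sgml':       the number indexes the document character set, which
                  is how an ISO 8879 parser would read it.
    """
    octet = bytes([number])
    return {
        "html_draft": octet.decode("latin-1"),
        "sgml": octet.decode(doc_charset),
    }


# Illustrative case: character number 215 in a document declared as ISO 8859-6.
result = resolve_char_ref(215, "iso8859_6")
for reading, char in result.items():
    print(reading, hex(ord(char)), unicodedata.name(char))
```

For a document declared as ISO-8859-1 the two readings coincide; they
diverge only when the declared character set assigns a different
character to the same number.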
And when I parse a sample with sgmls (using a different numeric
character entity) this input
<!doctype html system "recon.dtd"[
]>