I still think of this as a "quick fix" rather than a desirable feature,
but I can understand why it exists given the history of SGML. This
history notwithstanding, I still feel that the cost outweighs the
benefit. On the other hand, I realize that I'm probably in the minority,
which is why I haven't been taking much part in this discussion.
> We have no choice. They are part of HTML, like it or not.
I have some difficulty with this justification (after all, plenty of bad
ideas are currently part of HTML :)), but I do understand that there are
pragmatic issues that prevent dropping them completely. I just think that
they should retain the status of an "escape hatch."
> Ahh. My proposal is that the document character set for HTML be ISO
> 10646, and hence <EM>all numeric character references would be evaluated
> in terms of ISO 10646</EM>.
This is probably the most reasonable compromise. I certainly agree that
ISO 10646 should be used as the document character set (whatever the actual
interchange character set may be in a particular instance). In that
context this is a reasonable way to handle numeric references.
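To make the compromise concrete, here is a minimal Python sketch of evaluating decimal numeric character references against ISO 10646, no matter which encoding the document arrived in. The function name `resolve_ncrs`, the decimal-only `&#NNN;` syntax, and the choice to leave out-of-range references untouched are my own assumptions for illustration. They are not part of anyone's proposal.

```python
import re

# Hypothetical pattern: decimal numeric character references such as "&#233;".
NCR = re.compile(r"&#([0-9]+);")

def resolve_ncrs(text: str) -> str:
    """Replace each &#NNN; with the ISO 10646 character at code point NNN.

    The number is interpreted against the document character set (ISO 10646),
    not against whatever interchange encoding the bytes were sent in.
    """
    def repl(m: re.Match) -> str:
        code_point = int(m.group(1))
        if code_point > 0x10FFFF:
            # Python's str stops at U+10FFFF; leave the reference as written.
            return m.group(0)
        return chr(code_point)
    return NCR.sub(repl, text)
```

For example, `resolve_ncrs("caf&#233;")` yields `"café"` whether the surrounding text was transmitted as ISO 8859-1 or as UCS-2, because 233 always means U+00E9.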
> I will state again that this does *not*
> mean that all browsers must support UCS-2 or anything like that
> (though that would be even better!), but that they must simply
> recognise characters, and classify them according to the roles we
> assign them (informally ASCII or ISO 8859-1 for markup, everything
> else data).
This is fine with me, and is attractive on pragmatic grounds as well,
since ISO 10646 has a well-defined (and very simple) relationship to
ISO 8859-1. This is not, of course, a coincidence :).
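As a rough illustration of how simple that relationship is (a sketch only; `latin1_to_ucs` is a hypothetical name): ISO 8859-1 occupies the first 256 positions of ISO 10646, so every Latin-1 byte value is already its UCS code point and no lookup table is needed.

```python
from typing import List

def latin1_to_ucs(data: bytes) -> List[int]:
    # ISO 8859-1 byte values 0x00-0xFF map one-to-one onto
    # ISO 10646 code points U+0000-U+00FF, so the byte value *is* the code point.
    return list(data)

# "A", "é", "ÿ" encoded in ISO 8859-1
sample = bytes([0x41, 0xE9, 0xFF])
codes = latin1_to_ucs(sample)
```

Here `codes` is `[0x41, 0xE9, 0xFF]`, which matches the code points Python's own `latin-1` decoder produces for the same bytes.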
> Have you read my proposal? For browsers like yours, it is ideal.
I have not had a chance to yet, although from previous threads I think
we're probably of similar minds on many of these issues :). I was just
expressing my view of numeric character references in general...
Amanda Walker
InterCon Systems Corporation