Re: partial draft: "Character Set" Considered Harmful

Amanda Walker (amanda@intercon.com)
Mon, 10 Apr 95 10:44:13 EDT

> Incorrect. They are in SGML so that typists could enter characters
> not directly supported by their keyboards. HTML inherits them from
> SGML.

I still think of this as a "quick fix" rather than a desirable feature,
but I can understand why it exists given the history of SGML. This
history notwithstanding, I still feel that the cost outweighs the
benefit. On the other hand, I realize that I'm probably in the minority,
which is why I haven't been taking much part in this discussion.

> We have no choice. They are part of HTML, like it or not.

I have some difficulty with this justification (after all, many things
that are currently part of HTML are bad ideas :)), but I do understand that
there are pragmatic issues that prevent dropping them completely. I just
think that they should retain the status of "escape hatch."

> Ahh. My proposal is that the document character set for HTML be ISO
> 10646, and hence <EM>all numeric character references would be evaluated
> in terms of ISO 10646</EM>.

This is probably the most reasonable compromise. I certainly agree that
IS 10646 should be used as the document character set (whatever the actual
interchange character set may be in a particular instance). In that
context this is a reasonable way to handle numeric references.
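
To make the idea concrete, here is a minimal sketch of resolving decimal
numeric character references against IS 10646 code positions, independent
of whatever encoding the document arrived in. The function name
expand_numeric_refs is hypothetical, and the sketch assumes that Python's
chr() (which takes a Unicode code point) is an acceptable stand-in for an
IS 10646 lookup.

```python
import re

# Matches decimal numeric character references such as &#233;
_NUMERIC_REF = re.compile(r"&#([0-9]+);")


def expand_numeric_refs(text):
    """Replace each &#N; with the character at code position N.

    Hypothetical helper: N is always read as an ISO 10646 code position,
    never as a value in the document's interchange encoding.
    """
    def _replace(match):
        code = int(match.group(1))
        # Leave values beyond the code space untouched rather than guessing.
        if code > 0x10FFFF:
            return match.group(0)
        return chr(code)

    return _NUMERIC_REF.sub(_replace, text)


if __name__ == "__main__":
    # 233 is LATIN SMALL LETTER E WITH ACUTE in both ISO 8859-1 and ISO 10646.
    print(expand_numeric_refs("caf&#233;"))
```

The point of the sketch is only that the number in the reference has one
meaning, whatever character set the bytes on the wire happen to use.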

> I will state again that this does *not*
> mean that all browsers must support UCS-2 or anything like that
> (though that would be even better!), but that they must simply
> recognise characters, and classify them according to the roles we
> assign them (informally ASCII or ISO 8859-1 for markup, everything
> else data).

This is fine with me, and is attractive on pragmatic grounds as well,
since IS 10646 has a well-defined (and very simple) relationship to
ISO 8859-1. This is not, of course, a coincidence :).
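
As an illustration of how simple that relationship is, the sketch below
treats each ISO 8859-1 byte value as its own IS 10646 code position. The
helper name latin1_to_ucs is hypothetical, and the check assumes Python's
"latin-1" codec and chr() as stand-ins for the two standards.

```python
def latin1_to_ucs(data):
    """Map ISO 8859-1 bytes to ISO 10646 code positions.

    Hypothetical helper: the mapping is the identity, because the first
    256 code positions of ISO 10646 match ISO 8859-1 one for one.
    """
    return [byte for byte in data]


if __name__ == "__main__":
    latin1_bytes = bytes([0x63, 0x61, 0x66, 0xE9])  # "cafe" with e-acute
    code_positions = latin1_to_ucs(latin1_bytes)
    # Cross-check the identity mapping against Python's own latin-1 decoder.
    assert "".join(chr(c) for c in code_positions) == latin1_bytes.decode("latin-1")
    print(code_positions)
```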

> Have you read my proposal? For browsers like yours, it is ideal.

I have not had a chance to read it yet, although from previous threads I
think we're probably of similar minds on many of these issues :). I was
just expressing my view of numeric character references in general...

Amanda Walker
InterCon Systems Corporation