Re: Perceived Consensus: Murray's entity stuff goes in

"Daniel W. Connolly" <>
Date: Mon, 10 Oct 94 11:52:39 EDT
Message-id: <>
Precedence: bulk
From: "Daniel W. Connolly" <>
To: Multiple recipients of list <>
Subject: Re: Perceived Consensus: Murray's entity stuff goes in 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Working Group (Private)

In message <9410101117.aa18290@dali.scocan.sco.COM>, Murray Maloney writes:
>The "Additional Entity Sets (Proposed)" was presented
>as a way of committing HTML to using the 8879 entity sets.
>I think that we would all agree that we should not adopt
>other entity-naming schemes and break one-to-one compatibility
>with the rest of the SGML docu-verse.  N'est pas?

Agreed: if we need names for characters, and there's an ISO entity
name for the character, we'll use it.

>I wonder, then, if there might be some more appropriate part of
>the RFC -- perhaps where the relationship between HTML and SGML
>is described -- to commit HTML to SGML common practice(s) including
>the use of these well-known entity sets.

I'm willing to commit to supporting mnemonic entities for characters
that are already in the HTML character set (ISO8859-1) like &shy;,
&nbsp;, &iexcl;, &laquo;, and such.
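
To make the distinction concrete, here is a minimal sketch that checks
which entity names resolve to a character inside ISO8859-1. It assumes
Python's standard html.entities table (a later HTML entity list, not the
1994 draft) as a stand-in for the ISO entity sets; the in_latin1 helper
and the two name lists are hypothetical and chosen only for illustration.

```python
from html.entities import name2codepoint

def in_latin1(entity_name):
    """Return True if the named entity maps to a code point in ISO 8859-1 (0-255)."""
    return name2codepoint[entity_name] <= 0xFF

# Names mentioned above as already covered by the HTML character set.
latin1_names = ["shy", "nbsp", "iexcl", "laquo"]
# A few names from the wider ISO sets (assumption: present in this table).
other_names = ["darr", "forall", "exist"]

for name in latin1_names + other_names:
    cp = name2codepoint[name]
    print(f"&{name}; -> U+{cp:04X} latin1={in_latin1(name)}")
```

The first group stays within a single 8-bit character set; the second
group does not, which is where fonts and display support become an issue.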

I don't think it's wise to suggest that HTML will include all the ISO
entity sets -- ISOnum (frac58, darr, sung), ISOgrk1,2,3 (agr-OHgr),
ISOtech (becaus, bernou, exist, forall), ISOcyr1,2 (Dcy, etc.) -- until
we understand what impact this will have on browsers and web
communications in general.

I'd hate to see some developer on a machine with every font in the
world cook up some design and say "See, it's easy!" and force
everybody else to bloat their browser installations with zillions of
fonts.

It's likely that if we introduce all these entity sets, browser
implementors will just find clunky translations to ASCII, and
"fidelity of communications on the web," one of the stated goals of
this standard, will suffer.
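
As a rough sketch of what such a translation might look like, assume a
browser with no suitable fonts keeps a small fallback table. The table
contents and the render_ascii helper below are invented for illustration
and do not describe any actual browser.

```python
# Hypothetical fallback table; the renderings are invented for illustration.
ASCII_FALLBACK = {
    "frac58": "5/8",
    "darr": "v",
    "forall": "(for all)",
    "exist": "(there exists)",
}

def render_ascii(entity_name):
    """Return an ASCII stand-in for an entity, or the raw reference if unknown."""
    return ASCII_FALLBACK.get(entity_name, f"&{entity_name};")

print(render_ascii("darr"))   # an ASCII approximation of a down arrow
print(render_ascii("Dcy"))    # no fallback known, so the raw reference leaks through
```

Either outcome, a crude approximation or a raw entity reference, differs
from what the author wrote, which is the fidelity loss at issue.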

For 2.1, we need to tackle the multilingual WWW. For 2.0, I don't want
to suggest that HTML is _anything_ beyond ISO8859-1, the 8bit Latin1
character set.