Firstly, thanks Dan for the excellent efforts you have made to represent 
consensus in this spec. I feel that a small item has slipped through, and 
as requested am bringing it to your attention.
Summary
-------
There is a conflict between section 5.1, The ISO Latin 1 Character Repertoire,
and the set of named entities referenced from the DTD. Consensus appeared 
to have been reached in October 1994 to add missing entities but the changes 
do not seem to have made it into the current spec. Proposed changes are 
supplied.
Reassurance
-----------
This email does not suggest adding named entities for any characters 
outside the ISO Latin-1 repertoire. There is no impact on the font 
resources needed by current browsers or rendering capabilities expected 
of them.
Problem description
-------------------
Section 5.1 (p25 of the A4 PostScript version) states:
 The HTML DTD references the Added Latin 1 entity set, to allow mnemonic 
 representation of Latin 1 characters using only the widely supported 
 ASCII character set repertoire.
 
However, the DTD references a collection of entities called
ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML
which only supplies named entities for a subset of the non-ASCII characters 
in ISO Latin-1, namely the accented characters. The remaining characters 
may only be referred to by including their 8-bit code positions or by using 
numeric character references (listed in the non-normative Appendix A).
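To make those two spellings concrete, here is a small sketch in Python. The
language and the function names are my own choice for illustration and are
not part of the spec. For a Latin-1 character it gives the 8-bit code
position and the equivalent ASCII-only numeric reference.

```python
def code_position(ch: str) -> int:
    """Return the ISO Latin-1 (ISO 8859-1) code position of one character."""
    # encode() raises UnicodeEncodeError if ch is outside the Latin-1 repertoire
    encoded = ch.encode("latin-1")
    return encoded[0]


def numeric_reference(ch: str) -> str:
    """Spell the character as a numeric reference using only ASCII."""
    return "&#%d;" % code_position(ch)


if __name__ == "__main__":
    # copyright sign, pound sterling sign, fraction one-half
    for ch in "\u00a9\u00a3\u00bd":
        print(code_position(ch), numeric_reference(ch))
```

Neither spelling is mnemonic, which is the gap a named entity would fill.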
Thus, either the text in 5.1 should be altered to read
[...] selected Latin 1 characters [...]
which leaves the inconsistency of representation, or (preferably) the 
number of named entities should be expanded, as per previous perceived 
consensus, to include the missing characters. This might be done by
1) referencing an expanded collection of entities with the same name
2) referencing an expanded collection of entities with a new name
3) referencing the old collection of entities, plus an additional collection
4) placing the additional collection in A.3 proposed features
There are good arguments for all alternatives; the group must decide. My 
personal preference would be 2.
Evidence of consensus
---------------------
On Mon, 10 Oct 1994 10:52:16 -0500 Daniel W. Connolly 
(then connolly@hal.com) said in a thread entitled "Perceived Consensus: 
Murray's entity stuff goes in"
 <http://www.acl.lanl.gov/HTML_WG/html-wg-94q4.messages/0048.html>:
> Agreed: if we need names for characters, and there's an ISO entity
> name for the character, we'll use it.
> I'm willing to commit to supporting mnemonic entities for characters
> that are already in the HTML character set (ISO8859-1) like &shy;,
> &nbsp;, &iexcl;, &laquo;, and such.
On Tue, 11 Oct 1994 10:18:41 -0400 (EDT) Murray Maloney (murray@sco.COM)
said in the same thread:
  <http://www.acl.lanl.gov/HTML_WG/html-wg-94q4.messages/0052.html>
  
> By which I think you mean that if a character is already supported,
> by virtue of it being part of the supported ISO8859-1, then we
> could commit to providing "character entity" support in addition
> to the "numeric character references". This is more specific,
> for ISO8859-1, than I was expecting from the spec. But it is 
> certainly an acceptable "stake in the ground" from my perspective.
Arguments for option 1)
-----------------------
The collection ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML is based on
ISO 8879-1986//ENTITIES Added Latin 1//EN but has been modified already 
to support HTML, so it could be modified some more. 
On Mon, 12 Dec 1994 19:55:53 +0100 Daniel W. Connolly (then connolly@hal.com) 
wrote on www-html in a thread entitled "Baffling math problems [Was: 
HTML 3.0 DTD ]"
  <http://gummo.stanford.edu/hypermail/www-html-1994q4/0152.html>
  
> The Added Latin 1 entity set defines a bunch of names for Latin 1
> characters. The SGML spec appendix that defines it makes no reference
> to the Latin 1 character set (ISO-8859-1). It maps those names to
> these thingies called CDATA entities -- system dependent data
> entities. I believe the intention is that the CDATA entities are
> supposed to be replaced on a per-SGML-system basis. So you might
> see TeX version of "ISO 8879-1986//ENTITIES Added Latin 1//EN", with:
>  <!ENTITY eacute CDATA "\eacute" -- for TeX -->
> Since the document character set for HTML includes all the characters
> referred to by those names, there's no need to use system-specific
> mappings. The entities can be mapped to characters within the document
> character set.
> In response to the same feedback you saw, this set of definitions is
> now called:
> "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML"
Arguments for option 2)
-----------------------
There is a precedent for using a new name for the expanded collection of 
named entities. Dave Raggett's draft html3.dtd, version 
  Draft: Fri 24-Mar-95 09:46:33 
says
<!-- The HTML list of Latin-1 entities includes the full range
     of characters in widely available Latin-1 fonts, and as such
     is a mixture of ISOlat1 and other ISO publishing symbols -->
<!ENTITY % HTMLlat1 PUBLIC
  "-//IETF//ENTITIES Added Latin 1 for HTML//EN">
%HTMLlat1;
Arguments for option 3)
-----------------------
Changes from previous drafts are minimal and are localised in a separate 
collection. What do we call it, though, and how do we explain why the 
entities are split into two collections?
Arguments for option 4)
-----------------------
Not all existing browsers implement all the extra named entities. But then, not
all browsers implement everything anyway. Supporting the extra entities is 
little work. Existing browsers support some of the named entities already.
The missing entities (example, for option 2)
--------------------------------------------
a) Alter the comment block to read (something like):
<!-- Portions of this text are copyright ISO:
     (C) International Organization for Standardization 1986
     Permission to copy in any form is granted for use with
     conforming SGML systems and applications as defined in
     ISO 8879, provided this notice is included in all copies.
-->
<!--	Character entity set. Typical invocation:
	<!ENTITY % HTMLlat1 PUBLIC
	   "-//IETF//ENTITIES Latin 1 for HTML//EN">
	%HTMLlat1;
-->
<!--	Modified for use in HTML
	$Id: ISOlat1.sgml,v 1.1 1994/09/24 14:06:34 connolly Exp $ 
-->
<!--    Modified to add characters not in Added Latin 1 which are in
	the ISO Latin-1 character repertoire, which could only be 
	referred to by numeric references. 
        Also added the standard lt gt amp quot entities from HTML 2.0
        HTMLlat1.sgml Chris Lilley, 13 March 1995
--> 
b) Add these entities:
<!-- 
     Entities that aren't accented characters, and so not in 
     ISO Added Latin 1. Entity names and comments based on relevant 
     entities in
     "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN"
     The four entities umlaut, macron, acute, cedilla
     were not in ISO Numeric and Special Graphic
     either; I took their names from the numeric entity list in
     http://www.hpl.hp.co.uk/people/dsr/html/latin1.html 
     Chris Lilley, 13 March 1995  
-->  
<!ENTITY yuml    CDATA "&#255;" -- small y, dieresis or umlaut mark -->

<!ENTITY iexcl   CDATA "&#161;" -- inverted exclamation mark  -->
<!ENTITY cent    CDATA "&#162;" -- cent sign  -->
<!ENTITY pound   CDATA "&#163;" -- pound sterling sign  -->
<!ENTITY curren  CDATA "&#164;" -- general currency sign  -->
<!ENTITY yen     CDATA "&#165;" -- yen sign  -->
<!ENTITY brvbar  CDATA "&#166;" -- broken (vertical) bar  -->
<!ENTITY sect    CDATA "&#167;" -- section sign  -->
<!ENTITY umlaut  CDATA "&#168;" -- umlaut (dieresis)  -->
<!ENTITY copy    CDATA "&#169;" -- copyright sign  -->
<!ENTITY ordf    CDATA "&#170;" -- ordinal indicator, feminine  -->
<!ENTITY laquo   CDATA "&#171;" -- angle quotation mark, left  -->
<!ENTITY not     CDATA "&#172;" -- not sign  -->
<!ENTITY shy     CDATA "&#173;" -- soft hyphen  -->
<!ENTITY reg     CDATA "&#174;" -- registered trademark  -->
<!ENTITY macron  CDATA "&#175;" -- macron  -->
<!ENTITY deg     CDATA "&#176;" -- degree sign  -->
<!ENTITY plusmn  CDATA "&#177;" -- plus-or-minus sign  -->
<!ENTITY sup2    CDATA "&#178;" -- superscript two  -->
<!ENTITY sup3    CDATA "&#179;" -- superscript three  -->
<!ENTITY acute   CDATA "&#180;" -- acute accent  -->
<!ENTITY micro   CDATA "&#181;" -- micro sign  -->
<!ENTITY para    CDATA "&#182;" -- pilcrow (paragraph sign)  -->
<!ENTITY middot  CDATA "&#183;" -- middle dot (centred decimal point)  -->
<!ENTITY cedilla CDATA "&#184;" -- cedilla accent  -->
<!ENTITY sup1    CDATA "&#185;" -- superscript one -->
<!ENTITY ordm    CDATA "&#186;" -- ordinal indicator, masculine -->
<!ENTITY raquo   CDATA "&#187;" -- angle quotation mark, right -->
<!ENTITY frac14  CDATA "&#188;" -- fraction one-quarter -->
<!ENTITY frac12  CDATA "&#189;" -- fraction one-half -->
<!ENTITY frac34  CDATA "&#190;" -- fraction three-quarters -->
<!ENTITY iquest  CDATA "&#191;" -- inverted question mark -->
<!-- the odd ones tucked in amongst the sequence of accented letters -->
<!ENTITY times   CDATA "&#215;" -- multiply sign -->
<!ENTITY divide  CDATA "&#247;" -- divide sign -->
<!-- perhaps these should now be here, rather than inlined? -->
<!ENTITY amp     CDATA "&#38;"   -- ampersand          -->
<!ENTITY gt      CDATA "&#62;"   -- greater than       -->
<!ENTITY lt      CDATA "&#60;"   -- less than          -->
<!ENTITY quot    CDATA "&#34;"   -- double quote       -->
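As a rough aid for anyone maintaining such a collection, the following Python
sketch formats one declaration from a name, a code position and a comment.
The function name and the range check are my own assumptions, not anything
taken from the DTD. It also assumes the replacement text is written as a
numeric reference, so that the entity file itself stays in ASCII, and it
covers only the graphic characters above ASCII (code positions 160 to 255).

```python
def entity_declaration(name: str, code: int, comment: str) -> str:
    """Format one CDATA entity declaration for a non-ASCII Latin-1 character."""
    # Assumption: only the graphic characters above ASCII are handled here.
    if not 160 <= code <= 255:
        raise ValueError("code position %d is outside 160..255" % code)
    # Pad the name to 7 columns (the longest proposed name is "cedilla").
    return '<!ENTITY %-7s CDATA "&#%d;" -- %s -->' % (name, code, comment)


if __name__ == "__main__":
    sample = [
        ("iexcl", 161, "inverted exclamation mark"),
        ("copy", 169, "copyright sign"),
        ("divide", 247, "divide sign"),
    ]
    for name, code, comment in sample:
        print(entity_declaration(name, code, comment))
```

Generating the lines from one table would keep the names, code positions and
comments from drifting apart, whichever of the four options is chosen.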
Dan said:
> So in the interest of time, please keep your comments focused. And
> remember: for bonus points: please suggest replacement text! (and
> always excerpt the original, citing the revision date and preferably a
> URL).
I hope I have satisfied these requirements.
--
Chris Lilley, Technical Author
+-------------------------------------------------------------------+
| Manchester and North HPC Training & Education Centre              |
+-------------------------------------------------------------------+
| Computer Graphics Unit,        Email: Chris.Lilley@mcc.ac.uk      |
| Manchester Computing Centre,   Voice: +44 161 275 6045            |
| Oxford Road, Manchester, UK.     Fax: +44 161 275 6040            |
| M13 9PL                       BioMOO: ChrisL                      |
| URI: http://info.mcc.ac.uk/CGU/staff/lilley/lilley.html           |
+-------------------------------------------------------------------+
| "The first W in WWW will not wait."  François Yergeau             |
+-------------------------------------------------------------------+