Re: Numeric Char Ents in 2.0 draft

Larry Masinter (masinter@parc.xerox.com)
Fri, 31 Mar 95 03:47:17 EST

I think one of the guidelines for text/*; charset=xxxx MIME types is
that agents might actually TRANSLATE the character set to another one.

E.g., it should be legitimate to read something "text/frob;
charset=ISO-8859-6" and just translate it to "text/frob;
charset=UNICODE-1-1-UTF8" on a character-by-character basis, without
actually knowing anything about what "frob" is, except that it is a
subtype of text.
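
A rough sketch of what such an agent might do, in Python. The function
name transcode_text and the naive parameter parsing are hypothetical and
only for illustration; "utf-8" stands in for the UNICODE-1-1-UTF8 label,
which Python does not register as a codec name.

```python
def transcode_text(content_type: str, body: bytes, target: str = "utf-8"):
    """Re-encode a text/* body without knowing anything about the subtype.

    Assumption: parameters are simple `name=value` pairs separated by ';'.
    """
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip()
    if not media_type.lower().startswith("text/"):
        raise ValueError("only text/* types can be transcoded blindly")

    charset = "us-ascii"  # MIME default when no charset parameter is given
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"')

    # Character-by-character translation: decode with the declared
    # charset, then encode with the target one.
    text = body.decode(charset)
    return f"{media_type}; charset={target}", text.encode(target)
```

Note that nothing here looks inside the body for markup, which is the
point: the translation only works if the body's meaning does not depend
on which encoding it arrived in.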

I think this should be true of HTML. In which case, the character
octet references in HTML should refer to the octets in a _fixed_
character encoding (ISO-8859-1), and not make reference to the
document's base encoding.

What this means, I think, is that if you want to make "text/html;
charset=xxxx" into something that is conformant SGML, where "xxxx"
isn't ISO-8859-1, you may have to first do a character set
translation.
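
To make the fixed-encoding reading concrete, here is a minimal sketch in
Python. The names expand_refs and read_html are made up for this
example, and it only handles the decimal ranges 32-126 and 161-255 from
the draft text quoted below; it is not a real SGML parser.

```python
import re

_REF = re.compile(r"&#([0-9]+);")

def expand_refs(text: str) -> str:
    """Expand numeric references as ISO-8859-1 octets, always."""
    def repl(match):
        n = int(match.group(1))
        if 32 <= n <= 126 or 161 <= n <= 255:
            # Fixed interpretation: the number is an ISO-8859-1 octet,
            # whatever charset the document itself was sent in.
            return bytes([n]).decode("iso-8859-1")
        return match.group(0)  # outside the draft's ranges: leave as-is
    return _REF.sub(repl, text)

def read_html(body: bytes, charset: str) -> str:
    # Step 1: character set translation of the raw octets.
    # Step 2: reference expansion, independent of `charset`.
    return expand_refs(body.decode(charset))
```

With this, read_html(b"2 &#215; \xc7", "iso-8859-6") yields the multiply
sign for the reference and ALEF for the raw octet. Had the reference
been resolved against the document's charset instead, 215 would have
come out as an Arabic letter, and a blind translation to another
charset would have changed what the document says.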

================================================================

>>>>> On Thu, 30 Mar 1995 19:36:11 -0800, Gavin Nicol <gtn@ebt.com> said:
> To: Multiple recipients of list <html-wg@oclc.org>

>>I confess I didn't fully understand what was intended by the following
>>until yesterday afternoon. I believe the functionality it describes
>>is not conformant with ISO 8879.
>>
>>| Character octet references are represented in an HTML document as
>>| SGML entities whose name is number sign (#) followed by a numeral
>>| from 32-126 and 161-255. The HTML DTD includes a numeric character
>>| for each of the printing characters of the ISO-8859-1 encoding, so
>>| that one may reference them by number if it is inconvenient to
>>| enter them directly.
>>|
>>| The character octet references are not dependent on the character
>>| set encoding of the document. For example, "&#215;" always
>>| represents the ISO-8859-1 multiply sign, even when the document's
>>| declared character set is other than ISO-8859-1.

> Yes. This is non-conformant, as Francois was quick to point out. I
> noted that I have a possible solution to this problem in my newest
> paper (almost done).