Re: Entities

"Daniel W. Connolly" <connolly@hal.com>
Date: Thu, 22 Sep 94 13:23:19 EDT
Message-id: <9409221722.AA18739@austin2.hal.com>
Reply-To: connolly@hal.com
Originator: html-wg@oclc.org
Sender: html-wg@oclc.org
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <html-wg@oclc.org>
Subject: Re: Entities 
X-Comment: HTML Working Group (Private)

In message <9409221024.aa00606@dali.scocan.sco.COM>, Murray Maloney writes:
>
>P.S.  I have noticed one curious thing...  When I use &#11; I get a space.
>ASCII 11 is supposed to be a Vertical Tab (VT), so I find it a bit odd.

Is &#11; the _only_ control character that behaves this way, or is it
the case that perhaps all characters <= 32 act like space in some
browser? Or perhaps the explanation comes from the fact that some
implementation of the C library function isspace(c) returns true for
11.
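For what it's worth, here's a minimal C sketch of that last theory. The helper name collect_isspace is made up for illustration. It lists which codes from 0 to 32 the standard isspace() accepts. It assumes the default "C" locale, which is in effect at program startup. In that locale the C standard counts space, HT, LF, VT, FF and CR as white space. So a browser that collapses white space with isspace() would plausibly turn &#11; (VT) into a space. Other locales may add more characters.

```c
#include <ctype.h>

/* Hypothetical helper (not from any browser): store in out[] every
   code from 0 to max for which isspace() is true, up to cap entries,
   and return how many were stored.  Assumes the default "C" locale,
   where the standard set is 9 (HT), 10 (LF), 11 (VT), 12 (FF),
   13 (CR) and 32 (space). */
int collect_isspace(int max, int out[], int cap)
{
    int n = 0;
    /* isspace() is only defined for unsigned char values and EOF,
       so never pass it anything above 255. */
    for (int c = 0; c <= max && c <= 255; c++) {
        if (isspace(c) && n < cap)
            out[n++] = c;
    }
    return n;
}
```

Calling collect_isspace(32, codes, 16) in the C locale should yield the six codes 9, 10, 11, 12, 13 and 32. 11 is right in the middle of them.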

With regard to the spec, does it matter? 11 is one of the "UNUSED"
characters. The behaviour of a browser when encountering such
a character is not specified, correct?

Should we call them "SHUNNED", "UNUSED", "UNDEFINED",
or is some other term appropriate?

>The full set of characters in 8859/1 is available through 
>numeric character reference except for nbsp.

In what way is nbsp not available? I haven't tested it, but
character 160 in the X fonts is in fact a space character,
so I expect that it works (out of happy coincidence, if nothing
else) on X/Mosaic. And if the Mac and PC browsers are doing
their ISOlatin1 conversion correctly, &#160; should work
there too.
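To make the "should work" concrete, here's a rough C sketch of how a browser might resolve a decimal numeric character reference to an ISO 8859-1 code. In that code set, 160 is NO-BREAK SPACE. The function name decode_ncr is hypothetical. The sketch simplifies in two ways. It insists on the trailing ";", although SGML allows it to be omitted in some contexts. It also rejects anything above 255.

```c
#include <ctype.h>

/* Hypothetical helper: decode a decimal numeric character reference
   such as "&#160;" and return its code, or -1 if the text is not of
   the simplified form "&#" digits ";" or falls outside 0-255. */
long decode_ncr(const char *s)
{
    long value = 0;

    if (s[0] != '&' || s[1] != '#' || !isdigit((unsigned char)s[2]))
        return -1;
    s += 2;
    while (isdigit((unsigned char)*s)) {
        value = value * 10 + (*s - '0');
        if (value > 255)          /* beyond the 8-bit ISO 8859-1 range */
            return -1;
        s++;
    }
    return (*s == ';') ? value : -1;
}
```

Given "&#160;" this returns 160, which a Latin-1 browser would then display with whatever glyph its font has at that position.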

Corprew: you seem to have ready access to these things. Wanna
check this out for us?

>None of the control characters are supported except for 
>09 (HT), 10 (LF), and 11 (VT).
>
>That means that 00-08, 12-31, 127-160, and 215 are outstanding issues.

Isn't 13 (CR) supported?

>The multiply sign currently at #172 is not legitimately part of 8859-1.
>However, the division sign at #247 is part of 8859-1.

Very strange. Oh well...

>> 	27: an escape character for ISO2022 escape sequences?
>> 		(the multi-lingual document issue again...)
>
>We have not declared support for ISO2022 in HTML 2.0, have we?

Not at all. But I thought it might be wise to give it some
special "reserved for future use" status. For example,
Spyglass has announced a supported Japanese version of
Mosaic. I'm curious to know how they represent Japanese
characters in HTML.

>> 	127-159: is there any defined use for these?
>
>Yes, ISO-6429 defines the codes from 128-159.  Seven are undefined.
>The remainder have potential uses in browsers, retrieval engines,
>HTTP, and editors.

Except perhaps for 173 (shy), I'd say let's leave these SHUNNED.
Anyway... it's a 2.1 issue if anything.

>>   11(0B):              --UNUSED--
>
>	Hmmm!  Not what I discovered.

Could you elaborate on this? What was the observed behaviour,
and with what browser?

>
>From 160-191, the names listed are not usable as character entity names.
>These characters can only be used as coded characters or numeric 
>character references.

Correct, for 2.0. For 2.1, we may want to open up the issue
again. I didn't mean to cloud things.

Dan