Re: Entities

Murray Maloney <murray@sco.COM>
Date: Thu, 22 Sep 94 15:34:40 EDT
Message-id: <9409221429.aa01312@dali.scocan.sco.COM>
Reply-To: murray@sco.COM
To: Multiple recipients of list <html-wg@oclc.org>
X-Comment: HTML Working Group (Private)
> 
> In message <9409221024.aa00606@dali.scocan.sco.COM>, Murray Maloney writes:
> >
> >P.S.  I have noticed one curious thing...  When I use &#11; I get a space.
> >ASCII 11 is supposed to be a Vertical Tab (VT), so I find it a bit odd.
> 
> Is &#11; the _only_ control character that behaves this way, or is it
> the case that perhaps all characters <= 32 act like space in some
> browser? Or perhaps the explanation comes from the fact that some
> implementation of the C library function isspace(c) returns true for
> 11.

No, it's just &#11; -- at least in Mosaic 2.4.
Strangely, not &#13;
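
The isspace() theory above can be checked with a small C sketch. The helper name space_codes is made up for illustration and is not taken from any browser source. In the default "C" locale, the standard library reports 9 through 13 and 32 as white space, so 11 (VT) does qualify. That alone would not explain why &#13; behaves differently in Mosaic 2.4, so treat it as one possible factor only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: collect every code in 0..32 for which the C
   library's isspace() returns true. Codes are written to out (which
   has room for cap entries); the number of codes stored is returned. */
size_t space_codes(int *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    for (int c = 0; c <= 32; c++) {
        /* isspace() depends on the current locale; a program that never
           calls setlocale() runs in the "C" locale. */
        if (isspace(c) && n < cap)
            out[n++] = c;
    }
    return n;
}
```

Called with a 33-entry buffer in the "C" locale, this stores 9, 10, 11, 12, 13 and 32.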

> 
> With regard to the spec, does it matter? 11 is one of the "UNUSED"
> characters. The behaviour of a browser when encountering such
> a character is not specified, correct?
> 
> Should we call them "SHUNNED", "UNUSED", "UNDEFINED",
> or is some other term appropriate?

I would have thought SHUNNED was appropriate.
Perhaps Terry, Lee or Yuri could help here.
> 
> >The full set of characters in 8859/1 is available through 
> >numeric character reference except for nbsp.
> 
> In what way is nbsp not available? I haven't tested it, but
> character 160 in the X fonts is in fact a space character,
> so I expect that it works (out of happy coincidence, if nothing
> else) on X/Mosaic. And if the Mac and PC browsers are doing
> their ISOlatin1 conversion correctly, &#160; should work
> there too.

Perhaps it is just the system I am on.
When I try &#160; I get nothing.
No space, nothing.  I thought that you and I had discussed
this a while ago and had agreed that nbsp and shy did
not work.  If y'all tell me that it works on most browsers/systems
then I'll change it.
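
To make the question concrete, here is a minimal C sketch of how a decimal numeric character reference such as &#160; could be turned into an 8-bit code. The name parse_ncr is hypothetical, and the sketch makes simplifying assumptions: it accepts decimal digits only, insists on the closing semicolon, and rejects anything above 255. It shows the decoding step only; whether a browser then draws a space for code 160 depends on its fonts and character set conversion, which is the open question here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: parse a decimal numeric character reference
   such as "&#160;". Returns the code (0..255) on success, or -1 if the
   text is not of the form &#digits; or the value exceeds 255. */
int parse_ncr(const char *s)
{
    const char *p;
    int code = 0;

    assert(s != NULL);
    if (s[0] != '&' || s[1] != '#')
        return -1;              /* not a numeric reference, e.g. "&nbsp;" */
    p = s + 2;
    if (!isdigit((unsigned char)*p))
        return -1;              /* at least one digit is required */
    while (isdigit((unsigned char)*p)) {
        code = code * 10 + (*p - '0');
        if (code > 255)
            return -1;          /* outside the 8-bit range */
        p++;
    }
    return (*p == ';') ? code : -1;
}
```

With this sketch, parse_ncr("&#160;") gives 160 and parse_ncr("&#11;") gives 11, while a named reference such as "&nbsp;" is rejected with -1.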
> 
> Corprew: you seem to have ready access to these things. Wanna
> check this out for us?
> 
> >None of the control characters are supported except for 
> >09 (HT), 10 (LF), and 11 (VT).
> >
> >That means that 00-08, 12-31, 127-160, and 215 are outstanding issues.
> 
> Isn't 13 (CR) supported?

Oops!  I should have read what I wrote.
Yes, 13 (CR) is supported.  It is taken as a word space in all
contexts except <PRE>.
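
The ranges discussed so far can be summarised in a short C sketch. It encodes only what is reported in this thread (9, 10, 11 and 13 supported; 00-08, the rest of 12-31, 127-160 and 215 outstanding) and is not a statement of what the 2.0 spec requires. The enum and function names are invented for illustration.

```c
#include <assert.h>

/* Status of an 8-bit code as a numeric character reference, as
   reported in this thread. The names are hypothetical. */
enum ncr_status { NCR_SUPPORTED, NCR_OUTSTANDING, NCR_ORDINARY };

enum ncr_status ncr_status_of(int code)
{
    assert(code >= 0 && code <= 255);

    /* HT, LF, VT and CR were observed to work. */
    if (code == 9 || code == 10 || code == 11 || code == 13)
        return NCR_SUPPORTED;

    /* Remaining control codes: 00-08, 12, 14-31. */
    if (code <= 31)
        return NCR_OUTSTANDING;

    /* 127-160 and 215 were also listed as outstanding issues. */
    if ((code >= 127 && code <= 160) || code == 215)
        return NCR_OUTSTANDING;

    return NCR_ORDINARY;
}
```

For example, ncr_status_of(11) is NCR_SUPPORTED, ncr_status_of(160) is NCR_OUTSTANDING, and ncr_status_of(247) (the division sign) is NCR_ORDINARY.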
> 
> >The multiply sign currently at #172 is not legitimately part of 8859-1.
> >However, the division sign at #247 is part of 8859-1.
> 
> Very strange. Oh well...
> 
> >> 	27: an escape character for ISO2022 escape sequences?
> >> 		(the multi-lingual document issue again...)
> >
> >We have not declared support for ISO2022 in HTML 2.0, have we?
> 
> Not at all. But I thought it might be wise to give it some
> special "reserved for future use" status. For example,
> Spyglass has announced a supported Japanese version of
> Mosaic. I'm curious to know how they represent Japanese
> characters in HTML.

Sort of like how I want to reserve the 8879 entity name space?


> >> 	127-159: is there any defined use for these?
> >
> >Yes, ISO-6429 defines the codes from 128-159.  Seven are undefined.
> >The remainder have potential uses in browsers, retrieval engines,
> >HTTP, and editors.
> 
> Except for perhaps 173 shy, I'd say let's leave these SHUNNED.
> Anyway... it's a 2.1 issue if anything.

Right.
> 
> >>   11(0B):              --UNUSED--
> >
> >	Hmmm!  Not what I discovered.
> 
> 
> Could you elaborate on this? What was the observed behaviour,
> and with what browser?

With Mosaic 2.4 I got a space in plain text and in <PRE>.
> 
> >
> >From 160-191, the names listed are not usable as character entity names.
> >These characters can only be used as coded characters or numeric 
> >character references.
> 
> Correct, for 2.0. For 2.1, we may want to open up the issue
> again. I didn't mean to cloud things.
> 
> 
> Dan