Re: Proposal: Document use of control characters

"Daniel W. Connolly" <connolly@oclc.org>
Date: Thu, 16 Jun 94 13:13:46 EDT
Message-id: <9406161713.AA02297@ulua.hal.com>
Reply-To: html-ig@oclc.org
Originator: html-ig@oclc.org
Sender: html-ig@oclc.org
Precedence: bulk
From: "Daniel W. Connolly" <connolly@oclc.org>
To: Multiple recipients of list <html-ig@oclc.org>
Subject: Re: Proposal: Document use of control characters 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Implementation Group
In message <9406161206.aa11328@dali.scocan.sco.COM>, Murray Maloney writes:
>
>Proposal:  Identify the control characters in ISO 8859/1
>that are recognized as valid HTML.

Seconded.


> identify those
>control characters which are not valid by specifying
>them as SHUNCHARs in the SGML declaration, and document
>them in the HTML specification.  For each control character
>that is valid, identify its meaning and potential uses.

I'd say this list is:

	Decimal		ASCII "code"	HTML Meaning
					in BODY		in PRE
	9		HT		word break	col := col + 8 - (col mod 8)
	10		LF		word break	col := 0; row := row+1
	13		CR		word break	col := 0

Hmmm... about CR and LF in PRE... what about Mac-generated documents
that only use CR vs. unix that only uses LF vs. DOS that uses CRLF?
The above definition works for unix and DOS, but not Mac. Is that
OK for everybody?
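
To make the PRE column of the table concrete, here is a minimal Python
sketch of that cursor model. The name pre_cursor and the tab_width
parameter are made up for illustration, and the HT rule assumes the
intent is "advance to the next multiple of 8"; nothing here is taken
from an existing browser.

```python
def pre_cursor(text, tab_width=8):
    """Return the final (row, col) after applying the PRE rules above.

    Hypothetical helper: only HT, LF and CR get special treatment;
    every other character is assumed to advance the column by one.
    """
    row = 0
    col = 0
    for ch in text:
        code = ord(ch)
        if code == 9:       # HT: move to the next tab stop (assumed every 8 columns)
            col = col + tab_width - (col % tab_width)
        elif code == 10:    # LF: start of the next row
            col = 0
            row += 1
        elif code == 13:    # CR: back to column 0, same row
            col = 0
        else:               # any other character occupies one column
            col += 1
    return row, col


if __name__ == "__main__":
    for sample in ("ab\ncd", "ab\r\ncd", "ab\rcd"):
        print(repr(sample), pre_cursor(sample))
```

Under these rules the unix-style "ab\ncd" and the DOS-style "ab\r\ncd"
both end on row 1, while the CR-only "ab\rcd" never leaves row 0, which
is the Mac problem described above.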

>For all control characters which are not valid, list
>the characters and their codes, and specify the error
>(if any) which may result if the character is discovered.

Perhaps somebody could run some tests on existing browsers to see
whether it's reasonable to say that the other chars (0-8, 11-12, 14-31)
should be ignored altogether or treated as word breaks.
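
For anyone putting test documents together, a rough Python sketch of a
checker that reports the control characters outside the valid set
(9, 10, 13) might look like this. The names VALID_CONTROLS and
find_shunned_controls are invented here, and treating everything else
below 32 as "not valid" is an assumption based on the list above, not
on any agreed SGML declaration.

```python
# Control characters given an HTML meaning in the table above.
VALID_CONTROLS = {9, 10, 13}


def find_shunned_controls(text):
    """Return (offset, code) pairs for control chars below 32 that are not valid.

    Hypothetical helper: the cutoff at 32 and the valid set are assumptions
    drawn from this thread.
    """
    return [
        (offset, ord(ch))
        for offset, ch in enumerate(text)
        if ord(ch) < 32 and ord(ch) not in VALID_CONTROLS
    ]


if __name__ == "__main__":
    # VT (11) and NUL (0) are reported; HT (9) is not.
    print(find_shunned_controls("a\x0bb\tc\x00"))
```

Running it over a document says where the questionable characters are,
which makes it easier to see what each browser does with them.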

Also... do we leave open the possibility that folks will want to use
these unused characters for special purposes (such as graphic code set
switches) in the future?

>The following two characters fall within 8859/1.
>Their behaviour should be specified in the standard.
>
>	160	nbsp		no-break space
>	173	shy		soft hyphen

There has been an &nbsp; entity in the spec for a while. Would somebody
run some tests to see if it's supported in the various browsers? If
so, we'll call it standard. If not, it'll have to stay in "proposed
purgatory" for now. Oh... and could we run some checks on &quot;
while we're at it?

I've been wondering what &nbsp; should stand for -- I thought it
would have to be SDATA or a processing instruction or some such. I
had no idea there were character positions assigned to "non-breaking
space" and "soft hyphen". So the proposed DTD fragment is:

	<!ENTITY nbsp CDATA "&#160;">
	<!ENTITY shy  CDATA "&#173;">
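
As a quick sanity check on the two code positions, here is a small
Python sketch. It relies on the name2codepoint table from Python's
standard html.entities module, which is a separate table and not the
DTD under discussion; the helper name latin1_byte is made up.

```python
from html.entities import name2codepoint


def latin1_byte(entity_name):
    """Return (code, byte) for a named entity, encoded as ISO 8859/1.

    Hypothetical helper: looks the name up in Python's own entity table.
    """
    code = name2codepoint[entity_name]
    return code, chr(code).encode("latin-1")


if __name__ == "__main__":
    for name in ("nbsp", "shy"):
        print(name, latin1_byte(name))
```

Both names resolve to single bytes in 8859/1 (160 and 173), matching
the character references in the fragment above.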

Dan