Re: Numeric char references

Daniel W. Connolly (connolly@beach.w3.org)
Fri, 9 Jun 95 16:47:39 EDT

In message <199506072201.XAA08488@curia.ucc.ie>, Peter Flynn writes:
>
>Can I ask the charset gureaux...
>
> a. do you (we) have any proposed behavior for a client if it finds
> raw characters in the range 128-159?

I don't believe the ISO-8859-1 character encoding scheme allows for
octets in that range. So you've got a broken MIME message entity. The
behaviour of an HTML user agent upon seeing such a message entity
is unspecified. (See recent discussions on www-talk for evidence that
different browsers in fact handle it differently on different platforms.)
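A user agent that wants to at least notice the problem could scan the octets before decoding them. Below is a minimal sketch in Python. The function name `find_c1_octets` is made up for illustration. It only reports offsets, because what to do with them is, as above, unspecified:

```python
def find_c1_octets(body: bytes) -> list[int]:
    """Return the offsets of octets in the range 128-159 (0x80-0x9F)."""
    return [i for i, octet in enumerate(body) if 0x80 <= octet <= 0x9F]


# 0x93/0x94 are "smart quotes" in some vendor code pages (e.g. Windows-1252),
# but they are not graphic characters in ISO-8859-1.
sample = b"He said \x93hello\x94"
print(find_c1_octets(sample))  # [8, 14]
```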

> b. or if it finds something like &#153; ?

Same thing, except that you've got an error at the SGML level -- in
the markup of the HTML document (i.e. sequence of characters) rather
than an error in the message entity (i.e. sequence of octets).
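The SGML-level case can be caught by looking at the markup (characters) rather than the octets. Below is a minimal sketch. It assumes decimal references only and uses a hypothetical helper name, `bad_numeric_refs`. It also accepts a reference without the trailing ';', which is a simplification and not a full SGML reference parser:

```python
import re

# Decimal numeric character references; the closing ";" is treated as optional.
NUMERIC_REF = re.compile(r"&#([0-9]+);?")


def bad_numeric_refs(markup: str) -> list[tuple[int, int]]:
    """Return (offset, number) for each &#NNN; whose number falls in 128-159."""
    return [
        (m.start(), int(m.group(1)))
        for m in NUMERIC_REF.finditer(markup)
        if 128 <= int(m.group(1)) <= 159
    ]


print(bad_numeric_refs("Trade&#153; mark &#169; ok"))  # [(5, 153)]
```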

>BTW, is it true that 8879 does not define numeric encodings, that this
>is done in 8859?

Er... strange terminology, so I can't really answer that question.
I can answer these:

"Is it true that 8879 (SGML) doesn't define character encoding schemes?"
Yes and no. The SGML standard makes some noise about "bit combinations"
and entity managers mapping them to characters. But the HTML 2.0
spec takes its definition of character encoding scheme from
MIME.
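In MIME terms, the mapping from octets to characters is named by the charset parameter of the Content-Type. Below is a minimal sketch using Python's standard `email.message.Message`. The helper name `decode_entity` is an assumption of this sketch. So is the fallback to ISO-8859-1 when no charset is given; it is not claimed here to be what MIME or HTML 2.0 requires:

```python
from email.message import Message


def decode_entity(content_type: str, body: bytes) -> str:
    """Decode a message body using the charset named in its Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or the
    # given fallback (assumed ISO-8859-1 here) when the parameter is absent.
    charset = msg.get_content_charset("iso-8859-1")
    return body.decode(charset)


print(decode_entity("text/html; charset=ISO-8859-1", b"caf\xe9"))  # café
```

Python's iso-8859-1 codec maps octets 128-159 to the C1 control characters U+0080-U+009F rather than raising an error. Decoding alone therefore will not flag the broken message entities discussed above.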

"Is it true that 8879 (SGML) doesn't define numeric character references?"
No. The &#NNN; syntax is specified by SGML as a reference
to a character in the document character set. ISO 8859-1
is often used as the document character set for HTML, so you
have to use that spec to find the character at the given
code position.
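Resolving &#NNN; then amounts to looking up code position NNN in ISO 8859-1. Below is a minimal sketch. It assumes ISO 8859-1 is the document character set and that references are decimal with an explicit ';'. The name `resolve_refs_latin1` is hypothetical. Consistent with the answer to (b) above, the sketch treats 128-159, and anything above 255, as errors:

```python
import re

# Decimal numeric character references with an explicit ";" (e.g. &#233;).
NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_refs_latin1(markup: str) -> str:
    """Replace each &#NNN; with the ISO 8859-1 character at code position NNN."""

    def lookup(match):
        code = int(match.group(1))
        # Above 255 there is no ISO 8859-1 code position at all, and 128-159
        # is treated as unassigned, per the discussion above.
        if code > 255 or 128 <= code <= 159:
            raise ValueError(f"&#{code}; has no character in ISO 8859-1")
        # Control positions 0-31 and 127 are not rejected by this sketch.
        return bytes([code]).decode("iso-8859-1")

    return NUMERIC_REF.sub(lookup, markup)


print(resolve_refs_latin1("caf&#233; &#38; cr&#232;me"))  # café & crème
```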

> If so, what was the purpose of leaving 128-159
>blank? Just because they're the flip side of 0-31?

Yes, I gather that's why those code positions are not in the domain
of ISO-8859-1.
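The "flip side" relationship is just the eighth bit. Setting it on each code position 0-31 yields exactly 128-159. A quick check in Python:

```python
# Set the high bit (0x80) on every C0 control position 0-31.
c0 = range(0, 32)
c1 = [n | 0x80 for n in c0]

# The result is exactly the range 128-159.
assert c1 == list(range(128, 160))
print(hex(c1[0]), hex(c1[-1]))  # 0x80 0x9f
```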

Dan