Re: Numeric char references

Martin J Duerst (mduerst@ifi.unizh.ch)
Fri, 9 Jun 95 16:48:17 EDT

Peter Flynn writes:

>Can I ask the charset gureaux...
>
> a. do you (we) have any proposed behavior for a client if it finds
> raw characters in the range 128-159?
>
> b. or if it finds something like &#153; ?

I think we don't have one, and we shouldn't, just as for the other cases
and with largely the same arguments. The difference, however, is that
these positions are undefined and will remain undefined, so an advisable
reaction for a browser would be to show an error message. If all
browsers do this, document writers will discover their errors at an
early stage.
On the other hand, some other encodings (mainly Japanese Shift-JIS)
use such positions in multibyte sequences. If you have a browser that
can handle undeclared encodings by just passing them byte-by-byte
to the display engine, which in turn combines the bytes into pairs
where necessary (many browsers for Japanese *currently* work that
way, but will hopefully change soon), then the above advice of showing
an error message will badly hamper this unofficial functionality.
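
To make the trade-off concrete, here is a minimal sketch in Python of the kind of check a client could run before deciding whether to complain. The function name find_c1_problems and the tuple layout are only assumptions for illustration, not anything a browser actually implements. It flags both raw bytes and decimal numeric character references in the range 128-159.

```python
import re

# Positions 128-159, the range discussed above.
C1_RANGE = range(0x80, 0xA0)

# Decimal numeric character references such as "&#153;".
NUMERIC_REF = re.compile(rb"&#([0-9]+);")


def find_c1_problems(data: bytes):
    """Return sorted (offset, kind, value) tuples for every raw byte or
    numeric character reference that falls in the range 128-159.

    Hypothetical helper for illustration only.
    """
    problems = []
    # Case a: raw bytes in 128-159.
    for offset, byte in enumerate(data):
        if byte in C1_RANGE:
            problems.append((offset, "raw", byte))
    # Case b: numeric references whose value is in 128-159.
    for match in NUMERIC_REF.finditer(data):
        value = int(match.group(1))
        if value in C1_RANGE:
            problems.append((match.start(), "reference", value))
    return sorted(problems)


# A Latin-1 style document with a stray byte and a stray reference.
print(find_c1_problems(b"a\x99b&#153;c&#233;"))

# Undeclared Shift-JIS text: the lead bytes of both characters land in
# 128-159, so a strict client would report perfectly good Japanese.
print(find_c1_problems("\u65e5\u672c".encode("shift_jis")))
```

The second call shows the conflict: a check that is reasonable for ISO 8859-1 text reports legitimate Shift-JIS lead bytes as errors, so it only makes sense once the client knows which encoding it is looking at.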

>BTW, is it true that 8879 does not define numeric encodings, that this
>is done in 8859? If so, what was the purpose of leaving 128-159
>blank? Just because they're the flip side of 0-31?

Yes, they correspond to 0-31. ISO 2022 defines an intricate protocol
by which you can assign control character sets to 0-31 and/or to
128-159, and graphics characters to the two remaining areas.
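
As a rough sketch of that layout, the four areas of an 8-bit code can be told apart by simple range checks. The labels C0, C1, GL and GR are the usual ISO 2022 names for the areas, which I have not spelled out above. The function name iso2022_area is just an assumption for illustration, and the sketch ignores details such as SPACE and DELETE sitting at the edges of the lower graphics area.

```python
def iso2022_area(byte: int) -> str:
    """Name the area of the 8-bit code table that a byte value falls in.

    Hypothetical helper; a simplified view of the ISO 2022 layout.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a byte value in 0-255")
    if byte < 0x20:
        return "C0"  # 0-31: control characters
    if byte < 0x80:
        return "GL"  # 32-127: graphics characters (left half)
    if byte < 0xA0:
        return "C1"  # 128-159: second control area
    return "GR"      # 160-255: graphics characters (right half)


# 128-159 is 0-31 with the high bit set, hence "the flip side".
assert all(iso2022_area(b | 0x80) == "C1" for b in range(32))
```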

Regards, Martin.