Re: Displaying Control Characters

Martin J Duerst (mduerst@ifi.unizh.ch)
Mon, 17 Jul 95 12:10:31 EDT

Peter K. Sheerin wrote on July 14th, 1995:

>On 13 Jul 95 at 11:49, Daniel W. Connolly wrote:
>>Perhaps somebody could run some tests on existing browsers to see
>>whether it's reasonable to say whether other chars 0-8, 14-31 should
>>be ignored altogether or treated as wordbreaks.
>>
>>Also... do we leave open the possibility that folks will want to use
>>these unused characters for special purposes (such as graphic code set
>>switches) in the future?

These are already used that way, especially in some Japanese versions.
This affects the ESC character, and possibly other characters used
with ISO-2022 (SI, SO, ...).
But according to our model, this is a question of the transfer encoding,
and at the level of the SGML document character set we neither see nor
use them. The document character set is just one large array of
characters, and there is no switching facility, as far as I know.
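A minimal Python sketch of this model. It assumes ISO-2022-JP as the example of a code-set-switching transfer encoding (the thread only names ISO-2022 in general), and the function name `decode_transfer` is hypothetical. The ESC sequences exist only in the byte stream. After decoding, the text is a plain sequence of characters and contains no switching characters.

```python
# Sketch (assumption: ISO-2022-JP stands in for "ISO-2022 transfer encoding").
# Code-set switching (ESC sequences) lives in the bytes; the codec consumes it,
# so the decoded text is just a sequence of characters.

def decode_transfer(raw: bytes, encoding: str = "iso2022_jp") -> str:
    """Hypothetical helper: decode bytes into document characters."""
    return raw.decode(encoding)


if __name__ == "__main__":
    text = "abc \u3053\u3093\u306b\u3061\u306f"  # "abc" plus five hiragana
    raw = text.encode("iso2022_jp")
    print(b"\x1b" in raw)          # True: ESC appears in the transfer bytes
    decoded = decode_transfer(raw)
    print("\x1b" in decoded)       # False: no ESC at the character level
    print(decoded == text)         # True: round trip is lossless
```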

>That leaves open the question of how to specify what to do with control
>characters. Should they be treated as word breaks? Ignored altogether?
>Displayed as-is? If the latter, this works OK under Windows, for most of
>them get mapped to the big open "box" character, making it clear to the
>document reader that something is wrong.
>
>But we can't count on that for other platforms. I think we should specify
>some particular handling of those control characters (whose positions in
>the browser's character set don't contain characters which are in 10646).

I guess my preference would be to display these, and any other characters
that we know a priori should not be used, in a way that makes it clear to
every document writer on the first attempt that something is wrong.
This is different from characters in ISO-10646, which we know are (or
will become) legal content even though we might not be able to display
them.
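A minimal Python sketch of the policies discussed in this thread for the unused control characters 0-8 and 14-31 (ignore, treat as a word break, or display visibly), plus the separate case of legal characters that simply cannot be displayed. Several parts are assumptions and do not come from the thread. The names `render`, `classify`, and `has_glyph` are hypothetical. The "visible" policy borrows the Unicode Control Pictures block (U+2400..U+241F), which has one symbol per C0 control, as one possible way to make the error obvious.

```python
# Sketch (hypothetical names). Unused C0 controls as listed in the thread:
# 0-8 and 14-31. TAB, LF, VT, FF and CR (9-13) are left untouched.
UNUSED_CONTROLS = set(range(0, 9)) | set(range(14, 32))


def render(text: str, policy: str = "visible") -> str:
    """Apply one handling policy to the unused control characters."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp not in UNUSED_CONTROLS:
            out.append(ch)
        elif policy == "ignore":
            continue                      # drop the character entirely
        elif policy == "wordbreak":
            out.append(" ")               # treat it as a word break
        elif policy == "visible":
            out.append(chr(0x2400 + cp))  # assumption: Control Pictures symbol
        else:
            raise ValueError(f"unknown policy: {policy}")
    return "".join(out)


def classify(ch: str, has_glyph) -> str:
    """'forbidden' for unused controls, 'undisplayable' for legal characters
    the (hypothetical) has_glyph callback cannot show, otherwise 'ok'."""
    if ord(ch) in UNUSED_CONTROLS:
        return "forbidden"
    if not has_glyph(ch):
        return "undisplayable"
    return "ok"


if __name__ == "__main__":
    print(repr(render("a\x01b", "ignore")))     # 'ab'
    print(repr(render("a\x01b", "wordbreak")))  # 'a b'
    print(render("a\x1bb", "visible") == "a\u241bb")  # True
    print(classify("\u3042", lambda c: c.isascii()))  # undisplayable
```

The split between `render` and `classify` mirrors the distinction drawn above: the visible policy flags characters that should never appear, while a legal but unrenderable character is reported separately and not treated as an authoring error.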

Regards, Martin.