Re: Displaying Control Characters

Daniel W. Connolly (connolly@beach.w3.org)
Thu, 20 Jul 95 11:20:47 EDT

In message <199507190413.VAA09748@blob.best.net>, "Peter K. Sheerin" writes:
>On 18 Jul 95 at 22:31, Daniel W. Connolly wrote:
>
>> * in violation of HTML spec, orthogonal to SGML
>> (e.g. octet 161, 162, ... in a text/html body)
>...
>> Disregarding octets in a text/html body can be construed as an
>> error handling technique. So can aborting with an error in that case.
>
>This is really the thing I'm griping about. I see a hole in the spec that
>different browsers are treating in different ways. This is confusing.
>Browsers are currently mapping numeric character references in the range of
>control characters directly to printable characters in the same code
>positions in the local character set.

This is not a hole in the spec. The documents in question do _not_
conform to the spec. Hence, implementations may do whatever they
choose.
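
To make the behavior being complained about concrete, here is a minimal
sketch of "map the number straight to the same code position in the
local character set." It is not taken from any browser: the function
name render_numeric_reference and the two codecs chosen are assumptions
for illustration only. It shows why the same reference, &#150;, can
come out differently depending on the platform.

```python
# Illustrative sketch only: mimics a user agent that treats the number in
# a numeric character reference as one octet in its local character set.
# The function name and the codecs used here are assumptions, not anything
# defined by the HTML spec or by a particular browser.

def render_numeric_reference(number: int, local_charset: str) -> str:
    """Decode `number` as a single octet in `local_charset`."""
    if not 0 <= number <= 255:
        raise ValueError("this sketch only handles single-octet values")
    return bytes([number]).decode(local_charset)


if __name__ == "__main__":
    # 150 is a control position in ISO 8859-1 but a printable dash in
    # Windows code page 1252, so the two results differ.
    for charset in ("latin-1", "cp1252"):
        ch = render_numeric_reference(150, charset)
        print(f"&#150; via {charset}: U+{ord(ch):04X}")
```

Under ISO 8859-1 the result is a control character; under code page
1252 it is a dash. Neither outcome is sanctioned by the spec, which is
the point: the document is in error, so the results are unspecified.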

It's getting _really_ boring and tedious (not to mention hazardous to
the future extensibility of the web) specifying how HTML user agents
should treat all these error conditions.

>I feel strongly that this should
>not be the case, and that we should add some simple language to the spec
>specifying how browsers should handle these small characters.

Please suggest something. I think you'll find it difficult to craft
language that's useful but not overly restrictive.

>> I don't have the SGML spec handy

>I don't have it handy either, and while I will have to soon, I don't
>really want to have to refer to it for something simple like this. Why
>should I have to?

Life is hard. HTML is an application of SGML. Get used to it.

>The way character sets and code pages have evolved has made the current
>usage of characters and fonts extremely confusing. This leads to lots of
>errors in implementation, and confuses the heck out of users and browser
>developers alike.
>
>Let's get rid of this confusion now, so that the path towards
>internationalization is easier.

I agree there is confusion in most discussions regarding the term
"character set." That's why I wrote "'Character Set' Considered Harmful".
I hope that it will clarify the terminology, at least as it's used
in the HTML and MIME specs.

>And yes, I am going to write up some proposed language to deal with this,
>and see how people react.

I suggest you do.

Dan