Re: Characters in range 128-159, incl.

Stan Newton (newtonjs@char.vnet.net)
Thu, 26 Jan 95 13:50:36 EST

Dear Murray,

Thank you for your thoughtful reply. This is a more complicated issue than
I thought.

>... if these characters are defined as SHUNCHARS

They don't seem to be. The SGML Declaration in the spec lists only characters
in the range 127 and below in the SHUNCHAR list. If they really are UNUSED,
then perhaps this list should be expanded in the spec.

It would appear that if my objective were to produce HTML 2.0 compliant
documents, then the characters in this range would have to be removed,
either by substitution to some other valid character or outright removal.
Otherwise, it seems I could have at least two problems.

First, on some systems, one of these characters could be interpreted as a
control code (though perhaps this is avoided if the character is escaped).

Second, even if it is processed without error, the browser may not be able
to convert the character reference into a meaningful glyph.
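
For illustration only, here is a minimal sketch in Python of the
"substitution or outright removal" approach. The function name and the
substitution table are hypothetical, and the ASCII stand-ins are just one
possible choice; the codes listed are the Windows assignments for
typographic quotes, dashes, and the ellipsis.

```python
# Hypothetical substitution table for Windows-specific codes in 128-159.
# The ASCII stand-ins are an assumption, not something the spec prescribes.
SUBSTITUTES = {
    133: "...",  # horizontal ellipsis
    145: "'",    # left single quote
    146: "'",    # right single quote
    147: '"',    # left double quote
    148: '"',    # right double quote
    150: "-",    # en dash
    151: "--",   # em dash
}


def clean_c1_range(data: bytes) -> bytes:
    """Replace bytes 128-159 with a stand-in, or drop them if none is listed."""
    out = bytearray()
    for b in data:
        if 128 <= b <= 159:
            out.extend(SUBSTITUTES.get(b, "").encode("ascii"))
        else:
            out.append(b)  # everything outside 128-159 passes through untouched
    return bytes(out)


cleaned = clean_c1_range(b"don\x92t")
```

Bytes at 160 and above are left alone here, since they fall outside the
range in question.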

>Did you try it on non-Windows browsers? I suspect -- maybe Corprew
>could verify -- that Mac browsers have a slightly different
>mapping and that most/all? UNIX browsers do not handle these codes.

I have only tried Windows browsers so I don't know how these others would
fare. It's interesting that the two I tried both correctly converted the
&#146; back into the single close quote.
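
A small sketch of why code 146 may fare differently elsewhere, using
Python's built-in codecs as stand-ins for the platform character sets. That
pairing is an assumption: "cp1252" for Windows, "mac_roman" for the Mac, and
"latin-1" for ISO 8859-1. The helper name is hypothetical.

```python
import unicodedata


def glyph_for_code(code: int, charset: str) -> str:
    """Decode a single byte value under the given character set."""
    return bytes([code]).decode(charset)


windows = glyph_for_code(146, "cp1252")    # right single quotation mark
mac = glyph_for_code(146, "mac_roman")     # lowercase i with acute accent
latin1 = glyph_for_code(146, "latin-1")    # a control code, no glyph

for label, ch in (("Windows", windows), ("Mac", mac), ("Latin-1", latin1)):
    # unicodedata.name() raises for control codes, so supply a fallback
    print(label, hex(ord(ch)), unicodedata.name(ch, "<control>"))
```

The same byte yields a quote on one system, an accented letter on another,
and a control code under ISO 8859-1.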

>I think that it would be a good idea for browsers to agree on
>a convention for displaying a consistent symbol -- TBD -- when
>a character code is found in an HTML document for which there
>is no corresponding glyph available for display. In the olden
>days, we used to use the DEL glyph -- a gray or solid box --
>for this purpose on character terminals. But that doesn't help
>you right now.

True. But this does seem like a good idea. Windows does something similar
and it can be very helpful in detecting character problems.

>Anyway, the bottom line on character codes is that there is
>no agreement among the various OS environments as to the
>character code to glyph mapping in the upper 128 characters.
>... someone using an editor on Windows and Mac will
>each insert a different code for the same glyph and
>you won't be able to determine which is which unless
>you know the source, which will probably renew usage
>and add new meaning to the epithet "Consider the source". ;-}

Hence, I gather, the origin of the named sequences of Section 2.17, which
each system could remap locally to the corresponding glyph, as long as that
glyph existed within the font.
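
As a rough sketch of that idea, the following Python replaces each Latin-1
character in the range 160-255 with a named sequence, leaving the receiving
system to pick the glyph. The function names are hypothetical. The name
table comes from Python's html.entities module, which follows a later HTML
version, so it may include names the 2.0 spec does not define.

```python
from html.entities import codepoint2name


def to_entity(ch: str) -> str:
    """Return a named sequence for Latin-1 codes 160-255, else a fallback."""
    code = ord(ch)
    if code < 128:
        return ch  # plain ASCII passes through; markup characters not handled
    if 160 <= code <= 255 and code in codepoint2name:
        return f"&{codepoint2name[code]};"
    return f"&#{code};"  # no Latin-1 name: fall back to a numeric reference


def encode_text(text: str) -> str:
    return "".join(to_entity(c) for c in text)


encoded = encode_text("café")
```

Because the output is pure ASCII, it no longer depends on which code the
authoring system happened to use for the character.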

> you've got the various numeric entities that go beyond what is in
> the Latin 1 character set.

I don't follow you here. I thought the character and numeric entity
references were alternative expressions for characters that existed in the
character set. Are you referring to alternative charsets which would be
specified in some way?

New question:
Does the HTML 2.0 specification REQUIRE that characters above 127 be
converted into one of the alternative entity references? Or, is this merely
an alternative representation to be used for convenience during authoring?
I understand the need to REQUIRE the special chars such as '>'. But, does
the requirement extend to those chars above 127?

Thanks again, Murray.

Stan Newton
Newton Computing Solutions