Re: Characters in range 128-159, incl.

Murray Maloney (murray@sco.COM)
Thu, 26 Jan 95 21:25:12 EST

> Dear Murray,
> Thank you for your thoughtful reply. This is a more complicated issue than
> I thought.
> >... if these characters are defined as SHUNCHARS
> They don't seem to be. The Declaration in the spec lists only chars from
> the range of 127 and below in the SHUNCHAR list. If they really are UNUSED,
> then perhaps this list should be expanded in the spec.

That's too bad. Dan?

> It would appear that if my objective was to produce HTML 2.0 compliant
> documents, then the characters in this range would have to be removed,
> either by substitution to some other valid character or outright removal.
> Otherwise, it seems I could have at least two problems.
> First, on some systems, one of these characters could be interpreted as a
> control code (bypassed perhaps if escaped).

That's less likely than being presented as the wrong glyph on a Mac.

> Secondly, even if processed without error, the browser may not be able to
> convert the entity into a meaningful glyph.

If you stick with the list of entities in the spec,
you should be safe. I think that Corprew tested
the whole set on a variety of browsers and platforms.

> >Did you try it on non-Windows browsers? I suspect -- maybe Corprew
> >could verify -- that Mac browsers have a slightly different
> >mapping and that most/all? UNIX browsers do not handle these codes.
> I have only tried Windows browsers so I don't know how these others would
> fare. It's interesting that the two I tried both correctly converted the
> &#146; back into the single close quote.

Liberal in what they accept!? This is a good thing -- sort of!
The problem with going above and beyond the spec is that
people start to expect that it is the correct behaviour.
Then they start getting chauvinistic about it and demanding
that everyone else comply. And you know what happens next,
don't you? :-) (Sorry, I'm getting punchy)
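
For illustration only (a Python sketch, not something from this thread):
the standard library's html.unescape is one lenient parser that resolves
a numeric reference in the 128-159 range the way those Windows browsers
did, rather than treating it as a Latin 1 position.

```python
import html

# &#146; is not a defined Latin 1 position, but this lenient parser
# resolves it through the Windows code page mapping anyway.
quote = html.unescape("&#146;")

# Show the code point that was actually produced.
print(hex(ord(quote)))
```

A strict reading of the spec would not produce a close quote here, which
is the sort of expectation drift described above.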

> >I think that it would be a good idea for browsers to agree on
> >a convention for displaying a consistent symbol -- TBD -- when
> >a character code is found in an HTML document for which there
> >is no corresponding glyph available for display. In the olden
> >days, we used to use the DEL glyph -- a gray or solid box --
> >for this purpose on character terminals. But that doesn't help
> >you right now.
> True. But this does seem like a good idea. Windows does something similar
> and it can be very helpful in detecting character problems.
> >Anyway, the bottom line on character codes is that there is
> >no agreement among the various OS environments as to the
> >character code to glyph mapping in the upper 128 characters.
> >... someone using an editor on Windows and Mac will
> >each insert a different code for the same glyph and
> >you won't be able to determine which is which unless
> >you know the source, which will probably renew usage
> >and add new meaning to the epithet "Consider the source". ;-}
> Hence, I gather, the origins of the named sequences of Section 2.17
> which could be remapped locally to the corresponding glyph, as long as it
> existed within the font.

You've got it.
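
To make the mapping problem concrete, here is a small Python sketch. The
codec names cp1252 and mac_roman are modern stand-ins assumed here for
the Windows and Mac character sets; they are not named in this thread.
It decodes the same byte, 146, three ways.

```python
import unicodedata

# The single byte a Windows editor would write for a close quote.
raw = bytes([146])

def describe(encoding):
    """Decode the byte under one encoding and return the character's name."""
    ch = raw.decode(encoding)
    # Control characters have no Unicode name, so supply a placeholder.
    return unicodedata.name(ch, "<control or unnamed>")

for enc in ("cp1252", "mac_roman", "latin-1"):
    print(enc, "->", describe(enc))
```

One byte, three readings: without knowing which platform wrote the file,
a reader cannot tell which glyph was intended. A named entity sidesteps
this because each platform can map the name to its own code locally.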

> > you've got the various numeric entities that go beyond what is in
> > the Latin 1 character set.
> I don't follow you here. I thought the character and numeric entity
> references were alternative expressions for characters that existed in the
> character set. Are you referring to alternative charsets which would be
> specified in some way?

There are numeric character entities that are not represented
in Latin 1 -- that's ISO 8859-1.

> New question:
> Does the HTML 2.0 specification REQUIRE that characters above 127 be
> converted into one of the alternative entity references? Or, is this merely
> an alternative representation to be used for convenience during authoring?
> I understand the need to REQUIRE the special chars such as '>'. But, does
> the requirement extend to those chars above 127?

No requirement to use entities for those characters at 160 and above.
Chars from 127-159 should not be used.
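
If you want to check a document for these before publishing, something
like the following Python sketch might do. It takes the unused range to
be 127 through 159, matching the subject line of this thread; that
boundary, the function name find_unused, and the sample string are all
assumptions of the sketch. It flags both raw characters and decimal
numeric references.

```python
import re

# Assumed unused range: 127 through 159 inclusive.
UNUSED = range(127, 160)

# Decimal numeric character references such as &#146;
NUMERIC_REF = re.compile(r"&#(\d+);")

def find_unused(text):
    """Return sorted (offset, code) pairs for raw characters and
    decimal numeric references that fall in the unused range."""
    hits = []
    for i, ch in enumerate(text):
        if ord(ch) in UNUSED:
            hits.append((i, ord(ch)))
    for m in NUMERIC_REF.finditer(text):
        code = int(m.group(1))
        if code in UNUSED:
            hits.append((m.start(), code))
    return sorted(hits)

# Hypothetical sample: one numeric reference and one raw character.
sample = "It&#146;s caf\u00e9 time\u0092"
print(find_unused(sample))
```

Characters such as the e-acute at code 233 pass through untouched, since
nothing requires an entity for them.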