Re: format nego in HTML/10646?

Dan Connolly (connolly@w3.org)
Mon, 8 May 95 15:55:44 EDT

Glenn Adams writes:
>
> I think this discussion of whether and how to display characters
> from 10646 outside of Latin 1 should be deferred or moved into a
> private forum (perhaps we need a sublist for discussing the I18N
> paper and issues -- I'm sure not everyone wishes to follow this thread).

I recommend against this. The html-wg archive works well, and I like
the fact that it is reasonably complete. (I wish it were searchable,
and I understand this is being addressed...) If the gory details
of the tables discussion are fit for this list, then so are the
gory details of I18N.

> What would be useful at this point is to specify in the current 2.0
> RFC the requirements of a UA in rendering character data.

Agreed. Specific language changes are exactly what we should be focusing
on right now.

> I would
> expect these requirements to read something like:
>
> 1. An HTML 2.0 UA must be capable of depicting all Latin 1 graphic
> characters, that is, those characters in the range 0020 - 007E and
> 00A0 - 00FF, plus SPACE (0020).

I believe this is materially the same as what's in there now:

| It supports the ISO-8859-1 character encoding scheme, and processes
| each character in the ISO Latin Alphabet Nr. 1 as specified in section
| The ISO Latin 1 Character Repertoire. (3)

The "ISO Latin 1 Character Repertoire" goes into gory detail
about which characters do what.
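
As a rough illustration of the ranges in Glenn's item 1 (not text from
the spec), a membership test might look like this in Python. The
function name is hypothetical; the ranges are the ones quoted above.

```python
def is_latin1_graphic(code_point: int) -> bool:
    """Return True if code_point is in 0020-007E or 00A0-00FF.

    Hypothetical helper; the ranges are taken from the quoted item 1.
    """
    return 0x0020 <= code_point <= 0x007E or 0x00A0 <= code_point <= 0x00FF


if __name__ == "__main__":
    # 0041 is a graphic character; 0080 is in the C1 control range.
    print(is_latin1_graphic(0x41), is_latin1_graphic(0x80))
```

The control ranges 0000 - 001F and 007F - 009F fall outside both
intervals, so the test rejects them.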

> 2. An HTML 2.0 UA may optionally depict other graphic characters.

What's in the spec is even a little stronger:

|(3)
|
|To support non-western writing systems, HTML user agents should
|support the Unicode-1-1-UTF-8 and Unicode-1-1-UCS-2 encodings and as
|much of the character repertoire of ISO10646 as is possible as well.
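
To make the two encodings concrete, here is a small sketch using
Python's built-in codecs. It assumes that, for a character such as
GREEK SMALL LETTER DELTA (03B4), the "utf-8" and "utf-16-be" codecs
yield the same bytes as the UTF-8 and UCS-2 forms. Python has no codec
named after the Unicode-1-1 labels, so this is an approximation, and
big-endian byte order is an assumption.

```python
# Stand-ins (assumption): "utf-8" for Unicode-1-1-UTF-8,
# "utf-16-be" for Unicode-1-1-UCS-2 with big-endian byte order.
delta = "\u03b4"  # GREEK SMALL LETTER DELTA

utf8_bytes = delta.encode("utf-8")      # variable-width form
ucs2_bytes = delta.encode("utf-16-be")  # fixed two-byte form

print(utf8_bytes.hex(), ucs2_bytes.hex())
```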

> 3. An HTML 2.0 UA which cannot depict a graphic character, e.g.,
> a character reference whose replacement text has no representation
> in the system character set, should depict such a character as either:
>
> - an empty box (i.e., an unknown character glyph)
> - a sequence of characters which represent the reference itself
> rather than its replacement text, e.g., &#65520;

As this is strictly a suggestion/clarification, I'd like more feedback
before I stick something like this into the spec.

I'm willing to expand the section on "Undeclared markup error
handling" to talk about numeric character references, but I'm not sure
it's worth it. I don't expect to see NCRs used extensively, and I
think that making it seem like non-conforming documents will be
treated the same by all browsers is an effort with diminishing returns
at this point.

I've had feedback in several directions on this: somebody said telling
browsers how to do this is overspecification (and I agree). Others
said the author's intent should be preserved whenever possible. But I
don't want authors thinking that non-conforming documents will work.
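
For concreteness, the two fallbacks suggested in item 3 could be
sketched roughly as below. This is only an illustration, not proposed
spec language: the function name, the "latin-1" default for the system
character set, and the "[?]" stand-in for an empty-box glyph are all
assumptions.

```python
import re

# Decimal numeric character references such as &#233;
NCR = re.compile(r"&#([0-9]+);")

# Assumption: "[?]" stands in for the "empty box" unknown-character glyph.
UNKNOWN_GLYPH = "[?]"


def render_ncrs(text: str, charset: str = "latin-1",
                show_reference: bool = True) -> str:
    """Replace each NCR with its character if charset can represent it.

    Otherwise fall back to the reference text itself (show_reference=True)
    or to the unknown-character glyph (show_reference=False).
    """
    def replace(match: "re.Match[str]") -> str:
        reference = match.group(0)
        try:
            char = chr(int(match.group(1)))
            char.encode(charset)  # raises if charset cannot represent it
        except (ValueError, OverflowError):
            return reference if show_reference else UNKNOWN_GLYPH
        return char

    return NCR.sub(replace, text)


if __name__ == "__main__":
    print(render_ncrs("caf&#233; &#65520;"))
    print(render_ncrs("caf&#233; &#65520;", show_reference=False))
```

Either branch leaves representable references alone, so the choice
only affects characters the system character set cannot depict.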

> 4. The same treatment (3) is to be given to SDATA entity references
> which cannot be mapped to appropriate system character data.

There are no SDATA entity references in HTML 2.0.

So until I see more feedback, I consider this issue resolved for the
2.0 document.

Dan