Re: format nego in HTML/10646?

Gavin Nicol (gtn@ebt.com)
Mon, 8 May 95 03:04:29 EDT

>And if the charset parameter describes only the encoding of 10646,
>it's a possibly weak indicator of the code ranges used.

Yes, this is true, but I think such behavior will represent a border
case anyway. As it is, even now a charset=us-ascii document can contain
iso-8859-1 numeric character references.
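
To make that concrete, here is a minimal Python sketch. The sample
document is made up for illustration, and html.unescape merely stands
in for whatever a browser's parser does with numeric character
references.

```python
import html

# A document whose bytes are pure US-ASCII (every byte < 0x80), yet which
# refers to an ISO-8859-1 character (e acute, code 233) numerically.
raw = b"Caf&#233; au lait"

text = raw.decode("us-ascii")    # succeeds: the encoding label is honest
rendered = html.unescape(text)   # resolve the numeric character reference

# Characters the client must render that lie outside US-ASCII.
outside_ascii = [c for c in rendered if ord(c) > 0x7F]
```

The charset label is accurate for the bytes on the wire, but it says
nothing about which characters the client ends up having to display.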

>In what charset encoding should it serve out these
>10646-encoded-in-iso-2022-jp files?

I would use iso-2022-jp.

>In other words, how is any information communicated that allows format
>negotiation over the question "does the client have fonts to render this
>document"? or if not "fonts", then "reasonable means"?

If a client requests iso-2022-jp, then the data provider should take that
as a strong indication that the system can do Japanese, but little
else.

>or do we have to accept that the world has 65,500 characters that we
>may be called upon to render at a moment's notice?

We should accept this. There are ways of dealing with this, and
indeed, you were one of the first to mention one of them.

>I agree. But as James has pointed out, we can't define out numeric
>charrefs, and indeed they might be useful. Using iso-2022-jp
>as an encoding of Unicode for SGML is not as restrictive as using
>it as a system charset, and that is where suggested shortcuts
>are coming up short.

I wouldn't say the protocol or the specification is coming up short,
but rather the browsers implementing them. Until everyone can handle all
of ISO 10646, there will be cases where data may simply not be
displayable. I would argue that those cases will tend to represent
a small, and decreasing, minority.

>Right. And if 10646 is to become the doc charset, those will be
>valid charrefs. Fine. I just want to know how I can tell from
>the charset parameter whether I have fonts for all the characters
>in the doc. Certainly if the value is iso-2022-jp that's a strong
>indication that the doc could be rendered with fonts for Japanese,
>but then again anything might be in there via numerical charrefs.

I can understand the desire for this, but as I noted, even today,
there are failure cases with just US-ASCII and ISO-8859-1.

It might be very desirable to include ranges in the charset=xxxx
parameter, and in the Accept-Charset field. One advantage of using ISO
10646 is that at least we have one single character set that we can
use ranges from as indicators.
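
As a rough sketch of how such ranges could be used (Python, purely
illustrative): no syntax for ranges in charset= or Accept-Charset is
defined anywhere, so the function name uncovered_chars, the list of
(low, high) code point pairs, and the particular ranges chosen for a
"Japanese-capable" client are all assumptions made for this example.

```python
def uncovered_chars(text, ranges):
    """Return the distinct characters of `text`, in order of first
    appearance, whose code points fall outside every (low, high) range."""
    missing = []
    for ch in text:
        cp = ord(ch)
        covered = any(low <= cp <= high for low, high in ranges)
        if not covered and ch not in missing:
            missing.append(ch)
    return missing

# Hypothetical ranges a client with Japanese fonts might advertise,
# expressed as ISO 10646 code point intervals.
JAPANESE_CLIENT = [
    (0x0020, 0x007E),  # printable Basic Latin
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK unified ideographs
]

doc = "\u65e5\u672c\u8a9e caf\u00e9"   # Japanese text plus a Latin-1 e acute
missing = uncovered_chars(doc, JAPANESE_CLIENT)
```

A server holding the same ranges could run a check like this before
deciding which variant of a document to send, and the e acute slipped in
by a numeric character reference would show up as not covered.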

In fact, I think that if (as appears very likely) we *do* move to ISO
10646 at some point, Terry's idea of specifying ranges should be given
some consideration. However, I do not think that *not* having it
causes enough problems to halt the adoption of ISO 10646.