Re: charset parameter (long)

yergeau@alis.ca
Fri, 13 Jan 95 13:58:31 EST

Next message: yergeau@alis.ca: "Re: [www-mling,00179] Re: charset parameter (long)"
Previous message: pandries@alis.ca: "Re : proposed changes to charset parameter"

Bruce Kahn <Bruce_Kahn@iris.com> writes:
>Francois Yergeau wrote:
>>>One idea, is for a HTML <charset> tag that would take precedence over the
>>>MIME header:
>>
>>I like this, but will it fly? What about multi-charset documents?
>
> Not very well if the text until <charset=xxx> is in one charset and the
>rest is in another. In order for a browser to grok the charset entry it must
>be able to parse to it.

That's very clear, but it is not a problem for any of the numerous charsets
that have ASCII as a subset (like Latin-1). The parser sees only ASCII until it
reaches <charset=xxx> and then knows how to display the document (in Cyrillic
instead of Latin-1, say). No need to switch charset here; just continue in
what you started with.
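
To make this concrete, here is a minimal Python sketch of what I have in mind. The <charset=xxx> syntax is only the form floated in this thread, and the function name and the Latin-1 fallback are my own assumptions, not anything from a draft. It scans the raw bytes for the ASCII tag and then decodes the whole document with the charset the tag names:

```python
import re

# Hypothetical tag syntax from this thread: <charset=xxx>, written in ASCII.
CHARSET_TAG = re.compile(rb"<charset=([A-Za-z0-9._:-]+)>", re.IGNORECASE)

def decode_with_charset_tag(raw: bytes, default: str = "iso-8859-1") -> str:
    """Decode the whole document using the charset named in its tag.

    Falls back to `default` (an assumption) when no tag is present.
    """
    match = CHARSET_TAG.search(raw)
    charset = match.group(1).decode("ascii") if match else default
    return raw.decode(charset)

# A Cyrillic document: the tag bytes are plain ASCII, the rest is ISO-8859-5.
doc = "<charset=iso-8859-5>Привет".encode("iso-8859-5")
print(decode_with_charset_tag(doc))
```

The scan is safe for the ISO-8859 series because bytes above 0x7F can never be mistaken for the ASCII characters of the tag.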

I think this would also work with ISO-2022-JP and the like; ASCII is not
really a subset there, but the document starts in ASCII until told otherwise.
Perhaps some input from Japan would be helpful here.
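
As a rough check of the "starts in ASCII" claim, this short Python sketch encodes a tagged document in ISO-2022-JP and looks at the bytes. The tag syntax is again the hypothetical one from this thread, and the sample text is mine:

```python
# The tag is the hypothetical <charset=xxx> form discussed in this thread.
text = "<charset=ISO-2022-JP>日本語"
encoded = text.encode("iso2022_jp")

tag = b"<charset=ISO-2022-JP>"
# The document begins in ASCII, so the tag bytes appear unchanged.
starts_in_ascii = encoded.startswith(tag)
# The switch to JIS X 0208 is announced by the escape sequence ESC $ B.
has_escape = b"\x1b$B" in encoded[len(tag):]

print(starts_in_ascii, has_escape)
```

A parser that reads ASCII up to the tag would therefore find it before any escape sequence changes the meaning of the bytes.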

> Given that, this scheme would require the authors to either write in two
>different character sets (one for the page and one for <charset>) or we would
>have to hack the scheme to be something too gross to consider.

It seems to me that one can go a long way without involving two charsets.
Much further than with Latin-1 only: the whole ISO-8859 series, possibly
ISO-2022 and, thinking about it, Unicode-1-1-UTF7.

Why then is the HTML draft so restrictive? Larry Masinter's proposed changes,
while a nice opening, still restrict us to Latin-1 (section 2.16), which
doesn't make for a very *World* Wide Web.

>Also, do we really want to get into the business of multi-charsets w/in 1
>document??

Emphatically yes!

>I hope not otherwise all the discussion on a header line with the desired
>charset for negotiating on a preferred format is for
>nothing. (I ask for a document in EUC but it has JIS or SJIS intermixed; how
>could I grok those parts?)

First thing, the different charsets have to be identifiable, and that means
tagging.
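
A minimal sketch of what tagging buys us, in Python. It assumes, hypothetically, that each <charset=xxx> tag applies to the bytes that follow it until the next tag, and that the document starts in ASCII; none of this is specified anywhere yet:

```python
import re

# Hypothetical tag syntax from this thread; each tag switches the charset
# for the bytes that follow it.
TAG = re.compile(rb"<charset=([A-Za-z0-9._:-]+)>")

def decode_tagged(raw: bytes, initial: str = "ascii") -> str:
    """Decode a multi-charset document segment by segment, dropping the tags."""
    out = []
    charset = initial
    pos = 0
    for m in TAG.finditer(raw):
        # Decode everything before this tag with the charset in force so far.
        out.append(raw[pos:m.start()].decode(charset))
        charset = m.group(1).decode("ascii")
        pos = m.end()
    out.append(raw[pos:].decode(charset))
    return "".join(out)

mixed = (b"Hello, "
         + b"<charset=iso-8859-1>" + "café ".encode("iso-8859-1")
         + b"<charset=iso-8859-5>" + "Привет".encode("iso-8859-5"))
print(decode_tagged(mixed))
```

This assumes the tag bytes never occur inside a multi-byte sequence, which holds for the ISO-8859 series but would need more care for ISO-2022 style encodings.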

> I think providing the character set information is better left to
>negotiation between the browser and client (as discussed so far).

It has also been pointed out that the server needs to know what to tell the
client, and Bob Jung proposed the <charset> tag just to help with that. No
need for parallel trees or other, grosser schemes if your documents identify
themselves.

>I like Dan's suggestion about having the preference rating but I'm not sure
>how useful it would be over say sending multiple accept-charsets in order of
>preference (ie: 1st is the most preferred, the last is least preferred).

HTTP/1.0 specifies that the ordering of header fields is not
significant; I suppose that could be changed though. However, this
simpler scheme does not easily allow combining charset priorities
with Accept: and Accept-Language priorities.
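
To illustrate the difference, here is a small Python sketch of an explicit rating. I am assuming the preference is carried as a `q=` parameter on each charset, in the style used for Accept:; the header name, the parameter and the function name are illustrative only:

```python
def parse_accept_charset(value: str) -> list[tuple[str, float]]:
    """Return (charset, rating) pairs, highest rating first.

    Assumes a hypothetical 'name;q=0.8' syntax; a missing q means 1.0.
    """
    prefs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # treat an unparsable rating as "not wanted"
        prefs.append((name.strip().lower(), q))
    # The sort is stable, so equal ratings keep the order they were sent in.
    return sorted(prefs, key=lambda p: p[1], reverse=True)

print(parse_accept_charset("iso-8859-1;q=0.5, iso-2022-jp, iso-8859-5;q=0.8"))
```

Because the rating is a number rather than a position, it does not depend on field ordering and can be weighed against the ratings given for media types and languages.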

-- 
François Yergeau  <yergeau@alis.ca>
Alis Technologies Inc.
+1 514 738-9171
