I'm not sure this is really better. Have we lost the preference (q) parameter?
Furthermore, as Daniel Connolly wrote, the bandwidth necessary to express the
common cases should be minimized and this seems to be a move in the wrong
direction.
>I agree that if the charset parameter is not specified, the default
>***should*** be US-ASCII (or ISO8859-1, if it's been changed).
Changed where? In MIME or in HTTP?
>Unfortunately, since charset was reserved for future use, Japanese servers
>had no choice but to serve non-Latin files without a charset parameter!!
>
>Why don't we enforce the default for servers using a future version of
>the HTTP protocol? ...and let current versions be "implementation
>dependent" in order to preserve backwards compatibility?
I agree that it should stay "implementation dependent" for current
servers, but I'm just not sure it is necessary or wise to *enforce* a
default in the future. What's the point, except for sticking to the
letter of MIME, which has to live with the limitations of mail? Does that
mean that, say, Mosaic-L10N would have to forgo its (limited, but still
useful) automatic codeset detection?
>Daniel>...but I think the overall proposal is incomplete
>Daniel>until we have a workable proposal for how to enhance the major httpd
>Daniel>implementations to correctly label non-ISO8559-1 documents.
>
> [...]
>
>A web site with versions of the same files in different encodings
>(e.g., SJIS, EUC and JIS) or languages (e.g., English and Japanese)
>could create separately rooted trees with the equivalent files in each
>tree. The top page could say click here for SJIS/EUC/JIS or
>English/Japanese.
And how is the top page encoded? In what language? This just shows that
Accept-Language and Accept-Charset are needed,
>One idea, is for a HTML <charset> tag that would take precedence over the
>MIME header:
>
> <html>
> <charset=xxx>
> <head> <title> DOCUMENT TITLE GOES HERE </title> </head>
> <body>
> <h1> MAJOR HEADING GOES HERE </h1>
>
> THE REST OF THE DOCUMENT GOES HERE
>
> </body>
> </html>
I like this, but will it fly? What about multi-charset documents?
>Daniel>But "giving somethign the client didn't ask for" is _not_ an HTTP
>Daniel>protocol viloation (at least not if you ask me; the ink still isn't
>Daniel>dry on the HTTP 1.0 RFC though...). It's something that the client
>Daniel>should be prepared for.
>
>As you put it "It's something that the client should be prepared for."
>
>I'm still assuming that
>
> accept-parameter: charset=xxx
>
>dictates if the server sends back the charset parameter. An old browser
>should continue to get the 2022-JP data untagged. A new charset-enabled
>browser should get tagged 2022-JP data even if it only advertised
>8859-1.
What's the point, if it just advertised that it couldn't deal with
2022-JP? It seems to me that a "new" server (able to tag) should send a
more economical error reply, leaving the browser with the alternative of
1) abandonning (can't read anyway) or 2) reformulating the request. It
could advise the user and ask for guidance. Of course the browser must
still be prepared to receive *untagged* whatever from an old server.
>I agree. I've purposely read EUC and JIS pages on my Mac (SJIS), so that I
>could save the source and look (grok) at it later. (not a usual user...)
Can still be done with 2) above.
>Ken>I want to add one more thing about this issue. We could have the document
>Ken>which uses multiple charset in future. We must define the way to label
>Ken>such a document.
>Ken>It can be like ...
>Ken> Content-Type: text/html; charset="ISO2022-JP", charset="ISO8859-6"
>Ken>Is this OK?
>
>I'd rather leave this as a possible future direction. Multilingual has
>had a lot of heated discussions. If we can agree on a means to support
>the existing mono-lingual mono-encoded Web data, that will allow us
>to create products to fill an immediate need. Can we phrase something that
>leaves this open and discuss this in another thread?
It is the purpose of this (www-mling) list, after all!
Regards,
-- Francois Yergeau <yergeau@alis.ca> Alis Technologies Inc. +1 514 738-9171