Re: charset parameter (long)

yergeau@alis.ca
Fri, 13 Jan 95 18:09:33 EST

Larry Masinter <masinter@parc.xerox.com> writes to list html-wg:
>There was a lot of confusion about the restriction that the current
>draft describes level 0,1,2 and only is compatible with
>charset=ISO-8859-1 and charset=US-ASCII. I thought, though, that we
>were agreed that level 2 would describe roughly "current practice"
>that was common throughout the web, and that it would require level
>2.1 or some such to allow other charset parameters.

"Current practice" need not mean *throughout* the Web; by that standard
only ASCII would qualify, not even Latin-1. "Current practice", however,
seems to include ISO-2022-JP, and possibly some others. If current
practice shows that it works, why be overly restrictive?

>#>Also, do we really want to get into the business of multi-charsets w/in 1
>#>document??
>
># Emphatically yes!
>
>Emphatically no!

Perhaps I was not explicit enough. I don't think the HTML2 draft
should address multiple charsets within one document - it's too early,
and the draft needs to be approved before hell freezes over. But the
issue still needs to be addressed, IMHO.

>Trying to handle character-encoding at the SGML level is a mistake,
>because the SGML tags themselves are represented in the overall charset.

Actually, only in the ASCII subset of the overall charset, which buys you
some freedom. If ASCII remains ASCII (same bit pattern) across charset
switches, you're safe: your parser will still recognize your tags.
Otherwise, I agree that there is a problem that needs to be addressed at
*some* level.
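
A minimal sketch of the "same bit pattern" condition, assuming Python's
standard codecs as stand-ins for the charsets under discussion. The helper
name ascii_preserved and the sample markup are hypothetical, and cp500
(an EBCDIC code page) is included only as a charset where the condition
fails:

```python
def ascii_preserved(codec: str, markup: str = "<P>text</P>") -> bool:
    """Return True if `codec` encodes ASCII-only markup to the same bytes
    as US-ASCII does, i.e. a byte-oriented parser would still see the tags."""
    return markup.encode(codec) == markup.encode("ascii")


if __name__ == "__main__":
    # Charsets chosen for illustration only (assumption, not from the draft).
    for codec in ("latin-1", "iso2022_jp", "euc_jp", "cp500"):
        print(codec, ascii_preserved(codec))
```

This only checks how ASCII-only markup is encoded. It says nothing about
whether bytes in the ASCII range can also turn up inside multi-byte
characters, which is a separate question for the parser.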

-- 
Francois Yergeau  <yergeau@alis.ca>
Alis Technologies Inc.
+1 514 738-9171