I sincerely hope Larry's original wording saying "ISO-Latin-1 only" was not
included.
Just browsing the net for a couple of hours, I found sites serving documents in
ISO-2022-JP (Japanese), EUC-JP (Japanese), ISO-8859-5 (Russian), KOI8 (Russian),
KSC5601 (Korean), ISO-8859-8 (Hebrew) and ISIRI-3342 (Persian). URLs upon
request.
Some servers send a "proper" charset parameter with the text/html content type,
others rely on ISO-2022 escape sequences, and the rest just cross their fingers.
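To make those three cases concrete, here is a minimal Python sketch of how a client might tell them apart. It is an illustration under stated assumptions, not a prescribed algorithm: the function name guess_charset and the labels it returns are hypothetical, and only the two ISO-2022-JP escape sequences that switch to Japanese characters are checked.

```python
from email.message import Message

# Escape sequences that switch an ISO-2022-JP stream to JIS X 0208
# (ESC $ @ and ESC $ B). Other ISO-2022 variants are deliberately ignored
# in this sketch.
ISO_2022_JP_ESCAPES = (b"\x1b$@", b"\x1b$B")


def guess_charset(content_type, body):
    """Return (charset, how) for a document; hypothetical helper.

    content_type: raw Content-Type header value, e.g. "text/html; charset=KOI8-R"
    body:         the document as bytes
    """
    # Case 1: the server sent a charset parameter with the content type.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased, or None if absent
    if charset:
        return charset, "header"

    # Case 2: no parameter, but the body carries ISO-2022 escape sequences.
    if any(esc in body for esc in ISO_2022_JP_ESCAPES):
        return "iso-2022-jp", "escape-sequence"

    # Case 3: the server crossed its fingers; so must the client.
    return None, "unknown"
```

A client would call guess_charset with the header value and the raw bytes it received, then decide for itself what to do in the "unknown" case.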
All this is not really complicated: just a few character sets, with still fewer
encoding methods. Complexity grows with the number of codesets, however, and
they are numerous: two or three dozen in the Middle East alone, and 16 pages or so
listed in RFC 1700. I would venture that it is better to deal with the issue
now, and perhaps have some influence on the proliferation of codesets, than to
say ISO-Latin-1 only and have to deal later with an unpalatable fait accompli.
At the very least, the upcoming RFC should not play ostrich; it should recognize
the situation so as not to be completely obsolete even before it appears.
-- Francois Yergeau <yergeau@alis.ca> Alis Technologies Inc., Montreal +1 514 738-9171