Re: ISO/IEC 10646 as Document Character Set

Glenn Adams (glenn@stonehand.com)
Wed, 3 May 95 22:30:12 EDT

From: erik@netscape.com (Erik van der Poel)
Date: Wed, 03 May 95 18:25:40 -0700

>We could say that the default encoding scheme is base line "ISO-2022".
>...
>How does this compromise sound? Brain dead or what?

ISO 2022 actually does allow you to include info about the charsets
used in a document, but then we would be tied to 2022, and might end
up excluding other charsets (Big5? KOI8?). Which might be a good
thing?

It might be better to stick to the charset parameter, as defined by MIME.

I'm not arguing for 2022. If it were up to me, I'd specify UTF-8 as
the default.

The point is:

1. We need language in the RFC that specifies what to do in the default
case that the CHARSET parameter is not present in the Content-Type response.

2. If we specify 8859-1, that may make ISO-2022-JP people unhappy.

3. If we specify ISO-2022-JP, it will make the majority of users unhappy.

4. If we specify ISO-2022-EU (to give a name to the default state of
ISO 2022 I previously described -- here E means Europe and U means US),
then we essentially achieve a superset of simple 8859-1.

That is, we specify an 8-bit code environment which starts out with
8859-1 being designated and invoked into GL and GR, and which, at the
same time, allows for code switching to any other code set (including
BIG5 and KOI8) via the standard 2022 mechanisms.

It's "How do we get from the current situation to one where the charsets
are labelled?" This is the pressing issue that I think Amanda is also
concerned about.

You bang on server providers and client providers to support interpretation
of the CHARSET parameter. That *is* the prescribed mechanism. Alternatively,
give the user the option of choosing between a number of default encodings.
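
To make the client side of this concrete, here is a rough sketch (not a
proposal for the RFC text) of a client that honors the CHARSET parameter
when it is present and otherwise falls back to a default the user can
choose. The function names charset_of and decode_body and the 8859-1
fallback are assumptions made up for illustration only.

```python
from email.message import Message


def charset_of(content_type, default="iso-8859-1"):
    """Return the charset named in a Content-Type value, else the default.

    The default here is an assumption for illustration; it stands in for
    whatever the user (or the eventual RFC) selects.
    """
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    # get_param returns None when the parameter is absent.
    if isinstance(charset, str) and charset:
        return charset.lower()
    return default


def decode_body(body, content_type, default="iso-8859-1"):
    """Decode raw bytes using the labelled charset, or the default if unlabelled."""
    name = charset_of(content_type, default)
    try:
        return body.decode(name, errors="replace")
    except LookupError:
        # Charset label not known to this client: fall back to the default.
        return body.decode(default, errors="replace")
```

A client offering a choice of default encodings would simply pass the
user's selection as the default argument, e.g.
decode_body(data, header, default="iso-2022-jp").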

All I'm saying is, let's exercise some caution before blindly getting
our servers to append the charset parameter to the content-type line.

If we decide that we're willing to accept any pain and suffering caused
by introducing the charset parameter blindly, then that's OK too. As long
as we consciously decide to do so.

I recall quite clearly when the ARPANET converted from NCP to TCP/IP. I
was the host manager at a site which had 4 hosts connected to the ARPANET
(that was a large number of connected hosts in those days).
The way it worked is that a date was decided as the cutoff date for
switch over; if you didn't have TCP/IP up by then you were just out of luck.

We should do this for the CHARSET parameter. Do you have any other
suggestions? A phased-in plan perhaps. Let's just agree to do it and
do it.

Regards,
Glenn Adams