Re: ISO/IEC 10646 as Document Character Set

Erik van der Poel (erik@netscape.com)
Wed, 3 May 95 20:47:02 EDT

>>But there's the installed base and the interoperability currently being
>>enjoyed (yes, even in Japan).
>
>Most companies here don't even know what the Internet is, let alone
>the WWW (they know the *names*, but that's it).

So you're saying that it's OK to destroy the interoperability being
enjoyed by the companies that *do* know what the Internet and WWW are?
Have you asked them how they feel about this?

>The interoperability you refer to is:
>
> "Hey. This squiggly stuff looks like EUC. Now, pull down the Options
> menu, change the font... ahh, now it looks OK."

Netscape 1.1 (Win/Mac) automatically distinguishes between EUC, SJIS
and ISO-2022-JP. The user simply sets a preference for autodetection
(the first time they run Netscape), and Netscape does the rest.
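
To make the idea concrete, here is a rough sketch of how such
autodetection might work. This is not Netscape's actual algorithm
(which is not described here); the function name and the "try a
strict decode" heuristic are assumptions made for illustration, using
Python's standard codecs.

```python
def guess_japanese_charset(data: bytes) -> str:
    """Guess which of three Japanese encodings a byte string uses.

    Illustrative heuristic only, not any browser's real detector.
    """
    seven_bit = all(byte < 0x80 for byte in data)
    if seven_bit:
        # ISO-2022-JP stays 7-bit and switches character sets with
        # escape sequences such as ESC $ B and ESC ( B.
        if b"\x1b$" in data or b"\x1b(" in data:
            return "iso-2022-jp"
        return "us-ascii"
    # EUC-JP and Shift_JIS both use 8-bit bytes, so try a strict
    # decode of each and keep the first one that succeeds.
    for label, codec in (("euc-jp", "euc_jp"), ("shift_jis", "shift_jis")):
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return label
    return "unknown"


if __name__ == "__main__":
    sample = "日本語"
    for codec in ("iso2022_jp", "euc_jp", "shift_jis"):
        print(codec, "->", guess_japanese_charset(sample.encode(codec)))
```

Some byte strings are valid in both EUC and SJIS, so the order of the
attempts decides those cases. A real detector would presumably need
something smarter, such as byte-frequency statistics.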

Before Netscape 1.1 hit the streets, (some?) people in Japan were
using delegated, which I believe was also doing autodetection. Correct
me if I'm wrong. So here also, the user would set the proxy preference
(to point at a delegated server), and it would work after that.

Of course, Latin-1 characters would not be displayed correctly in
such cases, but how many Japanese people care about the few such
non-Japanese documents that they come across (most are just ASCII)?

If there is so little interoperability in Japan, how do you explain the
existence of so many documents on the many servers listed in NTT's
list of Japanese Web sites?

>The most popular browsers in Japan are Mosaic
>L10N, Netscape, and a few others. I believe they will all support the
>charset parameter in the near future (if they don't already).

Netscape 1.1 supports the charset parameter (on the Content-Type
header) and a few charsets. Does Mosaic L10N support the charset
parameter?

>>But can we Westerners really dictate what the Japanese should do with
>>their "corner" of the Internet??? Especially since the default is
>>iso-8859-1, which means that we are not impacted.
>
>Yes, you can, and should.

OK, so that's your opinion. Now we need to find out what the Japanese
themselves think.

>The WWW here is just now starting up. The user base is small *now*,
>but in 6 months, it will be much, much larger. At that time, it will
>be a much, much, bigger problem to solve. If we strike *now*, and say
>that servers *must* label the data correctly, and that clients
>*should* send an Accept-Charset field, they will become widespread
>practise.

I hope you're right, but shouldn't we check with the Japanese?

>There is a boom starting, and we have one chance *now* to get it
>right. Let's not blow it.

I agree 200%.

>>It might be a good idea to have clients tell servers that they are capable
>>of parsing the charset parameter. This is similar to Dan's proposal
>>to have clients tell servers that they can do HTML 3.0 (tables, etc).
>
>This is already *in* HTTP. We thrashed around on this 6 months
>ago. Clients *should* send an Accept-Charset(perhaps poorly named)
>field if they can accept anything other than ISO-8859-1, but
>servers *must* label the documents correctly.

The Accept-Charset header is quite different from what I have in mind.
The header I'm thinking of simply tells the server that the client can
*parse* the charset parameter itself. It does not say anything about
which particular charsets are known to the client. This is not really
negotiation. The client *tells* the server to append the charset
parameter, and the server *tells* the client what charset the doc is in.
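
As a minimal sketch of what I mean, the server-side logic could look
like the following. The header name "X-Parses-Charset" is made up
purely for illustration (no such header exists in HTTP), and the
function name is hypothetical as well.

```python
# Hypothetical header name, invented for this sketch only.
HYPOTHETICAL_HEADER = "x-parses-charset"


def content_type_for(request_headers: dict, doc_charset: str) -> str:
    """Build the Content-Type value for an HTML document.

    The charset parameter is appended only when the client has said
    it can parse it; the client's known charsets are not consulted.
    """
    names = {name.lower() for name in request_headers}
    if HYPOTHETICAL_HEADER in names:
        return f"text/html; charset={doc_charset}"
    # Older clients may choke on the parameter, so leave it off.
    return "text/html"


if __name__ == "__main__":
    print(content_type_for({"X-Parses-Charset": "yes"}, "iso-2022-jp"))
    print(content_type_for({"User-Agent": "OldBrowser/1.0"}, "iso-2022-jp"))
```

Note that the server never looks at which charsets the client knows;
it only decides whether it is safe to send the parameter at all.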

>>Please let me know what you all think. Perhaps the Japanese should be
>>involved in this discussion?
>
>Why don't you go over to the www-mling mailing list and invite them
>over? Last time anything was Cc'd there, nothing happened.

Yeah, one of the problems is that it is difficult for them to engage
in discussion in English. Can you read/write Japanese? Are you on the
infotalk mailing list at NTT? The www-mling list seems quiet these days,
but infotalk is a bit more active.

Erik