Right. It simplifies things a lot to have the entity manager resolve
such issues, which is precisely why I proposed UTF-8 or UTF-7 as the
"core" encoding that browsers should understand. These encodings of
Unicode represent a reasonable overhead, and Unicode provides at least
a reasonable lowest common denominator.
In addition, if you recall my original model:
  Machine #1          Network          Machine #2
    SJIS  ----------->  UTF  ----------->  EUC
Then by having an Accept: charset header, we allow this, and we also
allow the DCE model, where the intermediate encoding can be skipped if
the two systems can converse in a common encoding.
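To make that concrete, here is a minimal sketch (in Python, using
Python's own codec names; the function name and the charset lists are
purely illustrative) of a gateway that converts directly from the
sender's encoding into one the receiver has asked for, falling back to
the core UTF encoding only when there is no overlap:

    CORE_ENCODING = "utf-8"   # the proposed lowest common denominator

    def transcode_for_client(raw, source_charset, accepted_charsets):
        """Convert raw bytes for a client, honouring its Accept: charset list."""
        text = raw.decode(source_charset)        # e.g. "shift_jis" on Machine #1
        for charset in accepted_charsets:        # e.g. ["euc_jp"] from Machine #2
            try:
                return text.encode(charset), charset  # direct SJIS->EUC, no UTF hop
            except (LookupError, UnicodeEncodeError):
                continue                              # cannot satisfy this one, try next
        return text.encode(CORE_ENCODING), CORE_ENCODING

    # Machine #1 holds Shift-JIS text, Machine #2 advertises EUC-JP:
    body, charset = transcode_for_client("日本語".encode("shift_jis"),
                                         "shift_jis", ["euc_jp"])
    print(charset)   # euc_jp -- nothing in UTF ever goes onto the wire

The only point here is that the Accept: charset list is what lets the
two ends discover the direct SJIS->EUC path.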
Now we do have some problems with glyph mappings in Unicode (as I'm
sure most people are aware). Given the model I propose above, however,
we could use one of the extensibility area codes to signal an upcoming
language hint (Chinese, Japanese, Korean), followed by some identifier
(probably an ISO language code?) naming the language of the text that
follows. This has the benefit of not requiring human intervention (the
SJIS->UTF conversion engine could do this "tagging" automagically),
but it does not preclude it either. It could probably handle zenkaku
and hankaku as well (I think). Perhaps it will also suffice for the
other languages?
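To show what that automatic tagging might look like, here is a small
sketch; the use of a private-use code point (U+E001) as the "language
hint follows" marker and of two-letter ISO 639 codes are assumptions
made purely for illustration, not anything Unicode actually defines:

    # Assumption: U+E001 (private use area) means "language hint follows",
    # and the hint is a two-letter ISO 639 code.  Illustrative only.
    LANG_HINT = "\ue001"

    def tag_language(text, iso639):
        """Converter side: prefix a run of text with its language hint."""
        return LANG_HINT + iso639 + text

    def strip_hint(tagged):
        """Receiver side: peel off the hint and use it to pick CJK glyphs."""
        if tagged.startswith(LANG_HINT):
            return tagged[1:3], tagged[3:]
        return None, tagged

    # An SJIS->UTF converter knows its input is Japanese, so it tags it itself:
    lang, body = strip_hint(tag_language("漢字", "ja"))   # ("ja", "漢字")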
Anyway, let's vote "yes" to
Accept: charset=xxxxxxx
This will solve many problems with the WWW in Japan, and will ease
future interoperability. If no-one else volunteers to do the editorial
work, I will do it.
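For what it's worth, here is a sketch of how the exchange could look
on the wire; the header spelling simply follows this message
("Accept: charset=..."), and pinning down the exact syntax is part of
the editorial work:

    # Client request: list the charsets this browser can actually render.
    request = ("GET /index.html HTTP/1.0\r\n"
               "Accept: charset=EUC-JP, charset=UTF-8\r\n"
               "\r\n")

    # A server holding SJIS could then answer directly in EUC-JP (no UTF
    # hop on the wire), or fall back to the core UTF encoding otherwise:
    response = ("HTTP/1.0 200 OK\r\n"
                "Content-Type: text/html; charset=EUC-JP\r\n"
                "\r\n")
    print(request + response)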