Re: Charset parameter [Was: Tentative Agenda for IETF meeting ]

Gavin Nicol (gtn@ebt.com)
Fri, 2 Dec 94 20:15:22 EST

Next message: Larry Masinter: "Re: Charset parameter [Was: Tentative Agenda for IETF meeting ]"
Previous message: Gavin Nicol: "Re: Tentative Agenda for IETF meeting"
Maybe in reply to: Daniel W. Connolly: "Charset parameter [Was: Tentative Agenda for IETF meeting ]"
Next in thread: Larry Masinter: "Re: Charset parameter [Was: Tentative Agenda for IETF meeting ]"

>There are techniques where the switch between character encoding modes
>is encoded in the byte stream, not in any SGML markup.
>
>It looks something like:
>
> ESC-$-!(imagine this is arabic)ESC-$-)back to latin-1
>
>so the bytes->chars interactions are all within the entity manager.
>But it means that the "sequence of characters" data that passes from
>the entity manager to the SGML parser has to be able to represent the
>union of the arabic character set and the latin-1 character set.
>

Right. It simplifies things a lot to have the entity manager resolve
such issues, which is precisely why I proposed UTF-8 or UTF-7 as the
"core" encoding that browsers should understand. These encodings of
Unicode represent a reasonable overhead, and Unicode provides at least
a reasonable lowest common denominator.
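To make the entity-manager idea concrete, here is a minimal sketch in Python. ISO-2022-JP stands in for any byte stream that switches modes with in-band escape sequences; the function name entity_manager_decode is hypothetical, and this is only an illustration of the division of labour, not a proposed implementation.

```python
# Sketch only: the "entity manager" resolves every bytes->chars issue,
# so the parser sees one sequence of Unicode characters.
# ISO-2022-JP is an assumed example of an escape-switched byte stream.

def entity_manager_decode(raw: bytes, charset: str) -> str:
    """Hypothetical entity manager: decode the bytes before parsing."""
    return raw.decode(charset)

mixed = "abc 日本語 xyz"

# The encoded form carries the mode switches in the byte stream itself.
raw = mixed.encode("iso2022_jp")

# After decoding, no escape sequences remain; only characters do.
chars = entity_manager_decode(raw, "iso2022_jp")

# The same characters re-encoded in the proposed "core" encoding.
core = chars.encode("utf-8")
```

The parser never needs to know which escape sequences were in raw; it only ever receives chars.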

In addition, if you recall my original model:

   Machine #1         Network         Machine #2
      SJIS ----------> UTF ----------> EUC

Then by having an Accept: charset, we allow this, and we also allow the
DCE model, where we can skip the intermediate encoding if the two
systems can converse in a common encoding.
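A rough sketch of that model in Python, assuming the receiver's accepted charsets are already available as a plain list (how they are carried on the wire is exactly what is under discussion). The function names choose_wire_charset, send and receive are hypothetical.

```python
from typing import List, Tuple

CORE = "utf-8"  # assumed "core" encoding that every party understands


def choose_wire_charset(local: str, accepted: List[str]) -> str:
    """Use the sender's own encoding if the receiver accepts it, else CORE."""
    wanted = {name.lower() for name in accepted}
    return local if local.lower() in wanted else CORE


def send(data: bytes, local: str, accepted: List[str]) -> Tuple[str, bytes]:
    """Machine #1: transcode to the wire charset only when necessary."""
    wire = choose_wire_charset(local, accepted)
    if wire == local:
        return wire, data  # common encoding: skip the intermediate step
    return wire, data.decode(local).encode(wire)


def receive(payload: bytes, wire: str, local: str) -> bytes:
    """Machine #2: convert from the wire charset to its own encoding."""
    return payload.decode(wire).encode(local)


text = "日本語"
sjis = text.encode("shift_jis")

# SJIS -> UTF -> EUC: the receiver only announces the core encoding.
wire, payload = send(sjis, "shift_jis", ["utf-8"])
euc = receive(payload, wire, "euc_jp")

# Direct path: both sides can converse in SJIS, so nothing is converted.
direct_wire, direct_payload = send(sjis, "shift_jis", ["shift_jis", "utf-8"])
```

In the first case the data crosses the network as UTF-8; in the second it is sent untouched.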

Now we do have some problems with glyph mappings in Unicode (as I'm
sure most people are aware). Given the model I propose above,
however, we could use one of the extensibility area codes to signal an
upcoming language hint (Chinese, Japanese, Korean), which would be
followed by some encoding (probably ISO?) indicating the language for
the text following. This has the benefit of not requiring human
intervention (the SJIS->UTF conversion engine could do this "tagging"
automagically), but it also does not preclude it. This could probably
be used to handle zenkaku and hankaku as well (I think). Perhaps this
will suffice for other languages as well?
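As a sketch of what such tagging might look like, in Python: everything concrete here is an assumption made for illustration. U+E000 (a Private Use Area code point) stands in for the "extensibility area" code, the hint is a fixed two-letter language code, and the names tag and split_runs are hypothetical.

```python
from typing import List, Optional, Tuple

HINT = "\uE000"  # assumed marker: a Private Use Area code point


def tag(text: str, lang: str) -> str:
    """Prefix text with the marker and a two-letter language code."""
    if len(lang) != 2:
        raise ValueError("this sketch assumes two-letter language codes")
    return HINT + lang + text


def split_runs(stream: str) -> List[Tuple[Optional[str], str]]:
    """Split a tagged stream into (language, text) runs.

    Text before the first marker has no hint, so its language is None.
    """
    pieces = stream.split(HINT)
    runs: List[Tuple[Optional[str], str]] = []
    if pieces[0]:
        runs.append((None, pieces[0]))
    for piece in pieces[1:]:
        runs.append((piece[:2], piece[2:]))
    return runs


# A conversion engine could insert the tags without human intervention,
# for example because it knows its input was SJIS and therefore Japanese.
stream = "Hello " + tag("漢字", "ja") + tag("汉字", "zh")
runs = split_runs(stream)
```

A renderer could then pick glyphs per run, and an author could still add or override tags by hand.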

Anyway, let's vote "yes" to

Accept: charset=xxxxxxx

This will solve many problems with the WWW in Japan, and will ease
future interoperability. If no-one else volunteers to do the editorial
work, I will do it.
