Character representation negotiation

Gavin Nicol (gtn@ebt.com)
Mon, 5 Dec 94 21:29:47 EST

As Roy pointed out, if one wants to, one can negotiate for different
character encodings for HTML with something like the following from a
client:

Accept: text/html; charset=unicode_1_1_utf_7
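
As a rough illustration of what a server would do with such a header,
here is a minimal Python sketch that pulls the charset parameter out of
an Accept value. The function names parse_accept and requested_charset
are made up for this example, and the parsing is deliberately
simplified (it ignores q-values and quoted commas), so treat it as a
sketch rather than a conforming HTTP parser.

```python
def parse_accept(header):
    """Split an Accept header value into (media_type, params) pairs.

    Simplified on purpose: no q-value ordering, no quoted commas.
    """
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        params = {}
        for p in parts[1:]:
            if "=" in p:
                key, value = p.split("=", 1)
                params[key.strip().lower()] = value.strip().strip('"')
        if media_type:
            entries.append((media_type, params))
    return entries


def requested_charset(header, media_type="text/html"):
    """Return the charset requested for media_type, or None if absent."""
    for mt, params in parse_accept(header):
        if mt == media_type and "charset" in params:
            return params["charset"].lower()
    return None


if __name__ == "__main__":
    print(requested_charset("text/html; charset=unicode_1_1_utf_7"))
```

A server could use the returned name to pick which encoding of the
document to send, falling back to its default when None comes back.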

However, very soon, we will be getting SGML-aware browsers (and also
browsers for other document formats). Now we could have a charset=
parameter on each of these different MIME types, but I think we need
to get a single HTTP field allocated for this. In addition, the
following are probably also needed.

1) Either UTF-7 or UTF-8, or both, should be strongly recommended by
both the HTML and HTTP specs as the way to transmit multilingual
documents.
2) A definition of "escape codes" to be used to indicate language and
other such parameters as an aid to display. As I have said elsewhere,
such tagging would probably happen automatically, and so would not be
visible to the end users.

I think we should look upon these as "enabling technology". They will
not be immediately used (or at least not widely), but eventually, as
Unicode systems (browsers in particular) become available, they will
be increasingly important.

On top of this foundation, we can then build two libraries of great
utility:

1) A library for converting between various character encodings and
the tagged UTF.
2) A library for handling font display using Unicode. This is not
exceptionally difficult.
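
To give a feel for the first library, here is a minimal Python sketch
of conversion that goes through Unicode as the pivot: decode from the
source encoding, then re-encode into a UTF. It relies on Python's
standard codecs; the function name transcode is made up for this
example, and it covers only the plain conversion step, not the
language tagging, since the tag format is not defined here.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Convert bytes from one character encoding to another.

    Unicode is the pivot: decode from `source`, then encode to `target`.
    Raises UnicodeError if the data is invalid for `source` or cannot
    be represented in `target`.
    """
    text = data.decode(source)
    return text.encode(target)


if __name__ == "__main__":
    # Japanese text as it might arrive from an EUC-JP server.
    original = "日本語".encode("euc_jp")

    as_utf8 = transcode(original, "euc_jp", "utf-8")
    as_utf7 = transcode(original, "euc_jp", "utf-7")

    # Both UTFs decode back to the same Unicode text.
    print(as_utf8.decode("utf-8") == as_utf7.decode("utf-7"))
```

Using Unicode as the intermediate form means each supported encoding
needs only a decoder and an encoder, rather than a converter for every
pair of encodings.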

With these, multilingual browsers become, while not trivial, at least
not much more difficult than Roman-only ones.