Re: Putting the "World" back in WWW...

Chris Lilley, Computer Graphics Unit (lilley@v5.cgu.mcc.ac.uk)
Tue, 4 Oct 1994 12:23:56 +0100

Frank Rojas wrote in message <9410032259.AA23180@nlsarch.austin.ibm.com>

> > From: hallam@dxal18.cern.ch (HALLAM-BAKER Phillip)

> > It is simply another content encoding to deal with.
> > A charset module can easily be written to convert fairly arbitrary
> > encodings into UNICODE tokens. This can also do UTS, ASCII, ISO-8893,
> > JIS, and whacky Russian etc. encodings.

> I'm not sure I follow... excuse me if I missed the point ... but it sounds
> like you are suggesting we put "ANY ENCODING" in the document and have each
> viewer convert into UNICODE...

The alternative seems to be to force everyone to write all their documents in
Unicode, which would, at a stroke, give a large increase in server disk space
and transfer time.

As I said yesterday, people in other countries already have methods for encoding
the characters of their national languages, and these methods should be
supported.

Put yourself in others' shoes: how would you feel if the Web technology was all
Japanese, say, and the instructions said something like:

"To type a letter 'e', use shift control right bracket kanji-something.
On keyboards without a kanji-something, refer to your manufacturer's
instructions. Pressing the letter 'e' on your keyboard will not work."

> If so, this will cause MAJOR interoperability problems across the network.

Why? Why would this cause more severe problems than forcing everyone to use
Unicode when authoring documents?

> Expecting every client to be convert to from every possible encoding will
> never work

[Is that "to be able to convert" ?]

Sure it will. We *are* using a common libwww aren't we?
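To make the "common library" point concrete, here is a minimal Python sketch of a shared charset module. It is not libwww's actual interface; the helper name `to_unicode` is hypothetical, and it simply leans on Python's standard `codecs` registry to turn a document body, labelled with its charset, into Unicode text. The point is that each client gets every registered converter for free, rather than each author or administrator writing their own.

```python
import codecs


def to_unicode(data: bytes, charset: str) -> str:
    """Decode a document body labelled with `charset` into Unicode text.

    Hypothetical helper: the charset label is assumed to come from the
    document's metadata (for example a MIME charset parameter).
    """
    try:
        codec = codecs.lookup(charset)   # shared registry of converters
    except LookupError:
        raise ValueError(f"no converter registered for charset {charset!r}")
    text, _consumed = codec.decode(data)
    return text


# Usage: three documents, each stored in its own national encoding.
samples = [
    ("iso-8859-1", "café"),
    ("koi8-r", "Мир"),
    ("iso-2022-jp", "日本"),
]
for charset, original in samples:
    body = original.encode(charset)      # bytes as the server stores them
    assert to_unicode(body, charset) == original
```

Adding support for a new national encoding then means registering one more codec in the shared module, not changing every document.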

> But this causes a nightmare for system administrators that need to provide
> conversions from any other encoding to UNICODE... and puts the burden of
> conversion on the clients each time the document is accessed rather than on
> the supplier one time.

I appreciate what you are saying, but the picture is not entirely as you present
it for two reasons.

Firstly, not all clients will need to convert. Realistically, many of the
documents using a particular encoding will be read by people also using that
encoding. So, converting to Unicode on the server would impose the burden of two
conversions: into Unicode, and then back out again into the same native encoding
that the people in a particular country are already using.
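The conversion count can be illustrated with a small Python sketch. The function names are hypothetical, Shift_JIS stands in for "the native encoding", and UTF-16 stands in for "Unicode"; both are assumptions for illustration. A client that converts only when the document's charset differs from its own does no work for a native reader, whereas server-side conversion to Unicode forces two passes on that same reader.

```python
import codecs
from typing import Tuple


def same_charset(a: str, b: str) -> bool:
    # codecs.lookup() maps aliases such as "sjis" and "shift_jis" to one name.
    return codecs.lookup(a).name == codecs.lookup(b).name


def convert(body: bytes, src: str, dst: str) -> Tuple[bytes, int]:
    """Re-encode `body` from charset `src` to `dst`, skipping no-op conversions.

    Returns the new bytes and how many conversion passes (0 or 1) were made.
    """
    if same_charset(src, dst):
        return body, 0
    return body.decode(src).encode(dst), 1


# Scenario: a Japanese author writes in Shift_JIS for Japanese readers.
native = "ウェブ".encode("shift_jis")

# Policy A: serve the document as written; the client converts only if needed.
shown_a, passes_a = convert(native, "shift_jis", "sjis")

# Policy B: the server first converts everything to Unicode (UTF-16 here),
# and the same reader's client must then convert it back again.
stored_b, server_passes = convert(native, "shift_jis", "utf-16")
shown_b, client_passes = convert(stored_b, "utf-16", "shift_jis")

assert shown_a == shown_b == native
print(passes_a, server_passes + client_passes)   # 0 2
```

Readers whose charset differs from the author's still pay exactly one conversion under policy A, which is no worse than policy B.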

Secondly, the phrase "system administrators" rings warning bells here. Your
mental model seems to be of a technical support team running a server, doing
code conversion on all their documents to a common format, etc. This is the
traditional heavyweight publishing model.

Fine, some servers are like that, but not all. Remember that the first Web
browser for the NeXT was also an editor, and remember Tim BL's address at WWW94
describing how important it was that the Web encouraged collaborative,
lightweight publishing. The readers are also the writers. So putting a burden on
the 'supplier' is the same thing as putting a burden on the consumer.

It strikes me that an alternative reading of what you wrote would mean the
'system administrators' are the people who install the clients. If that is what
you mean, then clients using a common code library would use that for the
conversions, so sysadmins would not, individually, have to solve the problem.

--
Chris