Re: iso 8859 or escape sequencies?

Axel Belinfante <Axel.Belinfante@cs.utwente.nl>

Mail folder: WWW Talk Apr 94-present
Next message: David Bianco: "Re: Interest in HTML Conformance?"
Previous message: Chris Lilley, Computer Graphics Unit: "Re: iso 8859 or escape sequencies?"
Maybe in reply to: Daniel W. Connolly: "Re: iso 8859 or escape sequencies? "
Reply: Daniel W. Connolly: "Re: iso 8859 or escape sequencies? "

Errors-To: listmaster@www0.cern.ch
Date: Tue, 12 Apr 1994 16:53:22 --100
Message-id: <9404121450.AA17505@utis179.cs.utwente.nl>
Reply-To: Axel.Belinfante@cs.utwente.nl
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Axel Belinfante <Axel.Belinfante@cs.utwente.nl>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: iso 8859 or escape sequencies? 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 2904

Chris Lilley <lilley@v5.cgu.mcc.ac.uk> writes:
> I disagree. It may not be the default character set, but ISOLatin1 is
> defined as the character set that HTML uses. It is transferred in 8 bit
> mode, so it arrives intact. Browsers which use some other character set
> in the hope that common letters will occupy the same code positions are
> therefore, as I see it, broken.
> 
> If a particular platform does not use ISOLatin1 and does not have a font
> that uses the ISOLatin1 encoding, it is up to the browser to do something
> sensible about it. Naively using a different encoding is not something
> sensible. Using code mapping tables, overstrike, and so on is.

This (the use of mapping tables) would be very nice, especially if the
representation 'string' for a character could consist of more than a single
character and could be adjusted by the user of the browser, so that e.g. the
already famous &oumlaut; (shouldn't that be &ouml;, by the way?) could be
mapped onto `o"' or `"o' or (for TeX people :-) `\"o'.
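To make the idea concrete, here is a minimal sketch of such a user-adjustable
mapping table. It is only an illustration, not how any existing browser does
it: the names DEFAULT_TABLE and render_entities and the default replacement
strings are all assumptions made up for this example.

```python
import re

# Hypothetical defaults: entity name -> replacement string.
# A replacement may be longer than one character.
DEFAULT_TABLE = {
    "ouml": 'o"',
    "auml": 'a"',
    "uuml": 'u"',
}

# Matches references such as &ouml; and captures the entity name.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def render_entities(text, user_table=None):
    """Replace known entity references using the defaults plus user overrides."""
    table = dict(DEFAULT_TABLE)
    if user_table:
        table.update(user_table)  # the user's choices win over the defaults

    def substitute(match):
        # Unknown entities are left exactly as they were written.
        return table.get(match.group(1), match.group(0))

    return ENTITY_RE.sub(substitute, text)
```

With the defaults, render_entities("sch&ouml;n") gives `scho"n'; a TeX user
could pass {"ouml": '\\"o'} as user_table to get `sch\"on' instead.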

This would be _very_ nice for characters in the other iso-8859-* sets,
which are currently hopelessly left in the dark. (As far as I know, that
is; writing this, I do recall reading a note from the writer of the Emacs
WWW browser in which he wrote that it should understand, now or r.s.n.,
all(?) iso-8859-*(?) entities.) The only 'real' solution I know of is the
patch that makes Mosaic automagically display non-iso-8859-1 pages using
the appropriate fonts. Its disadvantage is that it is not in the main
distribution, so 'the general Mosaic user' likely won't have it, and the
'trick' implemented by the patch therefore cannot be used for pages that
`Joe random-WWW-browser user' should be able to read.

The charset-related discussion I have seen so far mostly focuses on
iso-8859-1; are there standard solutions for the use of non-iso-8859-1
charsets? For some languages (like Esperanto) the use of entities should,
or at least could, be sufficient, but for others I suppose it will be harder.

Wrt the browsers that are or will be able to handle non-iso-8859-1 characters:
do they indicate this in (e.g.) some sort of Accept: header?
Or can I just assume that they can at least handle the iso-8859-* entities?
As my non-iso-8859-1 interest is mostly limited to the Esperanto characters
in iso-8859-3, such a header would allow me to adapt my server to return
either an iso-8859-3 document, or an iso-8859-1 document with entities for the
Esperanto characters, or an iso-8859-1 document with ASCII representations
for the Esperanto characters. The last of these would be (and currently is)
the default.
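
The three-way choice described above could be sketched roughly as follows.
Everything here is an assumption for the sake of illustration: how a browser
would announce its charsets or its entity support is exactly the open
question, so accepted_charsets and understands_entities are hypothetical
inputs, the entity names (&ccirc; etc.) are borrowed from the ISO 8879 sets
and may not be known to any browser, and the `cx'-style ASCII forms are just
one possible convention.

```python
# Esperanto letter -> (assumed entity name, assumed ASCII representation).
ESPERANTO = {
    "ĉ": ("ccirc", "cx"), "Ĉ": ("Ccirc", "Cx"),
    "ĝ": ("gcirc", "gx"), "Ĝ": ("Gcirc", "Gx"),
    "ĥ": ("hcirc", "hx"), "Ĥ": ("Hcirc", "Hx"),
    "ĵ": ("jcirc", "jx"), "Ĵ": ("Jcirc", "Jx"),
    "ŝ": ("scirc", "sx"), "Ŝ": ("Scirc", "Sx"),
    "ŭ": ("ubreve", "ux"), "Ŭ": ("Ubreve", "Ux"),
}


def choose_variant(text, accepted_charsets=(), understands_entities=False):
    """Return (charset, body bytes) for the best variant the client can take."""
    accepted = {name.strip().lower() for name in accepted_charsets}

    # Best case: the client says it can display iso-8859-3 directly.
    if "iso-8859-3" in accepted:
        return "iso-8859-3", text.encode("iso-8859-3")

    if understands_entities:
        # iso-8859-1 document with entities for the Esperanto characters.
        body = "".join(
            "&" + ESPERANTO[ch][0] + ";" if ch in ESPERANTO else ch
            for ch in text
        )
    else:
        # Default: iso-8859-1 document with ASCII representations.
        body = "".join(ESPERANTO[ch][1] if ch in ESPERANTO else ch for ch in text)

    return "iso-8859-1", body.encode("iso-8859-1")
```

With no information about the client, choose_variant falls through to the
ASCII form, which matches the current default.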

Axel.

<Axel.Belinfante@cs.utwente.nl>   tel. +31 53 893774   fax. +31 53 333815
     University of Twente, Tele-Informatics & Open Systems Group
       P.O. Box 217    NL-7500 AE Enschede      The Netherlands
     "ili ne sciis ke estas neebla do ili simple faris" -- Loesje