Re: iso 8859 or escape sequencies?

lilley@v5.cgu.mcc.ac.uk (Chris Lilley, Computer Graphics Unit)

Errors-To: listmaster@www0.cern.ch
Date: Tue, 12 Apr 1994 13:58:10 --100
Message-id: <94041212523472@cguv5.cgu.mcc.ac.uk>
Reply-To: lilley@v5.cgu.mcc.ac.uk
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: lilley@v5.cgu.mcc.ac.uk (Chris Lilley, Computer Graphics Unit)
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: iso 8859 or escape sequencies?
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 4080

 Bert Bos <bert@let.rug.nl> writes:

> [original attribution missing]
> |Is there a reason to use the html "escape-sequencies" (&oumlaut for | 

That should be &oumlaut; by the way. Note also that the character referred to 
has become corrupted to a vertical bar (in my quoted attribution) by the mail 
software, a problem that HTTP does not have.

> |etc.) for characters that are also in the iso 8859-1 character-set? Are 
> |there browsers that do not support the full iso8859 character set but 
> |do support the escape-sequencies?
> | -Timo H

> There are several reasons:

>- On many computers Latin-1 is not the default character set, so codes
>  above 127 would be mapped incorrectly.

I disagree. It may not be the default character set, but ISOLatin1 is defined as 
the character set that HTML uses. It is transferred in 8 bit mode, so it arrives 
intact. Browsers which use some other character set in the hope that common 
letters will occupy the same code positions are therefore, as I see it, broken.

If a particular platform does not use ISOLatin1 and does not have a font that 
uses the ISOLatin1 encoding, it is up to the browser to do something sensible 
about it. Naively using a different encoding is not something sensible. Using 
code mapping tables, overstrike, and so on is.
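
As an illustration of the code-mapping approach, here is a minimal Python sketch. It assumes the platform's native encoding is one Python ships a codec for (cp437 and the EBCDIC code page cp037 are used purely as examples), and the function name to_platform is hypothetical. Characters the platform lacks fall back to an unaccented approximation; overstrike is not attempted here.

```python
import unicodedata


def to_platform(latin1_bytes, platform_codec="cp437"):
    """Translate ISO Latin-1 bytes into a platform's native encoding.

    platform_codec is an assumption for illustration; any codec name
    known to Python can be passed instead.
    """
    # Every byte value decodes under Python's latin-1 codec.
    text = latin1_bytes.decode("latin-1")
    out = bytearray()
    for ch in text:
        try:
            # The mapping table lives inside the codec itself.
            out += ch.encode(platform_codec)
        except UnicodeEncodeError:
            # No equivalent on this platform: strip the accent and use
            # the base letter, or "?" if nothing usable remains.
            base = (
                unicodedata.normalize("NFKD", ch)
                .encode("ascii", "ignore")
                .decode("ascii")
            )
            out += (base or "?").encode(platform_codec)
    return bytes(out)


# o-umlaut is 0xF6 in Latin-1 but sits at a different position in cp437.
native = to_platform(b"\xf6")
```

Copying the byte 0xF6 through unchanged would display a different glyph on a cp437 system, which is the naive behaviour objected to above.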

Bert's statement contains a hidden assumption:

>- On many computers Latin-1 is not the default character set, 
[assumption: browsers should/will not do any code translation]
> so codes
>  above 127 would be mapped incorrectly.

That assumption is not correct, IMHO.

If browsers on EBCDIC platforms were to display a, b, c, etc. incorrectly, that 
would definitely be considered broken. I submit that just because my native 
language happens not to need an a acute or a u umlaut, that is no reason to 
consider these characters any less important, or rendering them correctly to be 
an optional little detail.

>- Using the SGML entities ensures that the file can be e-mailed (see
>  what became of your &oumlaut; above...)

Indeed. So browsers should certainly map these characters to 7 bit clean 
versions, or use quoted printable, or base 64 encoding, or whatever when mailing 
html files from the browser. That is a separate issue, concerned with reusing 
the html file for something other than its original purpose.
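
The three mailing options mentioned above can be sketched in Python as follows. This is an illustration only: the sample text and the helper name to_entities are hypothetical, and the entity names come from Python's html.entities table rather than from any particular browser.

```python
import base64
import quopri
from html.entities import codepoint2name


def to_entities(text):
    """Replace every non-ASCII character with an entity reference."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            parts.append(ch)
        elif cp in codepoint2name:
            parts.append(f"&{codepoint2name[cp]};")  # named reference
        else:
            parts.append(f"&#{cp};")  # numeric reference as a fallback
    return "".join(parts)


# An 8-bit HTML fragment as it would travel over HTTP.
html_8bit = "<p>Z\u00fcrich</p>".encode("latin-1")

# Option 1: rewrite the document itself with entity references.
as_entities = to_entities(html_8bit.decode("latin-1"))

# Options 2 and 3: leave the document alone and encode it for transport.
qp = quopri.encodestring(html_8bit)
b64 = base64.b64encode(html_8bit)
```

Both transport encodings are reversible, so the recipient gets the original 8-bit file back; only the entity rewrite changes the document text.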

I see no reason to insist that people type these things in. For some people, it 
is part of their language. They do not normally have to type these things in in 
a special way; there are characters on their keyboards with the symbols on, they 
press them, and get the correct letter.

Consider, for example, if a s&letterT;an&letterD;ar&letterD;s bo&letterD;y 
ou&letterT;side your coun&letterT;ry (which happene&letterD; no&letterT; 
&letterT;o use &letterT;he le&letterT;&letterT;ers "t" or "d") 
insis&letterT;e&letterD; tha&letterT; you en&letterT;er all &letterT;ex&letterT; 
in &letterT;his way.

How inconvenient this would be!!

>- Browsers that cannot display the characters, can -- in principle --
>  approximate them.

Agree absolutely. Whether the character is transferred as an 8 bit 
representation - perfectly valid when the transport is guaranteed to be 8 bit 
clean - or as an entity reference (is that the correct term?) is, however, 
orthogonal to how the browser chooses to represent or approximate them.
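
A short Python sketch of that separation, under the assumption that a browser first decodes the document to characters and only then decides how to draw them. The function names decode_document and render, and the displayable parameter, are hypothetical.

```python
import html
import unicodedata


def decode_document(raw):
    """Decode Latin-1 bytes, then expand any entity references."""
    return html.unescape(raw.decode("latin-1"))


def render(text, displayable=frozenset()):
    """Show a character if the display supports it, else approximate it.

    displayable stands in for whatever non-ASCII glyphs the platform's
    font provides; ASCII is assumed to be always available.
    """
    out = []
    for ch in text:
        if ord(ch) < 128 or ch in displayable:
            out.append(ch)
        else:
            # Approximate by dropping the accent; "?" if nothing is left.
            base = (
                unicodedata.normalize("NFKD", ch)
                .encode("ascii", "ignore")
                .decode("ascii")
            )
            out.append(base or "?")
    return "".join(out)


# The same word sent as a raw 8-bit byte and as an entity reference.
as_byte = decode_document(b"K\xf6ln")
as_entity = decode_document(b"K&ouml;ln")
```

After decoding, the two inputs are indistinguishable, so render never needs to know which form was used on the wire.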

--
Chris Lilley
+-----------------------------------------------------------------------------+
| Technical Author, ITTI Computer Graphics and Visualisation Training Project |
+-----------------------------------------------------------------------------+
| Computer Graphics Unit,        |  Internet: C.C.Lilley@mcc.ac.uk            |
| Manchester Computing Centre,   |     Janet: C.C.Lilley@uk.ac.mcc            |
| Oxford Road,                   |     Voice: +44 61 275 6045                 |
| Manchester, UK.  M13 9PL       |       Fax: +44 61 275 6040                 |
| <A HREF="http://info.mcc.ac.uk/CGU/staff/lilley/lilley.html">click here</A> | 
+-----------------------------------------------------------------------------+