Re: HTML/SGML/charsets

lilley (lilley@afs.mcc.ac.uk)
Mon, 3 Apr 95 13:01:46 EDT

Joe English writes:

> The _problem_ is in section 6.3.2, "Character Octet References":
>
> The character octet references are not dependent on the character
> set encoding of the document. For example, "&#215;" always
> represents the ISO-8859-1 multiply sign, even when the document's
> declared character set is other than ISO-8859-1.
>
> This directly contradicts section 3.2 and/or ISO 8879.

I agree with this position - numeric references should refer to the document
character set. But it can be sorted fairly easily. All we need is someone
who can speak Polish.

The major users of non-Latin-1 character sets for HTML in mid-94 were, I
believe, the Polish community, who use 8859-2 in HTML documents (without any
charset parameter on the MIME type). So we just need to search some Polish
sites and check that they do not use any numeric character references to
get 8859-1 characters. I suspect that they will not.

If they do not, then there is no problem and the spec can state that numeric
references refer to the document character set.
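
The check described above could be roughed out as follows. This is only a
sketch: the name high_numeric_refs and the sample string are made up for
illustration, and the 160-255 range is my assumption about where the two
character sets can disagree. It looks at decimal references only.

```python
import re

# Matches decimal numeric character references such as &#215;
NUMERIC_REF = re.compile(r"&#([0-9]+);")


def high_numeric_refs(html_text):
    """Return the codes of numeric references in the 160-255 range.

    Hypothetical helper: codes below 128 are plain ASCII in both
    ISO-8859-1 and ISO-8859-2, so only the upper range is of interest.
    """
    codes = [int(m.group(1)) for m in NUMERIC_REF.finditer(html_text)]
    return [c for c in codes if 160 <= c <= 255]


# Illustrative input only, not taken from any real Polish site.
sample = "<p>2 &#215; 3 &#60; 7, and &#163;5</p>"
print(high_numeric_refs(sample))
```

A page that yields an empty list uses no such references, which is the
outcome suspected above.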

> Numeric character references *are* dependent on the character
> set encoding of the document. For example, "&#215;" represents
> the multiply sign *if* the document's declared character
> set is ISO-8859-1.

Seems like a good idea, if it doesn't break existing practice that large
sections of the community are using.
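
To make the proposed rule concrete, here is a small sketch that resolves a
numeric reference against the declared character set. The function name
resolve is hypothetical, and it assumes a single-octet character set that
Python ships a codec for. Code 215 happens to be the multiply sign in both
8859-1 and 8859-2, so code 163 shows the difference better.

```python
def resolve(code, document_charset):
    """Interpret a numeric reference as one octet in the given charset.

    Hypothetical helper; only meaningful for single-octet character sets.
    """
    return bytes([code]).decode(document_charset)


# The same reference, "&#163;", under two declared character sets.
in_latin1 = resolve(163, "iso-8859-1")  # pound sign
in_latin2 = resolve(163, "iso-8859-2")  # L with stroke

print(in_latin1, in_latin2)
print(resolve(215, "iso-8859-1") == resolve(215, "iso-8859-2"))
```

Under the current wording of 6.3.2 both documents would get the 8859-1
result; under the proposed wording the 8859-2 document gets the Polish
letter.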

> (Are MIME applications allowed to translate documents
> from one character set to another in transit? That's
> the only way I can think of that this change would
> break MIME.)

Yes, they are. ASCII-EBCDIC gateways spring to mind.
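
A sketch of what such a gateway does to a numeric reference, using Python's
cp037 codec as a stand-in for EBCDIC (an assumption; real gateways use
various EBCDIC code pages). The reference survives as text, but the octet
value it names means something else in the new character set.

```python
original = "2 &#215; 3"

# Document as sent, in ISO-8859-1.
latin1_bytes = original.encode("iso-8859-1")

# Hypothetical gateway: transcode the octets from ISO-8859-1 to EBCDIC.
ebcdic_bytes = latin1_bytes.decode("iso-8859-1").encode("cp037")

# The characters of the reference are unchanged after transcoding...
received_text = ebcdic_bytes.decode("cp037")

# ...but octet 215 in the receiving character set is a different character.
octet_215_in_ebcdic = bytes([215]).decode("cp037")

print(received_text)
print(octet_215_in_ebcdic)
```

So if numeric references follow the document character set, a transcoding
gateway would have to rewrite the references as well as the octets.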

--
Chris Lilley
+----------------------------------------------------------------------+
|Technical Author, Manchester and North HPC Training & Education Centre|
+----------------------------------------------------------------------+
| Computer Graphics Unit,             |  Email: Chris.Lilley@mcc.ac.uk |
| Manchester Computing Centre,        |  Voice: +44 61 275 6045        |
| Oxford Road, Manchester, UK.M13 9PL |    Fax: +44 61 275 6040        |
+-------------------------------------+ BioMOO: ChrisL                 |
|       URI: http://info.mcc.ac.uk/CGU/staff/lilley/lilley.html        | 
+----------------------------------------------------------------------+
|       "The first W in WWW will not wait."   François Yergeau         |
+----------------------------------------------------------------------+