Re: New draft: charset, conformance cleanup

Francois Yergeau (
Tue, 4 Apr 95 11:39:10 EDT

>Date: Tue, 4 Apr 95 10:33:38 EDT
>From: Gavin Nicol <>
>I think this represents current practise though: ISO-8859-1 is the
>document character set, and use of all others is undefined. In the
>latter case &#42; may, or may not, represent an asterisk...
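A numeric character reference such as &#42; names a character *number* in the document character set, not a character directly. Below is a minimal sketch of that lookup. It assumes, purely for illustration, that the document character set is a single-byte set that can be named by a Python codec. The helper `resolve_numeric_ref` is hypothetical. Under ISO-8859-1, 42 is the asterisk. Under an EBCDIC set such as code page 037 (used here only as an example of a non-Latin-1 set), 42 is a different character, and the asterisk is number 92 (0x5C).

```python
def resolve_numeric_ref(number: int, document_charset: str) -> str:
    """Return the character that &#number; names under a hypothetical
    single-byte document character set, given as a Python codec name."""
    if not 0 <= number <= 255:
        # This sketch only models single-byte character sets.
        raise ValueError("number outside a single-byte character set")
    # Treat the reference number as a code value in the document character set.
    return bytes([number]).decode(document_charset)


print(resolve_numeric_ref(42, "iso-8859-1"))   # the asterisk
print(repr(resolve_numeric_ref(42, "cp037")))  # some other (control) character
```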

Once again, current practice is much more diverse than that. Pick the
Web maps for Japan, Korea or Russia and browse around.

>>Is it a good thing to have different defaults depending on the mode
>This is not a major problem. As Dan pointed out, MIME specifies an
>encoding, rather than a character set.

It specifies an encoding of a character set. Specifying ASCII
encoding also specifies the ASCII character set, and nothing else.
What do I do if I find characters with the MSB set after having
assumed ASCII? Flag an error? It is simpler *and* better to assume
Latin-1 by default; if all characters have the MSB clear, no problem;
if some have the MSB set, you know what to do with them. And yes, 8-bit
data in mail happens all the time, despite what the RFCs say.
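The argument above can be sketched in a few lines of Python. This is an illustration only, not code from any browser. The two hypothetical helpers contrast assuming US-ASCII, where any byte with the MSB set is an error, with defaulting to ISO-8859-1, where every byte value names a character. Bytes 0x00-0x7F decode to the same characters either way, because Latin-1 is a superset of ASCII.

```python
def decode_strict_ascii(body: bytes) -> str:
    # Assuming US-ASCII: a byte with the MSB set raises UnicodeDecodeError.
    return body.decode("ascii")


def decode_latin1_default(body: bytes) -> str:
    # Assuming ISO-8859-1: every byte 0x00-0xFF maps to a character,
    # and bytes 0x00-0x7F are the same characters as in US-ASCII.
    return body.decode("iso-8859-1")


sample = b"caf\xe9"
try:
    decode_strict_ascii(sample)
except UnicodeDecodeError:
    print("ASCII assumption: cannot interpret byte 0xE9")
print(decode_latin1_default(sample))  # café
```

With the Latin-1 default, pure 7-bit text loses nothing, and 8-bit text still has a defined interpretation instead of an error.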

>Besides, ISO-8859-1 is a
>superset of US-ASCII.

Which is exactly why ASCII does not need to be specified.

>>> SGML declaration in section 13@@ applies. Other charset parameter
>> ^^^^^^^^^^^^^^^^^^^^^^^
>>> values are reserved for future use.
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>This sentence needs to go, and be replaced by the former language that
>>said that for other charsets, the SGML was to be minimally modified.
>I agree that the sentence should be deleted, but I do not agree with
>your last sentence. It should instead say that processing in the face
>of other values is currently unspecified.

Once again, it is better to offer some guidance, however vague and
incomplete, than none at all.

>>..for documents encoded in ISO-8859-1. Documents encoded in other
>>character sets should use an SGML declaration as close as possible to
>>this one, in order to preserve SGML conformance.
>Again, I don't think we can say this for 2.0, because current systems
>simply ignore this whole can of worms.

Mosaic-L10N does not ignore it. Mule does not ignore it.
Mosaic-Cyrillic does not ignore it. The new Arabic-Farsi version of
Mosaic does not ignore it. Since those browsers *have* to deal with
it, and in the absence of a general solution, I think it is wise to
recommend that SGML conformance be preserved. It amounts to saying
that modifying the SGML decl ad hoc, instead of working around an
insufficient SGML decl and breaking SGML conformance, is the right way
to do things.

>I should note that my paper will be sent out for review today, and
>hopefully, I should be sending it to the list tomorrow or the day

I'm looking forward to reading it.

François Yergeau <>