Re: New draft: charset, conformance cleanup

Gavin Nicol (gtn@ebt.com)
Tue, 4 Apr 95 12:19:17 EDT

>>I think this represents current practice though: ISO-8859-1 is the
>>document character set, and use of all others is undefined. In the
>>latter case &#42; may, or may not, represent an asterisk...
>
>Once again, current practice is much more diverse than that. Pick the
>Web maps for Japan, Korea or Russia and browse around.

Well, I live in Japan, so I think I have an idea of what goes on
here. It seems to me that for the most part, people are hacking away,
getting things to run, *with no knowledge of SGML at all*. I know for
a fact that the fellow who did the localised version of lynx wouldn't
know what an SGML declaration *is*. How does Mosaic-L10N resolve
numeric character references for example? Try

&#63;&#64;&#65;&#66;

in data containing EUC (JIS X NNNN document character set) and in plain
ASCII data. You get the same thing in both cases.
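To make the point concrete, here is a minimal sketch of what resolving a numeric character reference against the document character set would mean. It is not how any of the browsers mentioned here work. The function name resolve_ncrs and its document_charset parameter are made up for illustration, and the sketch assumes a single-byte character set in which the reference number is simply the character's code in that set (a multi-byte set such as EUC would need a real mapping table).

```python
import re

# Matches decimal numeric character references such as &#65;
NCR = re.compile(r"&#(\d+);")


def resolve_ncrs(text, document_charset="latin-1"):
    """Replace each &#N; with character number N of the document character set.

    Assumption: document_charset names a single-byte Python codec, so the
    reference number can be treated as one byte in that codec.
    """
    def repl(match):
        number = int(match.group(1))
        if number > 255:
            # Outside a single-byte set: leave the reference untouched.
            return match.group(0)
        return bytes([number]).decode(document_charset, errors="replace")

    return NCR.sub(repl, text)


# References to code points 63..66 are in the ASCII range, so they come
# out the same whichever of these character sets is declared.
print(resolve_ncrs("&#63;&#64;&#65;&#66;", "latin-1"))
print(resolve_ncrs("&#63;&#64;&#65;&#66;", "ascii"))

# Above 127 the declared character set decides the result.
print(resolve_ncrs("&#233;", "latin-1"))
print(resolve_ncrs("&#233;", "koi8_r"))
```

A browser that always resolves references as if the document character set were ISO-8859-1 gives the first answer everywhere, and nothing currently written down says whether that is right or wrong.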

>>>..for documents encoded in ISO-8859-1. Documents encoded in other
>>>character sets should use an SGML declaration as close as possible to
>>>this one, in order to preserve SGML conformance.
>>
>>Again, I don't think we can say this for 2.0, because current systems
>>simply ignore this whole can of worms.
>
>Mosaic-L10N does not ignore it. Mule does not ignore it.
>Mosaic-Cyrillic does not ignore it. The new Arabic-Farsi version of
>Mosaic does not ignore it. Since those browsers *have* to deal with
>it, and in the absence of a general solution, I think it is wise to
>recommend that SGML conformance be preserved. It amounts to saying
>that modifying the SGML decl ad hoc, instead of working around an
>insufficient SGML decl and breaking SGML conformance, is the right way
>to do things.

As I noted above, they do not exhibit correct SGML behaviour, unless
one also says that in the process of altering the SGML declaration,
they are also converting numeric character references. My objection,
however, is not so much to the behaviour as to the fact that there is
no defined behaviour for them to adhere to. If Mosaic-L10N produced a
series of smileys for the above, it would amount to the same thing.

Francois, your notes and work are of immense value, and I have no
objection at all to (in fact I rabidly support) the goal you have (I have
been fighting the same battle since last year!). However, we need to
be *very* careful with the wording so that we do not *commit* to any
solution at the moment, and that we do not give open license to
implementors until we *do* have a solution.

That said, I should note that I consider most of the browser behaviour
reasonable, if somewhat without foundation.