Re: ISO/IEC 10646 as Document Character Set

Alex Hopmann (hopmann@holonet.net)
Sat, 6 May 95 00:30:24 EDT

Gavin Nicol replies to someone's comment (sorry, I can't easily find the name
of whoever posted the >> comment).
>>Yes, it will: If I put &#2789 (or whatever) all over my documents, and
>>the HTML 2.0 spec specifies that this document is legal, and yet it
>>doesn't work on 95% of the browsers in existence, then we
>>lose. Granted, there are corner cases of documents where this is the
>
>I will run a few tests to see if this actually does fail (and by fail,
>I mean crash the system, or cause the page not to be displayed). Any
>other behaviour is acceptable using the "mapping from document
>character set characters to system representation is application
>dependent" loophole.
>
>I suspect that this will "work".
Do we all share the same definition of "work" and "broken" here? Let me take
half a step back and say that I don't personally care much either way how
this gets decided. So, in an attempt to get everyone's position clear and
bring them together:

Can we agree that a proposed solution is not "broken" or "non-functional",
and that it does represent "current practice", if it can be shown to:
1) Not cause any major current browsers (I'll leave the definition of "major
current" to anyone who cares to worry about it) to crash.
2) Not cause any major current browsers to fail to display a document.
3) Not require the authors of major current browsers to change their code in
order to be minimally compliant with HTML 2.0.

Now, here is something about which I am very unclear, but let me give it a
shot. If we for some reason said "HTML 2.0 shall use ISO 10646, although
support for rendering characters beyond the ISO 8859-1 subset is not required
for minimal compliance", this does not mean that most browsers which
followed the preceding rule would suddenly be able to display Japanese (for
instance). It is likely that people running at least under MS Windows and the
Macintosh who do not have appropriate Japanese fonts installed, etc., would
not be able to see Japanese even if the browser had the appropriate
character codes and knew how to display them in the local fonts. Furthermore,
given the preceding condition, non-localized versions of browsers might not
even be able to display Japanese on a system that supports Japanese, because
the browser vendors choose not to include mapping tables and other such
things which would increase the code size. On the other hand, browsers which
were localized would recognize these ISO 10646 characters and be able to
display them correctly on localized operating systems.
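
To make the distinction above concrete, here is a rough sketch (in Python,
purely for illustration) of how a non-localized and a localized browser might
each treat numeric character references under such a rule. Everything in it
is an assumption of mine: the function name render, the can_display
callback standing in for "fonts and mapping tables are available", and the
"?" placeholder are hypothetical, not taken from any browser or from the
HTML 2.0 draft.

```python
import re

# Hypothetical sketch: matches numeric character references such as
# "&#233;" or "&#2789" (the trailing semicolon is treated as optional).
NUMERIC_REF = re.compile(r"&#([0-9]+);?")

LATIN1_MAX = 0xFF           # ISO 8859-1 covers code points 0..255
PYTHON_CHR_MAX = 0x10FFFF   # upper bound accepted by Python's chr()


def render(text, can_display=lambda cp: cp <= LATIN1_MAX, placeholder="?"):
    """Replace numeric character references with displayable text.

    can_display is an assumed stand-in for "this system has the fonts
    and mapping tables for this code point". The default models a
    non-localized browser that only handles the ISO 8859-1 subset.
    Anything it cannot show becomes a placeholder, so the document is
    still displayed and nothing crashes.
    """
    def replace(match):
        digits = match.group(1)
        if len(digits) > 7:  # far too large to be a valid code point
            return placeholder
        cp = int(digits)
        if cp <= PYTHON_CHR_MAX and can_display(cp):
            return chr(cp)
        return placeholder

    return NUMERIC_REF.sub(replace, text)
```

With the default callback, render("&#2789") gives back the placeholder
rather than failing, while a localized browser could be modeled by passing a
can_display that accepts the Japanese range.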

So my question comes down to this: how does this situation apply to the three
criteria I suggested above? Current practice is that browsers are (usually)
not localized. These non-localized browsers would continue to be unable to
display non-ISO 8859-1 characters, but would still be compliant. New localized
browsers would now have an "official" way to do their localization support. The
only browsers for which current practice is not described are those which
implement international support (characters beyond ISO 8859-1) in some way
other than ISO 10646. And I have not been hearing a lot of people on this list
suggest that we should standardize on SJIS, etc.

Disclaimers: My discussion above is based on what some others have said
currently works or does not work in some browsers. I have not personally
verified these aspects. I am not an SGML wizard and probably got some of my
terminology wrong. Besides which, I don't really care which way this issue
gets decided and would mostly like to see it get decided some way.

Alex Hopmann
ResNova Software, Inc.
hopmann@holonet.net