Re: Revised language on: ISO/IEC 10646 as Document Character Set

Sat, 13 May 95 21:50:33 EDT

> You did say "... [in order] to reduce the number of characters you need".
> To me that equates to "code space usage." I'm not sure how to interpret
> it in another way. However, it was not done either to "reduce the number
> of characters" or to reduce code space usage (if you deem these to
> be different).

Whatever. Frankly, I could not care less why Han unification was employed in
Unicode, nor do I care that it was employed, nor do I care where the idea came
from. And none of this is really relevant to the matters at hand, as I tried to
point out in my previous response.

This is obviously a hot button item for you, and I apologize for having pushed
it inadvertently.

> The basic reasons for performing Han unification are quite simple:

> ... long discussion of origins and reasons for Han unification omitted ...

> Since I didn't participate in the earlier MIME discussions, I'm afraid I
> can't comment on how well the facts were represented.

I hate to have to say it, but you have yet to bring up a single point that I
(or anyone else that participated in the earlier MIME discussions) was unaware
of. I'm actually somewhat dismayed that things have changed so little in the
past two years -- I had hoped this all would go away at some point. Ever
the optimist, I guess...

> However, I have had
> plenty of opportunity to communicate with certain vocal participants who
> fail to understand or acknowledge the above principles. I have no objection
> to someone requesting that the above principles be modified according to
> their perceived needs; at the same time, though, I believe the principles
> chosen above were based not so much on theory, but, rather, on well established
> conventions and principles that are currently embodied by existing character
> set standards. I think most Westerners would be surprised if I suggested
> that the letters A-Z used to write French constituted a different script from
> that used to write German, and, that consequently they should be encoded
> separately.

> This is no different from what certain people have suggested
> is required for Japanese vs. Chinese, etc. I happen to reject such an
> argument not on theoretical grounds, but on practical grounds: there is
> no identified need embodied by current character encoding practice that
> would admit to such a marked departure from existing practice.

Nicely put -- this is a better explication of the threads of this argument than
I have previously seen. But aside from the fact that you have put this so
nicely, there's nothing new here -- it has all been said on at least four other
lists and occasions that I know of. (If memory serves, the last time it came
from the folks at Taligent, the time before that from some people at NeXT,
before that from ISO folks, and originally from Keld Simonsen around the time
RFC1345 was published.)

> Again, I suspect most readers of this list aren't all that interested in
> this topic (but then again I could be wrong). I've asked for a separate
> list to help filter the traffic from others. But if there's no consensus
> I guess we're stuck with continuing this conversation here.

Such a list has existed for almost two years now -- the ietf-charsets list
(ietf-charsets[-request]). I set this list up at the request of
John Klensin (Application Area Codirector). It was intended specifically to
discuss character set issues in the IETF. At one point a character set working
group seemed like a likely outcome. But the IESG decided instead to defer these
matters to the IAB "for further study". And the IAB has been nothing but a
black hole on this matter as far as I can tell.

> In any case, I agree with you that the HTML spec need not require the
> set of processable data characters to be limited to those in the document
> character set.

Great. If this is agreed then we can avoid even the possibility of this entire
argument resurfacing and interfering with the timely advancement of the HTML
specification. This is all I've been after from the beginning.