Numeric character references and ISO-2022 (was: ISO/IEC 10646 as

Ned Freed (NED@SIGURD.INNOSOFT.COM)
Sat, 6 May 95 17:20:50 EDT

A question, if I may, about numeric character references and ISO 2022.

My understanding of these numeric character references is that they refer to
code positions in the document character set, which must be a coded character
set. This makes all sorts of sense in a character set like ISO-8859-1 or ISO
10646, which satisfy the criteria for being a coded character set. (That is, a
function exists that maps a subset of the integers into the specified character
repertoire.)

But what about character sets based on ISO 2022?

My initial understanding was that such character sets cannot be used as
a document character set because they lack the necessary coded character
set mapping function.

Yet people here have been talking about using character sets based on
ISO 2022 as document character sets.

Now, it is perfectly possible to define a legitimate mapping function for ISO
2022. Since each coded character set accessible to ISO 2022 has a designated
unique integer registration number, all you have to do is define a new integer
for each accessible character that is uniquely determined by both the
registration number and the code position of the character in question. (I
believe that all
of the character sets accessible to ISO 2022 are also coded character sets.)
This gives you a new mapping function that satisfies the requirements for a
coded character set given in "Character Set Considered Harmful", as far as I
can tell.
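
One way such a mapping could look, as a minimal sketch only: pack the
registration number and the code position into a single integer. The 24-bit
field width and the names to_reference and from_reference are assumptions
made for illustration; nothing here comes from ISO 2022 or any existing
specification.

```python
from typing import Tuple

# Assumption for illustration: code positions fit in 24 bits, so the
# registration number can occupy the bits above them.
POSITION_BITS = 24
POSITION_MASK = (1 << POSITION_BITS) - 1


def to_reference(registration: int, position: int) -> int:
    """Combine a registration number and a code position into one integer."""
    if registration < 0:
        raise ValueError("registration number must be non-negative")
    if not 0 <= position <= POSITION_MASK:
        raise ValueError("code position out of range for this sketch")
    return (registration << POSITION_BITS) | position


def from_reference(reference: int) -> Tuple[int, int]:
    """Recover (registration number, code position) from the integer."""
    return reference >> POSITION_BITS, reference & POSITION_MASK
```

Because the two fields never overlap, distinct (registration, position) pairs
always yield distinct integers, which is the property the mapping needs.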

There are problems with this, of course. One is that the mapping is not 1:1 --
since there's lots of overlap between coded character sets there are now
lots and lots of numeric references that will produce the same character.
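
The overlap can be made concrete with a small sketch. The table below is
illustrative data, not a real registry: it assumes that registration 6
(ASCII) and registration 14 (JIS X 0201 Roman) both place "A" at position
0x41. The helper name references_by_character is hypothetical.

```python
from collections import defaultdict

# Illustrative sample: (registration number, code position) -> character.
# The entries are assumptions chosen to show the overlap.
SAMPLE = {
    (6, 0x41): "A",   # assumed: ASCII
    (14, 0x41): "A",  # assumed: JIS X 0201 Roman
    (6, 0x42): "B",
}


def references_by_character(table):
    """Group every (registration, position) pair by the character it names."""
    grouped = defaultdict(list)
    for pair, char in table.items():
        grouped[char].append(pair)
    return dict(grouped)


# Characters reachable through more than one reference.
duplicates = {
    char: pairs
    for char, pairs in references_by_character(SAMPLE).items()
    if len(pairs) > 1
}
```

In this sample only "A" ends up in duplicates, with two distinct references
naming it.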

Another, somewhat lesser, problem is that the resulting ISO 2022 coded
character set is not closed. New coded character sets are created and registered
under ISO 2022 from time to time, and these would be added implicitly. This can
be solved by the same method we use to turn ISO 2022 into a usable MIME
character set: Profiling. Limit yourself to some fixed subset of all of
the registered character sets and this problem goes away.
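
Profiling could be sketched as nothing more than a fixed allow-list of
registration numbers. The particular numbers below are a hypothetical
profile chosen for illustration, and the name in_profile is likewise an
assumption.

```python
# Hypothetical profile: a closed set of permitted registration numbers.
# Registrations created later are not added implicitly.
PROFILE = frozenset({6, 14, 87})


def in_profile(registration: int, profile: frozenset = PROFILE) -> bool:
    """Return True only if the registration number is part of the profile."""
    return registration in profile
```

A reference whose registration number falls outside the profile would simply
be rejected, so the set of valid references stays closed.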

My question, then, is whether or not the work has been done to turn ISO 2022
into a coded character set. If it has been done, where was it done? If it
hasn't been done, is someone proposing to do it, or is the idea that character
sets based on ISO 2022 can be used as document character sets simply wrong?

Ned