Revised language on: ISO/IEC 10646 -- another proposal

Albert Lunde (Albert-Lunde@nwu.edu)
Fri, 12 May 95 01:36:15 EDT

After some thought and reading I think I'm prepared to offer a more precise
version of one of my suggestions of the other day. This is intended to
revise Dan's section which read:

>A document is a conforming HTML document only if:
[...]
>Its document character set includes ISO-8859-1 and agrees with ISO10646;
>that is, each code position listed in section The ISO-8859-1 Coded
>Character Set is included, and each code position in the document character
>set is mapped to the same character as ISO10646 designates for that code
>position.

My proposal:
= = =
A document is a conforming HTML document only if:
[...]
Its document character set includes ISO-8859-1 and agrees with ISO10646 for
all characters and code positions that they have in common. That is:

1) each code position listed in section The ISO-8859-1 Coded Character Set
is included.

2) All code positions that are used in the document character set and are
also used in ISO10646 must map to the same characters as they map to in
ISO10646.

3) All characters that are in the intersection of the character repertoires
of the document character set and ISO10646 must be mapped to by at least
one code position used in ISO10646.
= = =
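
To make the three conditions concrete, here is a rough sketch of how they
could be checked mechanically. It is only an illustration under stated
assumptions, not part of the proposed wording: a character set is modelled as
a mapping from code position to character name, Python's unicodedata names
stand in for ISO10646 character names, REFERENCE is a small toy excerpt rather
than the full standard, and the required ISO-8859-1 positions are assumed to
be 32-126 and 160-255. The names excerpt, REFERENCE and violated_conditions
are invented for the example.

```python
import unicodedata


def excerpt(positions):
    """Map each code position to the character name in Python's Unicode tables."""
    return {cp: unicodedata.name(chr(cp)) for cp in positions}


# Assumption: the positions listed for ISO-8859-1 are 32-126 and 160-255.
LATIN1_POSITIONS = list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))

# Toy stand-in for ISO10646: Latin-1 plus positions 0x100-0x17F.
REFERENCE = excerpt(LATIN1_POSITIONS + list(range(0x100, 0x180)))


def violated_conditions(doc_set, reference=REFERENCE, required=LATIN1_POSITIONS):
    """Return the set of condition numbers (1, 2, 3) that doc_set violates."""
    violated = set()

    # 1) Every required ISO-8859-1 code position is included.
    if any(cp not in doc_set for cp in required):
        violated.add(1)

    # 2) Positions used by both sets must map to the same character.
    if any(cp in reference and reference[cp] != name
           for cp, name in doc_set.items()):
        violated.add(2)

    # 3) Every shared character is mapped to by at least one position
    #    that the reference set also uses.
    shared = set(doc_set.values()) & set(reference.values())
    reachable = {name for cp, name in doc_set.items() if cp in reference}
    if shared - reachable:
        violated.add(3)

    return violated


latin1 = excerpt(LATIN1_POSITIONS)

# Extension with a character and position unknown to the toy reference table.
extended = dict(latin1)
extended[0x110000] = "HYPOTHETICAL EXTRA CHARACTER"

# Position 0x41 reassigned to a different character: breaks condition 2.
remapped = dict(latin1)
remapped[0x41] = "LATIN SMALL LETTER A"

# A reference character placed only at a position the reference does not use:
# breaks condition 3.
misplaced = dict(latin1)
misplaced[0x110000] = "LATIN CAPITAL LETTER A WITH MACRON"

for label, charset in [("latin1", latin1), ("extended", extended),
                       ("remapped", remapped), ("misplaced", misplaced)]:
    print(label, sorted(violated_conditions(charset)))
```

In this sketch plain ISO-8859-1 and the extended set pass all three checks,
while the remapped and misplaced sets each fail exactly one condition.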
Optionally, as an explanatory note:
= = =
ISO10646 is used in this way to provide a consistent SGML interpretation of
numeric character references over a large range of characters and encoding
schemes. These conditions place very little constraint on the character
encoding (specified by the MIME charset parameter in HTTP, or by other
external means in other contexts).

This standard does not exclude the use of a document character set
containing characters not in ISO10646, but it does not completely specify
how to choose code positions for such characters. Use of numeric references
to such characters may therefore raise problems of interoperability outside
the scope of this document.
= = =

I haven't tried to rewrite Dan's other notes to be consistent with this
proposal.

I would like comments on whether this addresses the various objections.

It seems to me this preserves several properties of Dan's proposal:

- It allows the use of ISO-8859-1 as a document character set.

- Numeric references that refer to code positions in ISO10646 must map to
the same characters as ISO10646.

- Any character in ISO10646 that's in the document character set can be
translated to some ISO10646 numeric reference.
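
As a rough illustration of the last two properties, the sketch below picks a
numeric character reference for a character by looking for a code position
that the document character set and ISO10646 both use for it. The same
assumptions apply as in any toy model: character sets are position-to-name
mappings, Python's unicodedata names stand in for ISO10646, only positions
160-255 are modelled, and numeric_reference is a name invented for this
example.

```python
import unicodedata


def numeric_reference(char_name, doc_set, reference):
    """Return a reference like '&#233;' for char_name, or None if no code
    position is shared by doc_set and reference for that character."""
    for cp in sorted(doc_set):
        if doc_set[cp] == char_name and reference.get(cp) == char_name:
            return f"&#{cp};"
    return None


# Toy stand-in for ISO10646: the upper half of ISO-8859-1 only.
reference = {cp: unicodedata.name(chr(cp)) for cp in range(0xA0, 0x100)}

# A document character set that adds one character outside the reference.
doc_set = dict(reference)
doc_set[0x110000] = "HYPOTHETICAL EXTRA CHARACTER"

shared = numeric_reference("LATIN SMALL LETTER E WITH ACUTE", doc_set, reference)
extra = numeric_reference("HYPOTHETICAL EXTRA CHARACTER", doc_set, reference)
print(shared, extra)
```

The character outside ISO10646 gets no portable reference here, which is the
interoperability gap the explanatory note warns about.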

On the other hand, it allows the construction of document character sets
that are supersets and extensions of ISO10646, either by adding code
positions beyond its range or by using positions it leaves unused.

We may want to add some further condition that the document character set
only uses additional code positions which are "safe", in the sense that
ISO10646 has designated them for private use rather than reserved them for
future expansion. I don't know enough about ISO10646 to word this correctly.

---
    Albert Lunde                      Albert-Lunde@nwu.edu