Re: Revised language on: ISO/IEC 10646 as Document Character Set

Glenn Adams (glenn@stonehand.com)
Wed, 10 May 95 09:43:49 EDT

I was asked a good question about what happened to the full UCS-2
code space of 10646/Unicode. Since the answer may be of general
interest, I'm sending it to the HTML-WG list.

As you know, characters are assigned to code positions. The
remaining positions are of two kinds: unassigned, meaning they
are available for future assignment, and reserved, meaning they
will never become available for assignment. Of the remaining
positions, 6400 are reserved for private use, 65 for C0, C1, and
DEL, and 2 for special purposes (FFFE and FFFF). In addition, 2048
are reserved for UTF-16 use. That comes to:

34,168 assigned
22,853 unassigned
 8,515 reserved
------
65,536 total
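
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The assigned and unassigned counts are taken directly from the figures above. The concrete ranges (D800-DFFF for UTF-16 use, E000-F8FF for private use, and the usual C0/DEL/C1 positions) are my assumption of where the reserved blocks sit, since the message gives only their sizes, and the names in the code are purely illustrative.

```python
# Sketch: tally the UCS-2 code space using the figures quoted in the message.
TOTAL = 0x10000  # UCS-2 has 65,536 code positions (0000 through FFFF)

# Inclusive ranges for the reserved positions.
# ASSUMPTION: the message states only the counts; these ranges are
# the ones assumed here for illustration.
RESERVED_RANGES = {
    "C0 controls": (0x0000, 0x001F),
    "DEL": (0x007F, 0x007F),
    "C1 controls": (0x0080, 0x009F),
    "UTF-16 use": (0xD800, 0xDFFF),
    "private use": (0xE000, 0xF8FF),
    "FFFE and FFFF": (0xFFFE, 0xFFFF),
}

# Counts quoted in the message.
ASSIGNED = 34_168
UNASSIGNED = 22_853


def range_size(first: int, last: int) -> int:
    """Number of code positions in an inclusive range."""
    return last - first + 1


reserved = sum(range_size(first, last) for first, last in RESERVED_RANGES.values())

for name, (first, last) in RESERVED_RANGES.items():
    print(f"{name:>14}: {range_size(first, last):>5}")
print(f"{'reserved':>14}: {reserved:>5}")
print(f"{'total':>14}: {ASSIGNED + UNASSIGNED + reserved:>5}")
```

The reserved ranges sum to 8,515, and the three categories together account for all 65,536 positions.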

Glenn

[P.S. For the curious, FFFE is reserved as an indication of the
byte-swapped form of 0xFEFF, the latter being known as the byte order
mark; FFFF is reserved as an application-specific marker (e.g., it is
useful to put in a lookup table to indicate the absence of a mapping).]
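
To illustrate the two special values from the P.S., here is a short Python sketch of how they might be used. The function and table names are hypothetical, and the 256-entry table size is an arbitrary choice for the example; nothing here is prescribed by the standard beyond the values FEFF, FFFE, and FFFF themselves.

```python
# Sketch: possible uses of the special values FEFF, FFFE, and FFFF.
BOM = 0xFEFF          # byte order mark
SWAPPED_BOM = 0xFFFE  # what the BOM looks like when read with the wrong byte order
NO_MAPPING = 0xFFFF   # application-specific marker, used here as "no mapping"


def detect_byte_order(data: bytes) -> str:
    """Guess the byte order of 16-bit text from its first code unit.

    Returns 'big', 'little', or 'unknown' (no byte order mark present).
    """
    if len(data) < 2:
        return "unknown"
    first = int.from_bytes(data[:2], "big")  # read as if big-endian
    if first == BOM:
        return "big"
    if first == SWAPPED_BOM:  # the BOM appears byte-swapped, so data is little-endian
        return "little"
    return "unknown"


def build_table(mapping: dict, size: int = 256) -> list:
    """Build a flat lookup table; unmapped slots hold NO_MAPPING."""
    table = [NO_MAPPING] * size
    for source, target in mapping.items():
        table[source] = target
    return table


def lookup(table: list, code: int):
    """Return the mapped code position, or None if there is no mapping."""
    value = table[code]
    return None if value == NO_MAPPING else value


# Hypothetical one-entry mapping: byte 0xE9 to U+00E9.
table = build_table({0xE9: 0x00E9})
print(detect_byte_order(b"\xff\xfeA\x00"))
print(lookup(table, 0xE9), lookup(table, 0x80))
```

Since FFFF will never be assigned to a character, it cannot collide with a real mapping target, which is what makes it safe as an "absent" marker in a table like this.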