Re: Character Set Terminology, SC2 vs. SC18 vs. Internet Standards

Glenn Adams (glenn@stonehand.com)
Sun, 9 Apr 95 17:13:46 EDT

After re-reading my last message, I find that I need to make a couple
of corrections, particularly concerning emphasis.

----------------------------------------------------------------------

I said:

> The term "collection" is different from "set".

The issue here is how to define "character repertoire". The argument
for using the term "collection of distinct ..." rather than "set" is
not very convincing (having just written it). However, since this
discussion revolves around problematizing the term "character set",
it may be useful to avoid the term "set" in certain places, even where
it would be appropriate in a mathematical sense. The argument I presented
for not using "set", on the basis that we aren't specifying a membership
criterion, is perhaps a bit stronger, but not overly so. In a sense, we
are establishing a membership criterion by stating a rule for determining
distinctness, i.e., having distinct names. Therefore,

[Dan]
character repertoire : a set of characters; that is, the range of a
coded character set.

is essentially equivalent to my definition

[Glenn]
(b) character repertoire : a collection of distinct characters

NOTE - two characters are distinct if and only if they have distinct
names in the context of an identified character repertoire.

if by "set" is meant "collection of distinct ...".

My purpose in using the longer description is to focus on "distinctness"
and to provide an occasion for the note.

----------------------------------------------------------------------

[Glenn]
(f) coded character set : a one-to-one mapping from a character
repertoire to a code set.

By one-to-one here I really mean an injective mapping, and not a
bijective (one-to-one and onto) mapping, since there are members of the
codomain (i.e., the code set) that are not in the range of the mapping.
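
As a rough illustration of the distinction, here is a minimal Python
sketch. The three-character repertoire, the 256-position code set, and
the helper names is_injective and is_surjective are all hypothetical,
chosen only to make the point concrete; they are not taken from any
standard.

```python
# Hypothetical toy example: a tiny repertoire mapped into a 256-position code set.
code_set = range(256)

# The coded character set: character name -> code set position (assumed values).
coded_character_set = {
    "LATIN CAPITAL LETTER A": 0x41,
    "LATIN CAPITAL LETTER B": 0x42,
    "DIGIT ZERO": 0x30,
}


def is_injective(mapping):
    """True if no two characters share a code position."""
    values = list(mapping.values())
    return len(values) == len(set(values))


def is_surjective(mapping, codomain):
    """True if every member of the codomain is the image of some character."""
    return set(mapping.values()) == set(codomain)


print(is_injective(coded_character_set))             # True
print(is_surjective(coded_character_set, code_set))  # False
```

The mapping is injective because each name gets its own position, but
not bijective, because most positions of the code set are unassigned.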

----------------------------------------------------------------------

Under the discussion of the term character encoding (scheme), I said:

> the presumed domain of this function is expressed as the uncountable
> set consisting of all "sequences of octets".

I should have said unbounded instead of uncountable. However, this really
has no bearing on the discussion. The point of the comment was to indicate
that there is in fact an internal structure to the sequences of bit
combinations (one instance being octets) which is defined by the
transformation algorithm expressed by the encoding scheme.

While it is interesting to note that this could be expressed as a function
from the set of strings of octets (a countable, but unbounded set, i.e., A*
where A is the finite alphabet {0,..,255}) to the set of strings of
characters [of some repertoire or collection of repertoires] (a countable,
but unbounded set), such an expression doesn't note the important practical
fact that such a scheme centers around an algorithm which defines the
transformation expressed by such an abstract function.

The definition of character encoding scheme should focus on the
algorithmic transformation and on characterizing its results, which
still need some discussion; i.e., whether to express the results as a
sequence of characters or as a sequence of primary coded
representations (i.e., code set positions).
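
To make the algorithmic view concrete, here is a minimal Python sketch
of a made-up encoding scheme; it does not correspond to any real
standard. I assume that an octet below 0x80 stands alone as a code set
position, and that an octet of 0x80 or above is the lead octet of a
two-octet sequence. The function name decode and the bit layout are
hypothetical. The sketch returns code set positions rather than
characters, which is only one of the two options discussed above.

```python
def decode(octets):
    """Transform a sequence of octets into a list of code set positions.

    Hypothetical scheme, for illustration only:
      - an octet < 0x80 is a complete one-octet code position;
      - an octet >= 0x80 leads a two-octet sequence whose position is
        built from the low 7 bits of the lead octet and the next octet.
    """
    positions = []
    i = 0
    while i < len(octets):
        lead = octets[i]
        if lead < 0x80:
            positions.append(lead)
            i += 1
        else:
            if i + 1 >= len(octets):
                # Not every string of octets is well formed under the scheme.
                raise ValueError("truncated two-octet sequence")
            positions.append(((lead & 0x7F) << 8) | octets[i + 1])
            i += 2
    return positions


print(decode(b"A\x81\x02B"))  # [65, 258, 66]
```

Note that decode(b"\x81") raises ValueError: the algorithm imposes an
internal structure on the octet sequences, so the abstract set of all
strings of octets is larger than what the scheme actually accepts.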

Regards,
Glenn