Re: Charsets: Problem statement/requirements?

Luke 路客 (ylu@ccwf.cc.utexas.edu)
Tue, 14 Feb 95 16:00:17 EST

On Mon, 13 Feb 1995, Gavin Nicol wrote:

>>like Chinese. People create new Chinese characters and deprecate old
>>characters all the time, according to certain rules, i.e. you can say,
>>every single Chinese character might consist of several sub-characters
>>(pian1pang2 and bu4shou3). Some contribute to the form of a character,
>>some to the meaning and some to the sound of the entire character depending
>>on the _spatial positions_ and combination of these sub-characters. In a
>>sense, alphabetic language is one dimensional, while Chinese is 2-D. A
>>single Chinese _character_ can be a _word_ which has meanings. To
>
>I live in Japan. I can read Japanese. I know the issues.

The Kanji charset in Japanese is much more static than the charset in
Chinese, thanks to the Kana system: most new Japanese words are created
in Kana (hiragana and katakana). Most current encoding schemes (Big5, GB,
JIS, EUC, Unicode, etc.) are dumb mapping schemes, and extremely
inefficient in terms of usage/charset ratio: only a small part of the
charset is used in everyday text. A single outline font (TrueType or
PostScript) can occupy 15 MB of disk space and be very slow to
display/handle. Such big charsets are merely adequate for ordinary daily
communication. On the other hand, I often encounter situations where I
have to find another (sub-optimal) way to express my ideas just because I
can't find/create a particular character. As I said, there is research
underway into new encoding schemes that would dramatically reduce the
necessary size of the charset and facilitate the creation/rendering of new
characters.
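
To make the sub-character idea concrete, here is a minimal sketch in
Python of describing a character as a spatial combination of components
rather than as one entry in a big table. It is only an illustration, not
any of the research schemes mentioned above; the names Compose,
components and describe, and the two layout tags "LR" and "TB", are
hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Compose:
    """Two glyphs combined in a given 2-D layout (hypothetical scheme)."""
    layout: str        # "LR" = left/right, "TB" = top/bottom
    first: "Glyph"
    second: "Glyph"


# A glyph is either a basic sub-character or a composition of glyphs.
Glyph = Union[str, Compose]


def components(glyph: Glyph) -> List[str]:
    """Return the basic sub-characters of a glyph, in reading order."""
    if isinstance(glyph, str):
        return [glyph]
    return components(glyph.first) + components(glyph.second)


def describe(glyph: Glyph) -> str:
    """Serialize a glyph as a nested layout expression."""
    if isinstance(glyph, str):
        return glyph
    return f"{glyph.layout}({describe(glyph.first)},{describe(glyph.second)})"


# Two existing characters written as compositions.
hao = Compose("LR", "女", "子")   # 好: 女 on the left, 子 on the right
nan = Compose("TB", "田", "力")   # 男: 田 on top, 力 below

# A made-up character built from existing parts needs no new table entry.
made_up = Compose("LR", "木", nan)
print(describe(made_up))
```

With a scheme of this kind, the table only has to hold the basic
components, and a renderer would lay them out according to the layout
tags, so a new character costs a short expression instead of a new code.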

>Unicode supports most (all?) of the big5 character set. For newly
>created characters, send a GIF, or a bitmap.

Isn't that a kludge or what? What if users want to use different
fonts/sizes/styles? The purpose/ideal of Unicode is good, but it is still
technically immature/impractical, just like communism. Anyway, the new
framework should provide hooks for multiple charsets and for further
development of encoding schemes, no matter which charset/encoding scheme
is eventually used as the default.
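
As a rough sketch of what such hooks could look like, assuming Python and
its built-in codecs: text arrives as bytes plus a charset label, and a
registry maps each label to a decoder, so new schemes can be plugged in
later. The names register_charset, decode_text and DEFAULT_CHARSET are
made up for this example, and UTF-8 as the default is only a placeholder.

```python
from typing import Callable, Dict

DEFAULT_CHARSET = "utf-8"   # placeholder default, an assumption

# Registry of decoders, keyed by lower-cased charset label.
_decoders: Dict[str, Callable[[bytes], str]] = {}


def register_charset(name: str, decoder: Callable[[bytes], str]) -> None:
    """Hook: plug in a decoder for a charset label."""
    _decoders[name.lower()] = decoder


def decode_text(data: bytes, charset: str = DEFAULT_CHARSET) -> str:
    """Decode labelled bytes with whichever decoder is registered."""
    decoder = _decoders.get(charset.lower())
    if decoder is None:
        raise LookupError(f"no decoder registered for charset {charset!r}")
    return decoder(data)


# Register a few existing charsets through Python's standard codecs.
for _name in ("utf-8", "big5", "gb2312", "euc_jp"):
    register_charset(_name, lambda data, _n=_name: data.decode(_n))

# A future encoding scheme is just one more registration.
register_charset("x-demo", lambda data: data.decode("ascii").upper())

print(decode_text("中文".encode("big5"), "big5"))
```

The point of the design is that the framework itself never assumes one
charset; the default only applies when no label is given.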

__Luke

--
Luke Y. Lu
mailto:ylu@mail.utexas.edu/
http://www.utexas.edu/~lyl/