Re: Charsets: Problem statement/requirements?

Luke ~{B7?M~} (ylu@ccwf.cc.utexas.edu)
Wed, 15 Feb 95 02:04:48 EST

>>technically immature/impractical just like communism. Anyway, the new
>>framework should provide hooks for multiple charsets and further
>>development of encoding schemes, no matter what charset/encoding scheme is
>>eventually used as default.
>
>Sure. I agree 100% *except* we cannot have multiple character sets
>within a document when it reaches the parser.

Well, for _internal_ encoding you can use whatever encoding you see fit. A
slightly extended Unicode is OK; the extension can provide some hooks for
rendering new characters (reserve some encoding space for char creation).
I just don't like standardizing on Unicode as a transport encoding, which I
believe you've been advocating, because someday there might be
changes/advances in encoding schemes for some languages, and suddenly
documents using these new schemes become illegal HTML files. The right
approach, imho, is to use an extensible _transport_ encoding scheme, a la
ISO-2022, and use encoding filter(s) (which can be easily upgraded) to
transform the document to an internal encoding before it reaches an SGML
parser. The filter(s) will ensure that the parser never sees an illegal
char. A new character, whatever its encoding info, may be mapped by the
_filter_ to an entity such as &newcnchar1;, with a pointer to its rendition
info. Problem solved?
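
The filter idea above can be sketched roughly as follows. This is only a minimal illustration, assuming Python's codec machinery stands in for the upgradable filter(s) and a Python str (Unicode) stands in for the internal encoding. The names transport_filter, RENDITIONS, the "to_entity" handler, the "newchar-" entity prefix and the glyph path are all hypothetical and appear nowhere in the thread.

```python
import codecs

# Hypothetical registry: entity name -> pointer to rendition info.
RENDITIONS = {}


def _to_entity(exc):
    """Error handler: turn bytes the codec rejects into an entity reference."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    raw = exc.object[exc.start:exc.end]
    name = "newchar-" + raw.hex()
    # The glyph path is made up for this sketch; a real filter would point
    # at whatever the renderer needs to draw the character.
    RENDITIONS.setdefault(name, "glyphs/" + raw.hex() + ".bitmap")
    # Return the replacement text and the position at which to resume decoding.
    return "&" + name + ";", exc.end


codecs.register_error("to_entity", _to_entity)


def transport_filter(data, charset):
    """Decode transport bytes into the internal encoding (a Unicode str).

    Bytes the chosen codec cannot decode are replaced by entity references,
    so the parser downstream only sees text it already understands.
    """
    return data.decode(charset, errors="to_entity")


if __name__ == "__main__":
    print(transport_filter(b"caf\xe9", "ascii"))
    print(RENDITIONS)
```

The parser is untouched when a codec is added or upgraded; only the filter changes. The same call accepts any codec Python ships, for example "iso2022_jp", although how many bytes each codec reports per error is codec-specific.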

>If you can supply pointers to the "research on new encodings", I'd
>be most appreciative.

I am not an expert on this, but I'll ask around for you.

__Luke

--
Luke Y. Lu
mailto:ylu@mail.utexas.edu/
http://www.utexas.edu/~lyl/