Re: charset parameter (long)

Gavin Nicol (gtn@ebt.com)
Mon, 16 Jan 95 17:21:38 EST

Dave Raggett writes:

>I am not quite sure I agree with this, as I want to include in the byte
>stream, information about direction, changes to character set, language
>and so on.

That would be ideal, but somewhat hard to do. What do those extra
bytes mean to the parser?

>SGML assumes fixed width characters according to the
>definitions in the <!SGML> declaration. It doesn't support multiple
>character sets as such. James Clark has a proposal for how to use the
>entity manager to handle these though. A simple approach is to use
>Unicode within the sgml parser, mapping to it from other character
>sets.

The latter is essentially what ERCS proposes. James Clark's model is
extant within his sp parser, but it seems to require quite complicated
declarations.
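
As a rough illustration of the "map everything to Unicode before the
parser sees it" approach, here is a minimal sketch in Python. The
function name to_internal and the sample strings are my own
assumptions for illustration; they come from neither ERCS nor sp. The
point is only that the parser then deals with a single internal
character set, whatever the external encoding was.

```python
def to_internal(raw: bytes, charset: str) -> str:
    """Decode an external byte stream into Unicode code points.

    Hypothetical helper: the parser would only ever see the result.
    """
    return raw.decode(charset)


# The same kind of text arriving in three different external encodings.
samples = [
    (b"caf\xe9", "iso-8859-1"),
    ("日本語".encode("euc_jp"), "euc_jp"),
    ("日本語".encode("shift_jis"), "shift_jis"),
]

# After decoding, the two Japanese samples are indistinguishable.
decoded = [to_internal(raw, charset) for raw, charset in samples]
```

The external charset label is consumed at the boundary, so none of it
needs to appear in the markup itself.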

>Another idea, which I prefer, is to define the internal character set dynamically
>according to the needs of the external character stream, i.e. the internal
>character set grows to incorporate all the characters needed for that
>document. This approach hides the display direction and other parameters
>from the sgml parser, leaving it up to the formatting code to make use of.

One could implement this idea, but I doubt it would be easy, or
efficient. It is not simply a matter of mapping characters to this
dynamic internal table: you must also change the contents of the
tables used for character class mapping. In many cases, you will need
a large dynamic internal table (16 bits) to handle all the languages.
In addition, in the display subsystem, you will either need 16 bits,
or multiple mapping tables. Which is simpler?
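
To make the cost concrete, here is a small Python sketch of such a
dynamic table. Everything in it (the DynamicCharset class, the
classify function and its class names) is hypothetical and much
simpler than a real SGML parser; it is only meant to show that the
character class table has to grow in step with the character table,
and that a multilingual document quickly needs more than 8 bits per
internal code.

```python
def classify(ch: str) -> str:
    """Toy character classes, standing in for the parser's real ones."""
    if ch.isalpha():
        return "NAME_START"
    if ch.isdigit():
        return "DIGIT"
    if ch.isspace():
        return "SPACE"
    return "OTHER"


class DynamicCharset:
    """Internal character set that grows as new characters are seen."""

    def __init__(self):
        self.to_code = {}      # character -> internal code
        self.to_char = []      # internal code -> character
        self.char_class = []   # internal code -> character class

    def intern(self, ch: str) -> int:
        code = self.to_code.get(ch)
        if code is None:
            code = len(self.to_char)
            self.to_code[ch] = code
            self.to_char.append(ch)
            # The class table must be extended at the same time.
            self.char_class.append(classify(ch))
        return code

    def encode(self, text: str) -> list:
        return [self.intern(ch) for ch in text]


table = DynamicCharset()
codes = table.encode("abca")

# A few hundred ideographs are enough to overflow an 8-bit code.
cjk = "".join(chr(c) for c in range(0x4E00, 0x4E00 + 300))
table.encode(cjk)
needs_more_than_8_bits = len(table.to_char) > 256
```

The display side would then need the to_char table (or an equivalent)
to get back from internal codes to something it can render.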

>The bottom line is: let's leave info about directionality and multiple
>character sets out of the SGML markup, and instead put it where it
>belongs, in discussions about the character transfer stream.

I tend to agree with this. One reason I like the idea of using 2
codes from the Private Use Area is that we get both the high-level
and the low-level information for little extra cost.