> I don't find any basis in fact for this.
I can't quote you the reference in the SGML Handbook, but I have this
on good authority from James Clark.
>>Another idea, I prefer, is to define the internal character set dynamically
>>according to the needs of the external character stream, i.e. the internal
>>character set grows to incorporate all the characters needed for that
>>document. This approach hides the display direction and other parameters
>>from the SGML parser, leaving it up to the formatting code to make use of.
> This is pretty much the same as James Clark's technique: the so
> called "internal character set" is just the document character set --
> the set of characters presented to the parser. The character set
> declaration in the <!SGML> declaration tells the parser which
> characters might signal markup, and which ones are just data
> characters. It's reasonable to declare characters that mean
> nothing to the parser, but mean "change directions" or whatever
> to the application. The syntax of such declarations is not something
> I'm intimately familiar with, but I'm confident that such things
> are expressible, theoretically.
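To make the division of labour above concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical delimiter set (real SGML delimiter recognition is contextual and set by the concrete syntax) and two hypothetical private-use codes, DIR_RTL and DIR_LTR, standing in for whatever "change direction" characters the character set declaration would define. The parser only decides markup versus data; the direction codes reach the application as ordinary data, which then splits the text into directional runs.

```python
# Hypothetical, simplified delimiter set: stands in for the characters the
# <!SGML> declaration marks as able to signal markup.
MARKUP_CHARS = {ord("<"), ord(">"), ord("&"), ord(";")}

# Hypothetical "change direction" codes (private-use values chosen for illustration).
DIR_RTL = 0xE001
DIR_LTR = 0xE002


def split_markup_and_data(codes):
    """Parser side: tag each code as markup-significant or plain data.

    Direction codes are not special here; they are just data characters.
    """
    return [("markup" if c in MARKUP_CHARS else "data", c) for c in codes]


def apply_directions(data_codes):
    """Application side: interpret direction codes, return (direction, text) runs."""
    runs = []
    direction = "ltr"
    buf = []
    for c in data_codes:
        if c in (DIR_RTL, DIR_LTR):
            if buf:  # close the current run before switching direction
                runs.append((direction, "".join(map(chr, buf))))
                buf = []
            direction = "rtl" if c == DIR_RTL else "ltr"
        else:
            buf.append(c)
    if buf:
        runs.append((direction, "".join(map(chr, buf))))
    return runs
```

For example, feeding the codes of `"abc" + chr(DIR_RTL) + "xyz" + chr(DIR_LTR) + "def"` through both steps yields the runs `("ltr", "abc")`, `("rtl", "xyz")`, `("ltr", "def")`, without the parser ever knowing what the direction codes mean.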
Another idea is to use the internal character code as an index into
data about the character, e.g. its directionality, character set
and code in that set, its intended language etc. To SGML the character
is just another code; to the display routines it's an index into this
extra info.
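A minimal Python sketch of that idea, combined with the dynamically growing internal character set quoted earlier. The class names CharInfo and InternalCharset are hypothetical, and the charset names and codes in the usage note are illustrative only. Internal codes are allocated the first time a character is seen; the parser only ever handles the integer, while display code uses it as an index into the per-character record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen, so records are hashable and can key a dict
class CharInfo:
    charset: str    # external character set the character came from
    code: int       # code of the character within that set
    direction: str  # "ltr" or "rtl"
    language: str   # intended language


class InternalCharset:
    """Hypothetical internal character set that grows as a document is read."""

    def __init__(self):
        self._info = []   # internal code -> CharInfo
        self._index = {}  # CharInfo -> internal code

    def intern(self, info):
        """Return the internal code for info, allocating a new code if unseen."""
        code = self._index.get(info)
        if code is None:
            code = len(self._info)
            self._info.append(info)
            self._index[info] = code
        return code

    def lookup(self, code):
        """Display side: treat the internal code as an index into the extra info."""
        return self._info[code]
```

For example, interning a Hebrew alef from ISO 8859-8 and then a Latin "A" gives internal codes 0 and 1; interning the alef again returns 0, and `lookup(0).direction` is `"rtl"`. Keying on the whole record means the same external character tagged with a different language gets its own internal code, which is one possible design choice rather than the only one.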
> But there are practical considerations: how does an author put one of
> these "direction change" characters into a document? I suppose the
> issues are already addressed in existing multilingual composition
> interfaces, and we just need to find a reasonable representation of
> the idioms.
That's what I was thinking...
-- Dave Raggett <dsr@w3.org> tel: +44 272 228046 fax: +44 272 228003
Hewlett Packard Laboratories, Filton Road, Bristol BS12 6QZ, United Kingdom