>It seems possible that we could define the DTD in terms of Unicode
>and map/project this onto other character sets as required *even
>if Unicode is not used as the transport charset*. This may be what
>Gavin is suggesting with ERCS; I'm not sure, but I think it is
>worth thinking about. This could accommodate use of many character
>codes for transport, though codes that did not contain all the
>markup characters might have to be converted to Unicode.
This is exactly what I am proposing. Is my English really so difficult
to understand? I *have* been in Japan a long time... perhaps I have
lost the flair for the language I once used so well...
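To make the idea concrete, here is a minimal sketch in Python of what I mean by projecting between a transport charset and Unicode. The function names are my own inventions for illustration, and the choice of markup characters is an assumption, not something taken from any DTD. It only shows the conversion step that would sit *before* the parser.

```python
# Hypothetical set of characters the markup itself needs (an assumption).
MARKUP_CHARS = "<>&;/=\"'!#"


def to_unicode(raw: bytes, transport_charset: str) -> str:
    """Decode bytes from the transport charset into Unicode before parsing."""
    return raw.decode(transport_charset)


def from_unicode(text: str, transport_charset: str) -> bytes:
    """Project Unicode text onto a transport charset.

    Characters the charset cannot represent are written as numeric
    character references instead of being dropped.
    """
    return text.encode(transport_charset, errors="xmlcharrefreplace")


def carries_markup(transport_charset: str) -> bool:
    """Report whether the charset can represent every markup character."""
    try:
        MARKUP_CHARS.encode(transport_charset)
        return True
    except UnicodeEncodeError:
        return False
```

A charset for which carries_markup() returns False is the case mentioned above: it would have to be converted to Unicode before the markup could be recognized at all.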
>* I think we should define some form of SGML markup to be used
>to indicate changes in language. A low-level mechanism for hinting
>at language changes (like the use of Unicode private codes) might
>use less bandwidth but be harder to implement across character
>sets. I think it is useful to be able to mark up language changes
>even in ISO-8859-1 text.
Fine, though I don't know of many encodings in which one can change
language mid-stream without also changing the character set. Unless
we translate to a superset *before* the parser, that is something we
cannot accomplish (well) in SGML.
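As a rough illustration of the markup approach, assuming the text has already been translated to a single superset character set, language changes could be recovered as runs like this. The LANG element and its ID attribute are hypothetical, made up for this sketch and not taken from any existing DTD, and the pattern matching is a stand-in for a real SGML parser (it does not handle nesting).

```python
import re

# Hypothetical element: <LANG ID="xx"> ... </LANG>; not from any real DTD.
LANG_SPAN = re.compile(r'<LANG ID="([^"]+)">(.*?)</LANG>', re.DOTALL)


def language_runs(text, default_lang):
    """Split text into (language, content) runs based on LANG elements."""
    runs = []
    pos = 0
    for match in LANG_SPAN.finditer(text):
        # Text before the element belongs to the default language.
        if match.start() > pos:
            runs.append((default_lang, text[pos:match.start()]))
        runs.append((match.group(1), match.group(2)))
        pos = match.end()
    # Trailing text after the last element.
    if pos < len(text):
        runs.append((default_lang, text[pos:]))
    return runs
```

Nothing here depends on the transport charset, which is the point: the same markup works in ISO-8859-1 text as well as in Unicode.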
The low-level tag idea of mine is absolutely Unicode-specific, and
should henceforth be thought of primarily as an encoding of Unicode,
not as a raw Unicode stream (though it may be interpreted as such).
TEI should be required reading for all SGML adepts.