Re: Charsets: Problem statement/requirements?

Gavin Nicol (gtn@ebt.com)
Fri, 10 Feb 95 10:53:53 EST

I must say thank you to Albert for both supplying references when I
needed them, and for now understanding what I was trying to say!

>It seems possible that we could define the DTD in terms of Unicode
>and map/project this onto other character sets as required *even
>if Unicode is not used as the transport charset*. This may be what
>Gavin is suggesting with ERCS; I'm not sure, but I think it is
>worth thinking about. This could accommodate use of many character
>codes for transport, though codes that did not contain all the
>markup characters might have to be converted to Unicode.

This is exactly what I am proposing. Is my English really so difficult
to understand? I *have* been in Japan a long time... perhaps I have
lost the flair for the language I once used so well...

>* I think we should define some form of SGML markup to be used
>to indicate changes in language. A low-level mechanism for hinting
>at language changes (like the use of Unicode private codes) might
>use less bandwidth but be harder to implement across character
>sets. I think it is useful to be able to mark up language changes
>even in ISO-8859-1 text.

Fine, though I don't know of many encodings in which one can change
language mid-stream without also changing the character set. Unless
we translate to a superset *before* the parser, that is something we
cannot accomplish (well) in SGML.
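
To make the "translate to a superset before the parser" step concrete,
here is a minimal sketch in Python. It is an illustration only, not part
of any proposal in this thread: the helper name to_unicode and the sample
markup are hypothetical, and it assumes the transport charset is already
known (for example from a MIME header). ISO-2022-JP is used because it
switches character sets mid-stream with escape sequences.

```python
def to_unicode(raw: bytes, transport_charset: str) -> str:
    """Hypothetical helper: decode transport bytes into Unicode text
    so that the parser only ever sees one (superset) character set."""
    return raw.decode(transport_charset)


# Sample document (made up for illustration) sent as ISO-2022-JP.
# The encoder inserts ESC $ B to switch from ASCII to JIS X 0208.
sample = "<p>Hello, 日本語</p>"
raw = sample.encode("iso2022_jp")

# Step 1: translate to the superset before parsing.
text = to_unicode(raw, "iso2022_jp")

# Step 2 (optional): project back onto a narrower transport charset.
# Characters missing from ISO-8859-1 become numeric character references.
latin1 = text.encode("iso-8859-1", errors="xmlcharrefreplace")
```

After step 1 the markup characters and the Japanese text live in a single
code space, so the charset switch is invisible to the parser. Step 2 shows
one possible way to carry the same document over a transport charset that
lacks some of its characters.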

The low-level tag idea of mine is absolutely Unicode-specific, and
should henceforth be thought of primarily as an encoding of Unicode,
not as a raw Unicode stream (though it may be interpreted as such).

TEI should be required reading for all SGML adepts.