language hints in code stream vs. SGML markup

Larry Masinter (masinter@parc.xerox.com)
Fri, 20 Jan 95 13:30:30 EST

Question: should language be coded at the character level, using
language hint bytes, or should language be coded at the SGML level,
using language tags?

1) "But I need it when..."

* code at character level: need language disambiguation for media
types other than text/html, such as text/plain.

* code at the SGML level: need language disambiguation for hyphenation
even when charset=ISO-8859-1.

2) "Compatibility with Standards..."

* code at character level: we could use Private Use Codes in Unicode
to encode hints.

* code at SGML level: we could use <language> tags to encode language.
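
To make the two options concrete, here is a rough sketch in Python. It
is only an illustration under stated assumptions: the particular
Private Use code points, the "name" attribute on <language>, and all
function names are invented here and do not come from any standard or
proposal. It also does no escaping of markup characters in the text.

```python
# Hypothetical assignment: one Private Use Area code point per language.
# Unicode reserves U+E000..U+F8FF for private use; the specific values
# below are invented for this illustration.
LANG_HINTS = {"de": "\uE001", "fr": "\uE002", "en": "\uE003"}
HINT_TO_LANG = {v: k for k, v in LANG_HINTS.items()}


def encode_with_hints(runs):
    """Character level: prefix each run with its language hint character."""
    return "".join(LANG_HINTS[lang] + text for lang, text in runs)


def encode_with_tags(runs):
    """SGML level: wrap each run in a <language> element (attribute name assumed)."""
    return "".join(
        f'<language name="{lang}">{text}</language>' for lang, text in runs
    )


def decode_hints(stream, default="en"):
    """Recover (language, text) runs from a hinted character stream."""
    runs, lang, buf = [], default, []
    for ch in stream:
        if ch in HINT_TO_LANG:
            # A hint character closes the current run and switches language.
            if buf:
                runs.append((lang, "".join(buf)))
                buf = []
            lang = HINT_TO_LANG[ch]
        else:
            buf.append(ch)
    if buf:
        runs.append((lang, "".join(buf)))
    return runs


runs = [("de", "Guten Tag. "), ("fr", "Bonjour.")]
hinted = encode_with_hints(runs)   # works for text/plain as well
tagged = encode_with_tags(runs)    # only meaningful to an SGML parser
```

The hinted form survives in any text/* body but needs every consumer
to know the private code point assignment; the tagged form needs no
new characters but only exists where there is markup to carry it.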

3) "First principle"

Language doesn't shift on a character-by-character basis. If it
shifts at all, it shifts on a section-by-section basis. It doesn't
make sense to support words like
<german>multi</><french>ling</><english>ual</> that change language
from one character to the next.