Re: Why <LANG>, directional considerations

David Baron (DAVIDB@accent.co.il)
Tue, 1 Aug 95 10:49:47 EDT

Why <LANG>?

In a Unicode encoding, we might get along without <LANG> in
many cases since:

1. Either a (full or multi-"page") Unicode font would be
used for display

or

2. Appropriate codepages and fonts can be chosen using the Unicode "page"
(or sub-"page").
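
A minimal sketch of alternative 2, assuming the display side keeps a
table from Unicode "pages" (blocks) to ISO 8859 codepages. The table
and the names codepage_for and codepages_needed are hypothetical, and
a block only roughly matches its codepage (not every character in the
block is in the 8-bit set):

```python
# Hypothetical table: Unicode block range -> ISO 8859 part that roughly covers it.
BLOCK_TO_CODEPAGE = [
    (0x0000, 0x00FF, "iso-8859-1"),  # Basic Latin + Latin-1 Supplement
    (0x0370, 0x03FF, "iso-8859-7"),  # Greek
    (0x0400, 0x04FF, "iso-8859-5"),  # Cyrillic
    (0x0590, 0x05FF, "iso-8859-8"),  # Hebrew
    (0x0600, 0x06FF, "iso-8859-6"),  # Arabic
]


def codepage_for(ch):
    """Return the codepage name for one character, or None if unmapped."""
    cp = ord(ch)
    for lo, hi, name in BLOCK_TO_CODEPAGE:
        if lo <= cp <= hi:
            return name
    return None


def codepages_needed(text):
    """Return the sorted codepages needed for the non-ASCII characters.

    ASCII is skipped because it is shared by all ISO 8859 parts.
    """
    found = set()
    for ch in text:
        if ord(ch) < 0x80:
            continue
        name = codepage_for(ch)
        if name is not None:
            found.add(name)
    return sorted(found)
```

With this table, a text mixing Hebrew and Cyrillic letters would report
both iso-8859-8 and iso-8859-5 with no <LANG> tag at all.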

<LANG> at the character level becomes useful when there is
an (alleged) ambiguity in glyphs. It also becomes useful
on import to other programs and in authoring systems, where
language and country information can be used by spell
checkers etc. Its use for localization of dates and
measures requires entities for them.

When using "raw" encodings such as those for the MOSAIC 2.4
multilingual variant, a <LANG> tag (they use escapes, as
does MULE, extending the ideas in ISO 2022 Japanese) becomes
critical for choosing among supported (ISO, etc.) codepages.

I would dispense with </LANG>, though it could serve as a
return to the document's main language as set in
<BODY LANG=..> or other higher level tags. <LANG> remains
in effect until another is encountered. Simple.
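
A rough sketch of the "remains in effect until another is
encountered" rule. The tag spelling <LANG=xx> and the function name
split_by_lang are assumptions for illustration only; no concrete
syntax has been settled here:

```python
import re

# Hypothetical tag syntax: <LANG=he>, <LANG=en-US>, etc.
LANG_TAG = re.compile(r"<LANG=([A-Za-z-]+)>")


def split_by_lang(text, default_lang):
    """Split text into (language, run) pairs.

    default_lang stands in for the document-level language; each
    <LANG=xx> switches the current language until the next tag.
    """
    runs = []
    current = default_lang
    pos = 0
    for match in LANG_TAG.finditer(text):
        chunk = text[pos:match.start()]
        if chunk:
            runs.append((current, chunk))
        current = match.group(1)  # new language stays in effect
        pos = match.end()
    tail = text[pos:]
    if tail:
        runs.append((current, tail))
    return runs
```

No closing tag is needed: returning to the main language is just
another <LANG=xx> naming it.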

BIDI stuff:

Any text including Hebrew or Arabic would have a
leading (or reading) order for the paragraph. This would be

1. Defaulted from the first "hard" (strongly directional) character
encountered, as in Unicode.

2. Set by a Unicode LTR or RTL mark

3. Set by a new RORDER=RTL attribute within <P>, <LI>, etc.

Alternatives 2 and 3 would not often be needed and would apply
only to mixed text. A purely Arabic paragraph will not
need any flag.
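
The three ways of fixing the paragraph order can be sketched as
follows, assuming the attribute value arrives as the string "LTR" or
"RTL". The function name paragraph_direction and the LTR fallback for
text with no strong character are assumptions; the bidi classes come
from Python's unicodedata module. Note that the LTR and RTL marks
(U+200E, U+200F) are themselves strong characters, so rule 2 falls out
of rule 1:

```python
import unicodedata


def paragraph_direction(text, rorder=None):
    """Return "LTR" or "RTL" for a paragraph.

    rorder models the proposed RORDER attribute; when given, it wins.
    Otherwise the first strong character decides.
    """
    if rorder in ("LTR", "RTL"):
        return rorder
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":            # Latin letters, LEFT-TO-RIGHT MARK, ...
            return "LTR"
        if bidi in ("R", "AL"):    # Hebrew, RIGHT-TO-LEFT MARK, Arabic letters
            return "RTL"
    return "LTR"  # assumed fallback: no strong character found
```

A paragraph that opens with digits followed by Hebrew still comes out
RTL, since digits are not strong characters.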

The formatting of numerics with Hebrew or Arabic should use
the (standard) Unicode algorithm. Hard LTR and RTL overrides are
available in Unicode (and might be relevant with <PRE>
text).
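
For the hard overrides, a small sketch of how a run could be
bracketed, for instance inside <PRE> text. The helper name
force_direction is hypothetical; the code points are the Unicode
LEFT-TO-RIGHT OVERRIDE, RIGHT-TO-LEFT OVERRIDE and POP DIRECTIONAL
FORMATTING characters:

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (ends the override)


def force_direction(text, direction):
    """Wrap text so every character in it is displayed in one direction."""
    if direction not in ("LTR", "RTL"):
        raise ValueError("direction must be 'LTR' or 'RTL'")
    start = LRO if direction == "LTR" else RLO
    return start + text + PDF
```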

I will read Glenn's proposals on these issues!

David