ISO, Unicode; right-left languages

Richard L. Goerwitz (goer@midway.uchicago.edu)
Tue, 27 Sep 1994 15:55:30 +0100

Several people, mostly my compatriots, have written to suggest that there
isn't much demand for multilingual text - still less for bidirectional text.
I'd like to take this opportunity to show that the issue is in fact impor-
tant enough to have spawned several RFCs and a series of ECMA and ISO/IEC
technical reports and standards. You can look at servers that are biling-
ual right now, although they won't look right unless you have a specially
hacked server (e.g. try http://www.huji.ac.il/WWW_DIR/default.html). Read
on, brave souls:

First, I quote a previous response to my posting on bidirectional word-
wrap. The poster is commenting on the option of using "visual" or
reverse encoding for right-left languages:

>[note on reversing the order of letters deleted]
>
>The suggestion does not work when the text has multiple lines. For this the
>rules are quite complex but tractable. There is an RFC with an algorithm
>for doing the wrapping. I think it is better to assume that particular
>encodings will read in their natural direction and then let the computer
>do the reordering.

This doesn't quite make sense to me. Files have no order, other than
beginning-to-end. The notion of right-left and left-right is purely
visual. So ideally you'd always want to code all languages the same:
The characters that come first in the text come first in the file. You
say this later on in your posting, so I realize you're fully aware of
the issues.

The reason we can't just do things this way is that bilingual
dictionaries, servers with bi- or multilingual menus (e.g.
Japanese-English, Arabic-English), etc., must cope with translating a
monodirectional file structure into a bidirectional display. To do this
you must either change directions manually in the file (example one in
my last posting), or use the same internal directionality throughout and
rely on the server to preformat the display or on the client to do the
reversal itself (example two in my last posting).

It's really not all that difficult to do once you're used to how the
system works. There are well-established typographical conventions on
how to wrap mixed left-right and right-left text, and there are some
de facto standards on how to mix up-down with either of these. Any-
one who is seriously interested in implementing such a scheme is in-
vited to contact me for more information. I'm probably not the best
source though. There are existing docs you can consult (read on...).

Although the market in Arabic-speaking countries is potentially huge,
most of the initial work has been done for Hebrew. RFC 1555 has some
discussion of the difference between 1) coding direction shifts into the
byte stream and 2) leaving them for the server and client to arrange.
I quote from RFC 1555 (here it's talking about mail):

   The default directionality of the text is visual. This means that
   the Hebrew text is encoded from left to right (even though Hebrew
   text is entered right to left) and is transmitted from left to right
   via the standard MIME mechanisms. Other methods to control
   directionality are supported and are covered in the complementary RFC
   1556, "Handling of Bi-directional Texts in MIME".

Unfortunately, RFC 1556 doesn't help much:

   Fortunately, ECMA (European Computer Manufacturers Association) has
   tackled this problem previously and has issued a technical report
   called "Handling of Bi-Directional Texts". ECMA TR/53, as it is
   called, was used to update the Standard ECMA-48 which in turn was
   used as the basis for ISO/IEC 6429 which was adopted under a special
   "fast track procedure". It is based on this information that a new
   character set is being defined which will indicate that the bi-
   directional message is either encoded in implicit mode or explicit
   mode. The default is visual mode which requires no special character
   set other than the standard ones previously defined by ISO-8859.

I haven't consulted ECMA TR/53, and I'd like to know where it can be
had. RFC 1556 doesn't actually get down to the details, but dwells
mainly on visual encoding - which, as has been pointed out here, will
not work well if you don't know where the line breaks occur.
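
To make the line-break problem concrete, here is a minimal Python sketch
(mine, not taken from either RFC). Uppercase Latin words stand in for
right-left text so the example stays readable on any terminal; the
function names and the 10-column width are assumptions made up for the
illustration. The first function wraps text stored in reading order and
then reverses each line for a left-to-right display. The second wraps
text that was reversed as a whole before the line width was known.

```python
import textwrap


def display_from_logical(logical, width):
    """Wrap in reading order, then reverse each line for a l-r display."""
    return [line[::-1] for line in textwrap.wrap(logical, width)]


def display_from_visual(visual, width):
    """Wrap text that was reversed before the line width was known."""
    return textwrap.wrap(visual, width)


# Stand-in for right-left text, stored in reading order.
logical = "ONE TWO THREE FOUR"
# The same text reversed as a single unit ("visual" order).
visual = logical[::-1]

good = display_from_logical(logical, 10)
bad = display_from_visual(visual, 10)

for label, lines in (("logical + per-line reversal", good),
                     ("visual, wrapped afterwards", bad)):
    print(label)
    for line in lines:
        print("  " + line)
```

Reading each displayed line from right to left, the first rendering
starts with ONE TWO on its top line, as it should. The second puts
THREE FOUR on top, so the end of the text appears before its beginning.
That is the sense in which visual encoding breaks down when the sender
doesn't know where the receiver will break lines.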

My feeling is that there are plenty of resources out there to come to
at least a tentative solution to the problem of multidirectional text.
What we still need are:

1) X widget sets that support multidirectional text (Motif apparently
   is on track to do this - put the pressure on, now!)
2) HTML tags that tell us
   a) language
   b) encoding method (e.g. ISO 8859-1)
   c) which order the characters will be coming in, normal or
      reverse (note that this allows us to ignore whether the
      script goes visually l-r, r-l, or u-d)
3) clients that recognize multilingual (possibly multidirectional)
   text, and can
   a) request necessary fonts from the server
   b) fail gracefully if those fonts are not available, or
      if the client cannot cope with the display parameters
      (e.g. can't do Japanese; or, can't do Arabic in the middle
      of an English text)
4) servers that can provide fonts in a variety of formats on
   request; ultimately we'd need servers that could provide
   information about which characters are separators for which
   languages, although this could also be encoded as part of
   the overall DTD and client model (at great expense?)

To these ends, I'd like to float a proposal that there be a new entity
that enables us to code

1) language (e.g. English, French, Dutch, Arabic, Japanese...)
2) encoding (e.g. ISO 8859-1)
3) direction (normal | reverse) - referring to the file; not the display

Again, if I (a Humanities guy) seem to be obtruding where I don't belong,
don't hesitate to put me in my place.

Richard Goerwitz
goer@midway.uchicago.edu