Re: FYI: Multilingual Encoding Discussion on www-talk@www0.cern.ch

Steven D. Majewski (fxrojas@nlsarch.austin.ibm.com)
Thu, 29 Sep 1994 06:28:05 +0100

I received the following through the Unicode consortium mailing list.
I hate to jump into a discussion, but here are some comments based on
my experience with Motif and X I18N development...

Since I am not signed onto this mailing list, you'll need to respond
directly...

Frank Rojas
AIX NLS Architecture VNET: AUSTIN(FXROJAS)
Advanced Workstation and System Division Tie-line 678-8183
IBM, Mail 9652 Phone: (512) 838-8183
Austin, TX 78758 FAX: (512) 838-3886
AWD Net: fxrojas@nlsarch.austin.ibm.com

------------------

----- Begin Included Message -----

Sender: www-talk@www0.cern.ch
From: Jeff Smith <sumisu@slab.ntt.jp>
To: www-talk@www0.cern.ch
Subject: Re: ISO charsets; Unicode
To: Multiple recipients of list <www-talk@www0.cern.ch>

If you haven't noticed, Motif doesn't allow the mixing of character
sets in a single text widget - it takes more than a hack of the client
to display multiple character sets (e.g. Hebrew, Greek, Japanese) on
the same "page."

The only way to do this - I haven't tried - would be to use Mule.

js

This is not entirely true. It depends on the localization of the
particular system. AIX now provides a UTF-8 locale, and I believe
Plan 9 is also doing something with UTF-8.

Actually, X11 Release 6 did include some support for UTF-8, but it was never
completed. Finally, UTF-8 has been promoted by the X/Open and UniForum
Joint Internationalization Working Group as the most portable means to
support UCS on traditional XPG/POSIX systems....
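
To illustrate why UTF-8 is considered the portable choice on byte-oriented
XPG/POSIX systems, here is a minimal sketch in Python. It is restricted to
the 16-bit UCS-2 range discussed in this thread, and the function names are
hypothetical. ASCII bytes pass through unchanged, and every byte of a
multibyte sequence is 0x80 or above, so NUL and '/' never occur inside a
non-ASCII character. That is what lets existing C string and file-name code
keep working.

```python
def utf8_encode_ucs2(code_point: int) -> bytes:
    """Encode one UCS-2 (BMP) code point as UTF-8 bytes (hypothetical helper)."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("outside the 16-bit UCS-2 range")
    if code_point < 0x80:
        # 0xxxxxxx: ASCII is passed through unchanged
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def utf8_encode(text: str) -> bytes:
    """Encode a whole string one character at a time."""
    return b"".join(utf8_encode_ucs2(ord(ch)) for ch in text)


if __name__ == "__main__":
    sample = "a/\u05d0\u65e5"      # 'a', '/', Hebrew alef, CJK "sun"
    encoded = utf8_encode(sample)
    print(encoded.hex(" "))        # prints: 61 2f d7 90 e6 97 a5
    # The non-ASCII part never contains a byte below 0x80.
    assert all(b >= 0x80 for b in encoded[2:])
```

UTF-8 also defines longer sequences for code points beyond 0xFFFF; they are
left out here because wchar_t = UCS-2 in the setup described below.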

And to prove that it is real...

In the recent AIX 4.1 release, we provide a UNIVERSAL locale based on
UTF-8 (with wchar_t = UCS-2). We can currently input and display text
using standard Motif 1.2 with our localization for:

ISO8859-1,2,5,6,7,8,9
Japanese/Chinese/Korean
Hebrew/Arabic

This support was demonstrated at the last Unicode Workshop this month.
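
In a locale like UNIVERSAL, the external (multibyte) form is UTF-8 and the
wide-character form is UCS-2, so the conversion a C program gets from
mbstowcs() amounts to UTF-8 decoding. A rough Python sketch of that step,
assuming one 16-bit UCS-2 value per character; utf8_to_ucs2 is a
hypothetical name, and this is not the AIX implementation:

```python
from typing import List


def utf8_to_ucs2(data: bytes) -> List[int]:
    """Decode UTF-8 bytes into UCS-2 values (hypothetical helper)."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                 # 1 byte: ASCII
            length, cp = 1, lead
        elif 0xC0 <= lead < 0xE0:       # 2 bytes
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:       # 3 bytes
            length, cp = 3, lead & 0x0F
        else:
            # stray continuation byte, or a form that does not fit in UCS-2
            raise ValueError(f"invalid or non-BMP lead byte 0x{lead:02x} at {i}")
        if i + length > len(data):
            raise ValueError("truncated sequence")
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError("bad continuation byte")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms such as 0xC0 0xAF for '/'.
        if (length == 2 and cp < 0x80) or (length == 3 and cp < 0x800):
            raise ValueError("overlong encoding")
        out.append(cp)
        i += length
    return out


if __name__ == "__main__":
    data = "\u0391\u05d0\u65e5".encode("utf-8")   # Greek, Hebrew, Japanese
    print([f"U+{wc:04X}" for wc in utf8_to_ucs2(data)])
    # prints: ['U+0391', 'U+05D0', 'U+65E5']
```

Rejecting overlong forms matters on POSIX systems: otherwise a second
encoding of '/' or NUL could slip past code that scans for those bytes.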

All of this uses standard Motif 1.2, which is internationalized so that it
can display text in any locale. In addition, we (AIX) provide over 50
national locales that use the local code set of each territory. All of this
actually works with the Common Desktop Environment (CDE).

|>In article <8899@cernvm.cern.ch> you write:
|>
|>|>Has a formal mechanism been considered for specifying various popular
|>|>coding standards, such as ISO 8859-7, ISO 8859-8, etc., and (perhaps
|>|>off in the future) Unicode?

UTF-8 seems to be the preferred vehicle among X/Open members...

|>|>The motivation for this question is essentially this: Several really
|>|>exciting developments are being stymied by the Web's largely ASCII/
|>|>English-only focus. As I discussed privately with several readers of
|>|>this forum, there is, for example, a project afoot (nearly complete)
|>|>to create a full lexicon and concordance of the Dead Sea Scrolls. I
|>|>imagine a system where users can look up words, and view the original
|>|>scrolls as inlined images. The problem is that the DSS are written
|>|>in Greek, Aramaic, and Hebrew.

This is what we built the UNIVERSAL locale for...

|>This is a Mosaic problem, not a WWW problem. Mosaic can handle multiple
|>fonts but only one charset.

With UTF-8 as that one charset, this should be sufficient.
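
The point is that a browser limited to one charset can still show several
scripts, provided that charset is UTF-8 and the browser picks a font per
character. A minimal sketch of that font-selection step; the range table and
font labels are illustrative placeholders, not Mosaic internals or real X
font names:

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical font table: (first code point, last code point, font label).
SCRIPT_FONTS = [
    (0x0000, 0x007F, "latin"),
    (0x0370, 0x03FF, "greek"),
    (0x0590, 0x05FF, "hebrew"),
    (0x3040, 0x30FF, "kana"),
    (0x4E00, 0x9FFF, "cjk"),
]


def font_for(ch: str) -> str:
    """Return the font label whose range contains this character."""
    cp = ord(ch)
    for first, last, font in SCRIPT_FONTS:
        if first <= cp <= last:
            return font
    return "fallback"


def font_runs(utf8_bytes: bytes) -> List[Tuple[str, str]]:
    """Decode one UTF-8 document and split it into (font, text) runs."""
    text = utf8_bytes.decode("utf-8")
    return [(font, "".join(chars)) for font, chars in groupby(text, key=font_for)]


if __name__ == "__main__":
    # Greek "logos" = Hebrew "davar", all in a single UTF-8 byte stream.
    doc = "\u03bb\u03cc\u03b3\u03bf\u03c2 = \u05d3\u05d1\u05e8".encode("utf-8")
    for font, run in font_runs(doc):
        print(font, ascii(run))
```

This only chooses fonts in logical order; right-to-left reordering for
Hebrew is a separate layout step not shown here.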

|>At least one TBA browser supports mixed
|>character set documents. HTML/3.0 is better here as well.

This would require bypassing the standard Motif localization and providing
your own localization (fonts, input methods, locale, etc.) along with
the browser... I think a better approach is to use the standard
Motif 1.2 internationalization APIs and depend on the localization provided
by the Motif implementation.

I know that the I18N support in the COSE Common Desktop Environment (CDE)
is based on this and is sufficient for its mail, editor, help, and other tools...

|>|> Specially hacked clients are only just
|>|>recently arriving that can do Japanese and a few other languages.

I wonder whether they are using the standard Motif/X I18N functions.

|>|>No general solution exists.

I'd say that the standard Motif/X I18N functions should meet the needs of
regional documents, i.e. those written in a single local code set.

For multilingual documents, UTF-8 should be the preferred means...
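
The trouble with regional code sets is that they reuse the same upper half
of the byte range: byte 0xE0 is a Greek letter in ISO 8859-7 but Hebrew alef
in ISO 8859-8, so a document in one 8-bit charset cannot also hold text from
the other. A small check using Python's standard codecs:

```python
byte = b"\xe0"
greek = byte.decode("iso8859_7")    # U+03B0 Greek small upsilon with dialytika and tonos
hebrew = byte.decode("iso8859_8")   # U+05D0 Hebrew letter alef
print(f"U+{ord(greek):04X} U+{ord(hebrew):04X}")   # prints: U+03B0 U+05D0

# Neither 8-bit set can encode the other's letter...
for codec in ("iso8859_7", "iso8859_8"):
    try:
        (greek + hebrew).encode(codec)
    except UnicodeEncodeError:
        print(codec, "cannot hold both letters")

# ...but UTF-8 gives each one its own, unambiguous byte sequence.
mixed = (greek + hebrew).encode("utf-8")
print(mixed.hex(" "))   # prints: ce b0 d7 90
```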

|>|>And (perhaps most importantly) there is no-
|>|>thing in the HTML(+) descriptions that allows one to specify when text
|>|>in one language ends and text in another begins, or to specify what
|>|>encoding system is being used for either.

For display and input purposes, this is not strictly necessary. We've
built a "universal input method" that lets the user switch from one
language to another and select characters by browsing the planes
of UCS...
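
ISO 10646 addresses each character by group, plane, row, and cell octets,
and the 16-bit UCS-2 form covers only group 0, plane 0 (the Basic
Multilingual Plane). A picker that lets the user choose a row and then a
cell could build the character roughly as below; ucs_char and row_cells are
hypothetical helpers, not the actual input method's code:

```python
def ucs_char(group: int, plane: int, row: int, cell: int) -> str:
    """Build one character from ISO 10646 group/plane/row/cell octets."""
    for octet in (group, plane, row, cell):
        if not 0 <= octet <= 0xFF:
            raise ValueError("each coordinate must be a single octet (0-255)")
    if group != 0 or plane != 0:
        # With wchar_t = UCS-2 only the Basic Multilingual Plane is reachable.
        raise ValueError("only group 0, plane 0 (the BMP) fits in UCS-2")
    return chr((group << 24) | (plane << 16) | (row << 8) | cell)


def row_cells(row: int, first: int, last: int) -> str:
    """Return a span of cells from one BMP row, as a picker might show them."""
    return "".join(ucs_char(0, 0, row, cell) for cell in range(first, last + 1))


if __name__ == "__main__":
    hebrew = row_cells(0x05, 0xD0, 0xEA)   # row 0x05, cells D0..EA: alef..tav
    print(len(hebrew), ascii(hebrew[0]), ascii(hebrew[-1]))
    print(ascii(ucs_char(0, 0, 0x03, 0x91)))   # Greek capital alpha
```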

|>|>The DSS project isn't the only one that appears stymied. There is a
|>|>Cushitic etymological database (say that with a mouth full) at the U
|>|>of Chicago that's machine readable, and comes replete with a standard
|>|>interface. The project head would be happy to plug it into the Web,
|>|>but again the Web only knows ASCII.
|>
|>Here I suspect you need something quite a bit more sophisticated and
|>which is at least 6 months off.

Just get a browser that uses the standard Motif 1.2 and X11 Release 5
interfaces; that should be enough for the time being...

This will meet the requirements for documents shared within a regional
environment, and...

then, for multilingual documents, ask the Motif suppliers for UTF-8
localization... Actually, we should ask the X Consortium to finalize
the UTF-8 localization in X11R6...

|> You need a highly modular browser and drop in your
|>own module into it. That type of research tends to need highly specialised
|>fonts and a lot more flexibility than first sight might imply.

True... such localization comes neither easily nor quickly...