Dr. Irvine,

Languages and character sets are a topic of discussion in the HTML
Working Group of the IETF. We won't address this issue in the upcoming
2.0 document, but there is a lot of interest in these topics for the
next revision of the standard.

I have raised the topic on the comp.text.sgml newsgroup to see if
that community has any valuable experience. Not much to report
on that front, at this point.

I should collect my thoughts on this topic, along with references
to all the available materials (papers, projects, etc.). Or
perhaps someone else would create a summary... perhaps a HyperNews
page, Mr. LaLiberte?

Bye for now...

Dear Dan Connolly,

___Avoiding linguistic discrimination in HTML+___

I have recently gained access to WWW and, after reading
http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html, want to say
that I find HTML (HTML+) fascinating. I believe it to be the future
of the net, destined hopefully to become used by people of all types
and all languages around our planet. It is therefore ABSOLUTELY VITAL
that HTML (HTML+) does not create linguistic discrimination.

I myself, and others like me, want to read and write HTML (HTML+)
pages in languages that use Latin-2, Latin-3, and higher. My
particular interest is in the Latin-3 language Esperanto. This is
one of the world's top 150 languages and more importantly it is
supported by Unesco resolutions -- it is therefore reasonable to
expect HTML (HTML+) (and associated readers) to support Latin-3.

Maybe I'm stupid, but I can't find mention of Latin-3 in the
descriptions of HTML. Please could you tell me if ISO-8859-3 already
exists in HTML. If not, I have the following serious suggestion
for HTML+, to which I hope you'll give due consideration.

Here's an example to illustrate how HTML+ should handle Esperanto:
Consider an Esperanto speaker who wants to say "at".
In 7-bit e-mail s/he would probably write "cxe";
in HTML+ s/he should use "&cx;e"
(or something like "&#3.230;e", but note that "&ccirc;e" would
be inappropriate). A reader application should display "&cx;e"
as circumflexed-c followed by e. If for some reason the reader
can't find even a single Latin-3 font, or the user requests a
font override forcing ASCIIization of all languages (which can
sometimes be useful), then the reader application should display
c followed by x followed by e.

Similarly &Cx; &gx; &Gx; &hx; &Hx; &jx; &Jx; &sx; &Sx; &ux; &Ux;.
(There are other languages in Latin-3, and in Latin-2 & Latin-4,
and these of course must also be supported by HTML+). I look
forward to the day when I can see true Esperanto characters
displayed fluently by HTML+.
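
The display rule described above could be sketched roughly as follows.
This is only an illustration, not part of any HTML specification: the
entity names are the ones proposed in this letter, while the function
name render and the flag have_latin3_font are hypothetical, and Unicode
code points stand in for whatever Latin-3 font a reader application
would actually use.

```python
import re

# Hypothetical entity table: the names follow the "&cx;" proposal in this
# letter and are NOT defined by any HTML specification.
ESPERANTO_ENTITIES = {
    "cx": "\u0109", "Cx": "\u0108",  # c, C with circumflex
    "gx": "\u011d", "Gx": "\u011c",  # g, G with circumflex
    "hx": "\u0125", "Hx": "\u0124",  # h, H with circumflex
    "jx": "\u0135", "Jx": "\u0134",  # j, J with circumflex
    "sx": "\u015d", "Sx": "\u015c",  # s, S with circumflex
    "ux": "\u016d", "Ux": "\u016c",  # u, U with breve
}

ENTITY_PATTERN = re.compile(r"&([A-Za-z]+);")


def render(text, have_latin3_font=True):
    """Expand the proposed entities; fall back to the x convention.

    have_latin3_font is an assumed flag standing in for "the reader found
    a suitable font and the user did not force ASCIIization".
    """
    def replace(match):
        name = match.group(1)
        if name not in ESPERANTO_ENTITIES:
            return match.group(0)  # leave unrelated entities untouched
        if have_latin3_font:
            return ESPERANTO_ENTITIES[name]
        return name  # ASCII fallback: "&cx;" is shown as "cx"

    return ENTITY_PATTERN.sub(replace, text)
```

With a suitable font, render("&cx;e") gives circumflexed-c followed by e;
with have_latin3_font=False it gives c followed by x followed by e.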

Yours faithfully,

Dr Aaron Irvine,
a member of the Universal Esperanto Association,
which is a body in consultative relations with the UN and Unesco.


Esperanto uses circumflexed c C g G h H j J s S, and breved u U.
Some words to illustrate the use of the x convention:
cxe = at
sxi = she
Gxi = It
aux = or (a followed by breved u)
Hxoro = Choir
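
The x convention itself is mechanical enough to undo in a few lines. The
sketch below is a minimal illustration under assumptions of its own: the
function name from_x_convention is hypothetical, the output uses Unicode
code points, and the input is assumed to be Esperanto text (a French word
such as "aux" would be converted as well).

```python
import re

# Base letter -> accented letter, as listed above: circumflexed
# c C g G h H j J s S, and breved u U.
ACCENTED = {
    "c": "\u0109", "C": "\u0108",
    "g": "\u011d", "G": "\u011c",
    "h": "\u0125", "H": "\u0124",
    "j": "\u0135", "J": "\u0134",
    "s": "\u015d", "S": "\u015c",
    "u": "\u016d", "U": "\u016c",
}

# One of the six base letters (either case) followed by x or X.
X_PATTERN = re.compile(r"([CcGgHhJjSsUu])[Xx]")


def from_x_convention(text):
    """Replace each base letter + x pair with the accented letter."""
    return X_PATTERN.sub(lambda m: ACCENTED[m.group(1)], text)


for word in ["cxe", "sxi", "Gxi", "aux", "Hxoro"]:
    print(word, "=", from_x_convention(word))
```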

