Re: Unicode browsers (was: Re: Comments on: "Character Set" Considered Harmful)

Bob Jung (
Thu, 27 Apr 95 19:21:15 EDT

At 10:15 PM 4/26/95, Gavin Nicol wrote:
>>>There are many small issues, but from my experience, and Amanda and
>>>others will verify this, implementing a Unicode based application is
>>>*far* easier than trying to support even a small number of coded
>>>character sets and encodings.
>>I disagree.
>>Supporting canonical Unicode will require major changes to parser and
>>layout engines. Supporting ASCII-superset encodings is relatively easy
>>and in many case more efficient. UTF8 is an ASCII-superset and would
>>fall in the easy to support bucket.
>Only because you designed your system with Latin-1 as a basic

Partly true. But there are other issues: performance, resource usage,
round-trip conversions and extensibility.

Performance: In the current state of the world there are no Unicode fonts on
the systems our products ship on, and there is no Unicode Web data. This
requires at least 2 conversions of the data (e.g., Latin1 -> Unicode and
Unicode -> Latin1, SJIS -> Unicode and Unicode -> SJIS). With Netscape's
current architecture it often needs NO conversions.
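
To make the conversion count concrete, here is a minimal sketch in Python. The
function names are hypothetical and Python's built-in codecs merely stand in
for whatever converters a browser would carry; this is not Netscape's code.

```python
# Illustrative only: hypothetical names, Python codecs as stand-in converters.

def render_via_unicode(raw: bytes, charset: str) -> bytes:
    """Unicode-internal path: two conversions per document."""
    text = raw.decode(charset)       # conversion 1: charset -> Unicode
    return text.encode(charset)      # conversion 2: Unicode -> charset (for the font)

def render_native(raw: bytes) -> bytes:
    """Native-encoding path: bytes go to the display code unconverted."""
    return raw

latin1_page = "café".encode("latin-1")
sjis_page = "日本語".encode("shift_jis")

# Both paths produce the same bytes for ordinary text; the first just does
# two table-driven passes over the data to get there.
print(render_via_unicode(latin1_page, "latin-1") == render_native(latin1_page))
print(render_via_unicode(sjis_page, "shift_jis") == render_native(sjis_page))
```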

Resource Usage: The converters needed would require large tables and probably
extra buffers. Users who do not need multi-lingual support (the majority) would
be penalized for no benefit. There's interest in putting browsers on smaller
and smaller devices -- so memory is still an issue.

Round-trips: User-defined areas will not survive the round-trip conversions
to-and-from Unicode. Without the conversion, we have a chance that it will
work for the intended target (e.g., Acme Corp's SJIS data displayed on
Acme Corp's system).
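
A small sketch of the round-trip problem, assuming Python's strict "shift_jis"
codec as a stand-in converter (other converters may behave differently, and
the byte pair chosen is just an example from the SJIS user-defined range):

```python
# Illustrative only: Python's strict shift_jis codec stands in for a converter.

USER_DEFINED = b"\xf0\x40"   # a byte pair in the SJIS user-defined (gaiji) range

def survives_round_trip(raw: bytes, charset: str) -> bool:
    """True if raw comes back byte-for-byte after charset -> Unicode -> charset."""
    try:
        return raw.decode(charset).encode(charset) == raw
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False         # the converter has no mapping for some character

def pass_through(raw: bytes) -> bytes:
    """No conversion: vendor-specific bytes reach the vendor's own fonts intact."""
    return raw

print(survives_round_trip("日本語".encode("shift_jis"), "shift_jis"))  # standard text
print(survives_round_trip(USER_DEFINED, "shift_jis"))                  # user-defined area
```

Standard text round-trips, while the user-defined byte pair has no Unicode
mapping in this codec and is rejected; the pass-through path leaves it alone.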

Extensibility: If we commit to a UCS-2 internal representation, it could
restrict the support of some encodings. I may need to handle UCS-4 data.
Periodically I hear that Chinese will someday exceed the 2byte limitation.
Is this real? Do we care? I suppose, we could use UCS-4 for internal
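
A minimal sketch of the 2-byte limit in question. The helper names are
hypothetical; U+20000 is used as the example only because later versions of
Unicode assigned it to a Chinese ideograph outside the 16-bit range.

```python
# Illustrative only: hypothetical helpers showing what fits in 16 vs 32 bits.
import struct

def fits_in_ucs2(ch: str) -> bool:
    """A UCS-2 code unit is 16 bits, so only U+0000..U+FFFF fit."""
    return ord(ch) <= 0xFFFF

def to_ucs4(ch: str) -> bytes:
    """UCS-4 stores every character as one 32-bit big-endian integer."""
    return struct.pack(">I", ord(ch))

common = "\u4e2d"        # U+4E2D, an ideograph inside the 16-bit range
beyond = "\U00020000"    # U+20000, an ideograph outside the 16-bit range

print(fits_in_ucs2(common), fits_in_ucs2(beyond))
print(to_ucs4(beyond).hex())
```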

>Shoehorning Unicode support onto this may be hard *in the
>short term*, but I think you'll find that as you add support for more
>and more coded character sets and encodings, you will eventually
>produce a system that does exactly this.

Depends where we are headed with HTML. Widechar (e.g. canonical Unicode)
has great advantages when a lot of text manipulation is performed. Currently
browsers perform very little text manipulation, and so for the few cases
where they do, it's not terribly difficult to deal with.

>>But this is implementation detail, and should be of secondary importance
>>in defining our direction. Web content requirements are what should
>>be of primary importance.

I've just tried to point out that the implementation issues are not as
simple as you might think.

Let's put aside these implementation issues for now. They are important,
but I think the labelling issue is a more important issue for us to grapple
with now.


Bob Jung       +1 415 528-2688, fax +1 415 528-4122
Netscape Communications Corp.   501 E. Middlefield      Mtn View, CA   94041