Information integration at client or server?

Nick Arnett (nicka@mccmedia.com)
Fri, 22 Jul 1994 00:02:56 +0200

I alluded to this issue earlier, but it's nagging me, so I thought I'd make
it explicit. As I suggested in the earlier message, I see the Web in terms
of packaging information for navigation.

"Packaging" is enabled by the wide availability and variety of browsers
(Mosaic, etc.) that deliver a consistent user interface. It makes sense
that this takes place at the client, to accommodate lots of users on a
variety of platforms with a variety of tasks. The standards for URLs and
HTML make packaging possible. Thus the retrieval and data formatting
standards drive the browsers.

"Navigation" becomes possible via integration of information sources.
Links among related information sources can be created. This could take
place at the client, but since there are inevitably fewer authors than
readers, it makes sense to do this integration at the server. Thus it is
the servers that make the "web-like" connections among information sources;
the servers are the Web, and HTTP, as the communications standard for the
servers, ties it all together.

(I realize that there are those who believe the net and hypertext will make
all people into authors and publishers, but I don't buy into that ideology.
I do believe that large numbers of people will contribute to
consensus-building, but that's not the same as authorship.)

This scenario suggests that gopher, ftp and wais slowly fade away as far as
browsers are concerned, with the information that they provide becoming
integrated via CGI applications at the servers. I think this would be a
good thing for users, the vast majority of whom probably don't want to
have to make a distinction among them.
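
To make that concrete, here's a minimal sketch of such a gateway, written
as a CGI script (in Python, purely for illustration; the host/path query
parameters are invented, not any standard). The browser talks HTTP to this
script, and the script speaks FTP on its behalf:

#!/usr/bin/env python3
import os
import ftplib
from html import escape
from urllib.parse import parse_qs, quote

params = parse_qs(os.environ.get("QUERY_STRING", ""))
host = params.get("host", ["ftp.example.com"])[0]
path = params.get("path", ["/"])[0]

print("Content-Type: text/html\n")   # CGI header, then a blank line

with ftplib.FTP(host) as ftp:
    ftp.login()                       # anonymous login
    ftp.cwd(path)
    names = ftp.nlst()                # plain file-name listing

print("<ul>")
for name in names:
    # Each entry links back through this same gateway, so the browser
    # never needs to speak FTP itself.
    href = f"?host={quote(host)}&path={quote(path + '/' + name)}"
    print(f'<li><a href="{href}">{escape(name)}</a>')
print("</ul>")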

An alternative scenario, which really would create multiple Webs, is that
the browser developers increasingly "bypass" HTTP for the communications
efficiency of going directly to the source, rather than the double layer of
HTTP server and CGI application. Although the users would still enjoy the
benefit of a consistent user interface, the opportunity to integrate
information from heterogeneous sources would be lost.

In other words, I'm suggesting that if there's a threat to the integrity of
the Web on the browser side, we'll see it manifested by the arrival of new
schemes. I was almost guilty of this myself. While building a prototype
"Web" front end to CompuServe, I came up with a "cis" URL scheme for
retrieving messages, etc. from CompuServe forums.
In retrospect, even though that's a quick way to get information on-line,
it's a step toward building a part of the Web that doesn't link easily to
other parts, just as the gopher, ftp and wais schemes don't really
integrate.
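
To illustrate (with URL forms I've made up), the difference shows up right
in the URL itself; a private scheme demands a modified client, while a
gateway URL works in any browser:

from urllib.parse import urlparse

private = urlparse("cis://forums/webdev/messages/1234")
gateway = urlparse("http://gw.example.com/cgi-bin/cis?forum=webdev&msg=1234")

print(private.scheme)   # 'cis'  -- only a client taught this scheme can act on it
print(gateway.scheme)   # 'http' -- any browser can follow it; the CGI
                        # application does the CompuServe-specific work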

Okay, all this raises the question of what I mean by "integration" of
information sources. I think it means a few things:

- The ability to do simultaneous searches of heterogeneous sources.
  With wais, that's possible today only if they're all wais sources. Via a
  CGI application, I can include lots of other kinds of sources. (A sketch
  of such a fan-out search follows this list.)

- Hypertext linking of key words to documents. For example, I link the
  names of companies in press releases to backgrounders that contain
  management and financial data. They are from two different sources. (A
  second sketch below illustrates the linking step.)

- Cross-indexing of heterogeneous sources -- I can generate an index to
  participants in a group of related mailing lists, for example. (Among
  other things, this helps me find interesting things by seeing other places
  where you all post messages!)

- Cross-cataloging of sources -- Tools that do relevancy ranking, such as
  Topic, can sort information into a subject hierarchy, offering new views
  of the information. For example, I could take all of the messages in the
  various Web-related newsgroups and arrange them by topic, in addition to
  their "native" structure (and get rid of the redundancies and even some of
  the thread drift at the same time).
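
Here's a minimal sketch of the first item, the fan-out search, done at the
server (in Python, purely for illustration; the three backend functions are
hypothetical placeholders for a WAIS index, an FTP archive, and a mail
spool):

from concurrent.futures import ThreadPoolExecutor

def search_wais(query):     # placeholder: would speak the WAIS protocol
    return [("wais", f"doc matching {query!r}")]

def search_archive(query):  # placeholder: would scan an FTP mirror
    return [("ftp", f"file matching {query!r}")]

def search_mail(query):     # placeholder: would scan mailing-list folders
    return [("mail", f"message matching {query!r}")]

def integrated_search(query):
    backends = [search_wais, search_archive, search_mail]
    with ThreadPoolExecutor() as pool:
        # Query every source in parallel and merge the hits into one
        # list; the client sees a single HTML page, not three protocols.
        results = pool.map(lambda fn: fn(query), backends)
    return [hit for batch in results for hit in batch]

for source, hit in integrated_search("information integration"):
    print(source, "->", hit)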
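
And a sketch of the second item, auto-linking known names in a document to
their backgrounders (the name-to-URL table here is invented for
illustration):

import re

backgrounders = {
    "CompuServe": "/backgrounders/compuserve.html",
    "Multimedia Computing": "/backgrounders/multimedia-computing.html",
}

def link_names(html_text):
    for name, url in backgrounders.items():
        # \b keeps us from linking substrings of longer words
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        html_text = pattern.sub(f'<a href="{url}">{name}</a>', html_text)
    return html_text

print(link_names("CompuServe announced a new forum today."))
# -> the company name comes back wrapped in a link to its backgrounder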

I'm sure there are more examples.

In summary, I think there's an important decision about direction to be
made. Should browser developers be actively discouraged from supporting
additional schemes, in order to force data to flow through HTTP servers?
Should they just not be encouraged to do so? Is information integration at
the client a good thing, after all?

It seems to me that there are significant implications for HTML as well:
if it fails to provide the means to describe the structure of information
coming from heterogeneous sources, that's a virtual guarantee that browser
developers will bypass HTTP as the delivery protocol.

Nick

Multimedia Computing Corp.
Campbell, California
----------------------------------------------------------
"We are surrounded by insurmountable opportunity." -- Pogo