Re: Indexing the List of Lists

"Rob Raisch, The Internet Company" <raisch@internet.com>


Errors-To: listmaster@www0.cern.ch
Date: Mon, 21 Mar 1994 21:25:18 --100
Message-id: <Pine.3.85.9403211153.A24484-0100000@hmmm>
Reply-To: raisch@internet.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Rob Raisch, The Internet Company" <raisch@internet.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Indexing the List of Lists
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Mime-Version: 1.0
Content-Length: 2348

Alan Emtage writes regarding Lists of Lists:
>... I don't believe that manual maintenance of this kind of data
>is feasible any longer.... the Internet is now too big for this kind of
>thing. 

Alan, in the general case I believe you are right.  But in the specific 
case of the information required to identify a "resource" on the net, I 
don't think so.  Let me 'splain...

The issues of indexing Internet content are vast, and a number of pilot
projects already attempt to address them.  But is content indexing all
the user REALLY needs?  I suggest not.

In my experience, when I am looking for information on agriculture, I am
not looking for 'grain.tar.Z' or for 'Name=Thoughts on Triticale and the
Wheat Borer Beetle.' Rather, I am looking for 'things having to do with
agriculture.'

Indexing content is a very large problem, and one I'll freely admit most
likely needs to be completely automated.  But identifying collections of
value -- what I refer to as a 'resource' -- is something which can only
be managed and maintained by the agency in authority over that resource.

On The Electronic Newsstand, Out Magazine represents a 'resource' -- a 
collection of value on a given topic, but the articles in the Out archive 
are not easily identifiable by their names and pose large problems of 
cataloging.  The Electronic Newsstand is a resource, as well, as it 
represents a collection of magazines and their content, but the FAQ about 
the Enews is not a resource.

While individual files and documents number (perhaps) in the millions, 
resources of this kind are still in the very low thousands.  Now is the 
time to put infrastructure in place to handle the load.

Of course, no project works unless there is a reason for the resource 
administrator to provide the meta-information necessary.  I believe that 
there is sufficient motivation to do so, since an effort of this nature 
answers the very question we all attempt to answer by putting our 
information up for view:  How can I get people to use what I provide?
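
A minimal sketch of what such resource-level meta-information might look
like, assuming a simple record that each resource administrator fills in
by hand.  The field names, the sample entries, and the example.org
addresses are all hypothetical placeholders, not a proposed standard:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceRecord:
    """Hypothetical description of one 'resource', supplied by its administrator."""
    name: str
    topics: List[str]   # broad subjects, e.g. "agriculture"
    maintainer: str     # the agency in authority over the resource
    location: str       # placeholder address, not a real service


def find_by_topic(records, topic):
    """Return the names of resources whose administrator listed the topic."""
    wanted = topic.lower()
    return [r.name for r in records
            if wanted in (t.lower() for t in r.topics)]


# Entirely made-up sample entries, for illustration only.
records = [
    ResourceRecord("Example Farm Bulletin", ["agriculture"],
                   "admin@farm.example.org", "gopher://farm.example.org/"),
    ResourceRecord("Example Magazine Rack", ["magazines"],
                   "admin@rack.example.org", "gopher://rack.example.org/"),
]

print(find_by_topic(records, "Agriculture"))
```

The point of the sketch is that a search for 'things having to do with
agriculture' matches the resource as a whole, never an individual file
name inside it.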

I strongly suggest that a first step in any effort of this kind must be 
the definition of exactly what we are trying to collect information on 
because the issues of indexing vs. the generation of a 'table of 
contents' are very different indeed.  

--  </rr>  Rob Raisch, The Internet Company