Re: The future of meta-indices/libraries?

browne@cs.utk.edu

Maybe in reply to: Peter Lister, Cranfield Computer Centre: "Re: The future of meta-indices/libraries?"
Reply: John Franks: "Re: The future of meta-indices/libraries?"

Errors-To: listmaster@www0.cern.ch
Date: Wed, 16 Mar 1994 16:21:31 +0100
Message-id: <199403161515.KAA21095@pebbles.cs.utk.edu>
Reply-To: browne@cs.utk.edu
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: browne@cs.utk.edu
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re:  The future of meta-indices/libraries?
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 5942

I have noticed that a skilled reference librarian doing an
on-line search does not simply type a few keywords into a
global metaindex and expect to get useful results.  And
why not?  On-line databases are subject area specific, for
one thing.  The searcher must first choose a database, either
from prior knowledge or by consulting a directory of databases.
Next, if she is not already familiar with the database,
she studies its documentation to become familiar with its
classification schemes/codes and how to use them.  Once she
has gotten results from an initial query, she widens or
narrows the search according to whether too little or too
much information was returned.  She also uses initial results
as a guide to formulating further queries.  For example, if
a particularly relevant reference is spotted, a useful
technique is to plug descriptor codes returned with the citation
for that reference into a new search.  These techniques
depend on being able to qualify the search in specific ways.
For example, to restrict the search, one might specify that
query keywords be applied only to the keyword and descriptor
fields, rather than to entire abstracts.
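
The two techniques just described (restricting a query to particular
fields, and reusing the descriptor codes of a relevant citation in a new
search) can be sketched roughly as follows.  This is only an illustration
under stated assumptions: the Citation record, the search and
related_by_descriptor functions, the sample records, and the descriptor
codes "A1", "A2", and "Z9" are all hypothetical and do not come from any
real database.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    """Hypothetical citation record; descriptor codes here are made up."""
    title: str
    abstract: str
    keywords: set = field(default_factory=set)
    descriptors: set = field(default_factory=set)


def search(records, term, fields=("title", "abstract", "keywords", "descriptors")):
    """Return the records in which `term` occurs in any of the named fields."""
    term = term.lower()
    hits = []
    for rec in records:
        for name in fields:
            value = getattr(rec, name)
            if isinstance(value, set):
                # Set-valued fields are matched on whole entries.
                found = term in {v.lower() for v in value}
            else:
                # Text fields are matched on substrings.
                found = term in value.lower()
            if found:
                hits.append(rec)
                break
    return hits


def related_by_descriptor(records, seed):
    """Find other records sharing at least one descriptor code with `seed`."""
    return [r for r in records if r is not seed and r.descriptors & seed.descriptors]


RECORDS = [
    Citation("Sparse LU factorization", "A solver for sparse linear systems.",
             {"sparse", "linear systems"}, {"A1"}),
    Citation("Iterative methods survey", "Covers conjugate gradients for sparse problems.",
             {"iterative"}, {"A1", "A2"}),
    Citation("Climate model output", "Data are stored in a sparse grid format.",
             {"climate"}, {"Z9"}),
]

broad = search(RECORDS, "sparse")                          # all fields: too much
narrow = search(RECORDS, "sparse", fields=("keywords",))   # keyword field only
pivot = related_by_descriptor(RECORDS, narrow[0])          # reuse descriptor codes
```

Searching all fields returns every record that merely mentions the term,
while the keyword-only query returns the one record that is actually about
it; the descriptor pivot then picks up a related record that the keyword
query missed.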

So just as no single on-line database can exhaustively
index the entire world of printed publications (those that
attempt to do so succeed only superficially), neither can
a single Internet database index all information available
electronically.  Any database that attempts to do so will
be unwieldy and hopelessly out-of-date.  So the solution
seems to be to divide up the world.  But how to divide it
up?  Some have suggested dividing it up geographically.  I
think this is a bad idea, since I seldom want to restrict
sources of information geographically when doing a search.
Some have suggested indexing separately by type of service
provided -- e.g., one index for anonftp, another for gopher,
another for WWW -- in fact, this is what is already being
done.  But again, I am usually not interested in restricting
my search in this manner.  Instead, I want to retrieve all
relevant material, regardless of its access protocol and
format, although perhaps I have a preferred format, if more than
one is available.  It makes much more sense to me to divide
up the world by subject area, and to have active participants
in different subject areas (e.g., high-energy physics, HPCC,
environmental sciences, education) have ownership of subject
area databases.

To do effective searches, we will need to be able to search
on particular attributes and to use subject area specific
keywords and descriptors.  The IAFA templates may well be a
workable format for specifying metainformation in the form
of attribute/value pairs, if they can be standardized.
It seems reasonable to use standardized template definitions
across different subject areas, if the templates are
developed with the needs of different groups in mind, although
specialized templates may be required for special cases.  Choices of
classification schemes and descriptor keywords, however,
will need to be made by experts in the individual subject areas.
A single classification would be too unwieldy and difficult
to use, and the same keyword may have different meanings
in different contexts.  An example of a discipline-specific
classification scheme that is currently in use for searching
repositories of mathematical software is the GAMS classification
scheme developed at NIST (URL http://gams.nist.gov/).
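
To make the attribute/value idea concrete, here is a minimal sketch of
parsing a template into a record and then qualifying a search by
attribute.  It is not an implementation of the IAFA draft: the attribute
names in SAMPLE, the continuation-line rule, the classification code
"X12", the example.org URL, and the parse_template and matches helpers
are all assumptions made for illustration.

```python
def parse_template(text):
    """Parse 'Attribute: value' lines into a dict.

    Assumption: a line starting with whitespace continues the previous value.
    Attribute names are lower-cased so lookups are case-insensitive.
    """
    record = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and last is not None:
            record[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not attribute/value pairs
        last = name.strip().lower()
        record[last] = value.strip()
    return record


def matches(record, **wanted):
    """True if every requested attribute contains the given value."""
    return all(v.lower() in record.get(k, "").lower() for k, v in wanted.items())


# Hypothetical template; field names and values are illustrative only.
SAMPLE = """\
Template-Type: SOFTWARE
Title: Example sparse solver
Classification: X12
Keywords: sparse, linear systems,
  direct methods
URI: ftp://ftp.example.org/pub/solver.tar
"""

rec = parse_template(SAMPLE)
is_hit = matches(rec, classification="x12", keywords="direct")
```

The template syntax stays the same for every subject area; only the
values placed in fields such as Classification and Keywords would be
governed by each area's own scheme.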

Thus, subject area consortiums and on-line communities should have
the responsibility for developing classification schemes, subject
area thesauri, and guidelines for using them.  Each group should
also have the responsibility for constructing and maintaining the
searchable indices for its area.
The global Internet community should be responsible for
standardizing the metainformation format (in cooperation with
the various subject areas), for providing public-domain software to
do the indexing and run the search engines, for providing
user interfaces and client software that provide assistance
to the non-expert user and present as uniform a search interface
as possible, and for providing a global directory to area-specific
databases.  I suggest that groups or individuals, such as Peter
Deutsch at Bunyip or Rob Raisch at Internet Company, who are interested
in developing and providing the generic tools, try to develop
a working relationship with one of the existing or forming
on-line research communities, such as HEPNet or the HPCC
community, and use them as a prototype.
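
The division of labor proposed above (area-specific indices maintained by
their own communities, plus a global directory that points searchers to
them) might look roughly like this.  The SubjectIndex and GlobalDirectory
classes, the area names, the vocabularies, and the example.org URLs are
hypothetical; this is a sketch of the architecture, not a description of
any existing tool.

```python
class SubjectIndex:
    """A searchable index owned and maintained by one subject-area community."""

    def __init__(self, area, vocabulary):
        self.area = area
        # The area's own thesaurus terms, chosen by its experts.
        self.vocabulary = {t.lower() for t in vocabulary}
        self.entries = []  # (title, keywords, url) tuples

    def add(self, title, keywords, url):
        self.entries.append((title, {k.lower() for k in keywords}, url))

    def search(self, term):
        term = term.lower()
        return [(title, url) for title, kws, url in self.entries if term in kws]


class GlobalDirectory:
    """Community-wide directory that routes a query to area-specific indices."""

    def __init__(self):
        self.indices = {}

    def register(self, index):
        self.indices[index.area] = index

    def areas_for(self, term):
        """List the subject areas whose vocabulary includes `term`."""
        term = term.lower()
        return sorted(a for a, idx in self.indices.items() if term in idx.vocabulary)

    def search(self, term, area=None):
        """Search one named area, or every area that knows the term."""
        areas = [area] if area else self.areas_for(term)
        return {a: self.indices[a].search(term) for a in areas}


hpcc = SubjectIndex("hpcc", ["grid", "parallel"])
hpcc.add("Grid partitioning library", ["grid", "parallel"],
         "ftp://ftp.example.org/hpcc/partition.tar")

env = SubjectIndex("environmental-sciences", ["grid", "rainfall"])
env.add("Gridded rainfall dataset", ["grid", "rainfall"],
        "gopher://gopher.example.org/1/rainfall")

directory = GlobalDirectory()
directory.register(hpcc)
directory.register(env)

everywhere = directory.search("grid")
only_env = directory.search("grid", area="environmental-sciences")
```

Note that the keyword "grid" appears in both vocabularies with different
meanings, so the directory reports results per area rather than merging
them, and that the results mix access protocols because the split is by
subject, not by service type.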

Lastly, consider the currently popular approach of trying to
index all network resources automatically, with a minimum of
human intervention.  The sad fact of the matter is that the quality
of searching that is possible depends critically on the accuracy,
specificity, and thoroughness of the metainformation provided,
and providing this metainformation necessarily involves human
effort.  Perhaps retrofitting all existing resources with full
metainformation is too daunting a task.  But as soon as metainformation
standards can be agreed upon, authors and publishers of new
material must be expected to provide the appropriate metainformation
and to update it as needed.  Otherwise, the "searching for
Internet resources" problem simply will not be solved.  I agree that
keeping metainformation accurate and up-to-date is currently
a problem, but perhaps this problem can be alleviated by
providing appropriate repository management software.  Perhaps
the URN/URC/URT scheme, together with Chris Weider's transponder/agent
idea, would be useful here.

************************************************************************
Shirley Browne         Research Associate      107 Ayres Hall
browne@cs.utk.edu      Computer Science Dept.  University of Tennessee
(615) 974-5886         Fax (615) 974-8296      Knoxville, TN 37996-1301
*************************************************************************