Re: Searchable Indexes: LISTEN NOW!
Tim Berners-Lee <timbl@www3.cern.ch>
Resent-Message-Id: <9307011711.AA29832@dxmint.cern.ch>
Resent-Date: Thu, 01 Jul 93 12:55:46 EDT
Resent-From: Richard W Wiggins <WIGGINS@msu.edu>
Resent-To: www-talk@nxoc01.cern.ch
Date: Thu, 1 Jul 93 11:38:34 +0200
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-id: <9307010938.AA05487@www3.cern.ch>
To: Nathan Torkington <Nathan.Torkington@vuw.ac.nz>
Subject: Re: Searchable Indexes: LISTEN NOW!
Cc: www-talk@nxoc01.cern.ch
Status: RO
Perhaps not surprisingly, a lot of this discussion is reminiscent of
discussions that have taken place in the Gopher community. The existence
of global indexes like Veronica seems to be one of the few remaining
advantages of Gopher over WWW; providing global WWW indexes will be a
big step forward.
Besides Veronica, folks in Gopherspace have experimented with tools that
walk servers of interest and pull up new or changed titles. Folks who
run servers don't like to see automated widgets walking their entire
directory tree daily. So if you're going to invent a catalog structure,
you might also devise a format for a catalog of recent updates. (Yes, a
searcher could scan the overall catalog to accomplish this, but nicer to
spend the CPU once.)
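
A rough illustration of the recent-updates idea, as a minimal Python sketch. The entry layout (URL, title, last-modified date), the example.org URLs, and the tab-separated output are all assumptions made up for this example, not a proposed format. The point is only that the server filters its own catalog once, so that searchers need not each rescan it.

```python
from datetime import date

# Hypothetical catalog entries: (url, title, last_modified).
# This layout is an assumption for illustration, not a standard.
CATALOG = [
    ("http://example.org/moby/index.html", "Moby Dick", date(1993, 6, 28)),
    ("http://example.org/jobs.html", "Jobs Posting", date(1993, 5, 2)),
    ("http://example.org/news.html", "Campus News", date(1993, 6, 30)),
]

def recent_updates(catalog, since):
    """Return entries modified on or after `since`, newest first."""
    changed = [entry for entry in catalog if entry[2] >= since]
    return sorted(changed, key=lambda entry: entry[2], reverse=True)

def format_updates(entries):
    """Render one tab-separated line per entry: date, URL, title."""
    return "\n".join(
        f"{modified.isoformat()}\t{url}\t{title}"
        for url, title, modified in entries
    )

if __name__ == "__main__":
    # Everything that changed since the start of June 1993.
    print(format_updates(recent_updates(CATALOG, date(1993, 6, 1))))
```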
Of course you want to be able to tell people to not index your Web, but
you might also want to be able to stop the indexing at a certain level.
The Gopher title index tool written by my colleague Dennis Boone allows
"stops" anywhere within the hierarchy; e.g. we don't want to index
Usenet News titles. I'm not sure how this idea would map to the Web, but
it's worth considering.
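
To make the "stops" idea concrete, here is a small Python sketch of a title walk that prunes a subtree wherever a stop is set. It is not Dennis Boone's tool; the nested-dictionary tree, the path strings, and the function name index_titles are hypothetical, chosen only to show the pruning.

```python
def index_titles(tree, stops, path=""):
    """Collect (path, title) pairs, pruning any subtree listed in `stops`."""
    here = f"{path}/{tree['title']}"
    if here in stops:
        return []  # a stop: skip this node and everything below it
    found = [(here, tree["title"])]
    for child in tree.get("children", []):
        found.extend(index_titles(child, stops, here))
    return found

# Hypothetical server hierarchy, for illustration only.
SERVER = {
    "title": "root",
    "children": [
        {"title": "Library", "children": [{"title": "Moby Dick"}]},
        {"title": "Usenet News", "children": [{"title": "comp.infosystems"}]},
    ],
}

# The server owner asks that nothing under Usenet News be indexed.
STOPS = {"/root/Usenet News"}

if __name__ == "__main__":
    for item_path, title in index_titles(SERVER, STOPS):
        print(item_path)
```

How such a stop would be declared on a Web server is left open here; the sketch assumes the indexer already has the list.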
Conversely, it'd be nice to be able to advertise a particular document
as a logical starting point within one's server. For instance the root
page of your online multimedia version of Moby Dick would be a better
thing to have in a global index than all 1000 pages within that
title.
In Veronica, it's possible to search for folders only. This narrows
searches considerably. Again, not sure how well this maps to the Web,
but it is a very useful feature. It'd also be nice to have an
automatically maintained list of all known home pages.
Will the catalog include document titles along with URLs? A major
disadvantage of Gopher indexes is that the document names are yanked out
of context, so you often don't know what an item named, for instance,
"Jobs Posting" means -- where it came from or what the context is. With
fully-specified URLs and with the titles, a global Web index would not
have this problem.
/Rich Wiggins, Gopher Coordinator, Michigan State U
----------------------------Original message----------------------------
Summarizing, we need
1. A standard URL like http:/Catalogue.html
2. A convention for saying "no" which could be just an existing but void
file.
3. A standard format for the file.
4. A PD program distributed with the server to generate the file so that
many people will do it weekly.
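
The four points above could be sketched roughly as follows in Python. This is only an illustration under loud assumptions: the file format (one "URL<TAB>title" line per document) is invented here, since no format has been agreed; the function names and the opt_out flag are hypothetical; and an empty file is taken to mean "no", as point 2 suggests.

```python
import os
import re

CATALOGUE_NAME = "Catalogue.html"  # name taken from point 1 above
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def build_catalogue(doc_root, base_url):
    """Return 'URL<TAB>title' lines for every .html file under doc_root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if not name.endswith(".html") or name == CATALOGUE_NAME:
                continue
            full = os.path.join(dirpath, name)
            with open(full, encoding="latin-1") as handle:
                match = TITLE_RE.search(handle.read())
            # Fall back to the file name when a page has no <title>.
            title = " ".join(match.group(1).split()) if match else name
            rel = os.path.relpath(full, doc_root).replace(os.sep, "/")
            lines.append(f"{base_url}/{rel}\t{title}")
    return lines

def write_catalogue(doc_root, base_url, opt_out=False):
    """Write the catalogue file; an empty file is assumed to mean 'no'."""
    lines = [] if opt_out else build_catalogue(doc_root, base_url)
    path = os.path.join(doc_root, CATALOGUE_NAME)
    with open(path, "w", encoding="latin-1") as handle:
        handle.write("".join(line + "\n" for line in lines))
    return path
```

Run weekly from the server's scheduler, something along these lines would keep the file current. One weakness of the assumed convention is that a server with no pages at all also produces an empty file.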
...