Re: The future of meta-indices/libraries?

Martijn Koster <m.koster@nexor.co.uk>

Mail folder: WWW Talk Jan 94-present

Errors-To: listmaster@www0.cern.ch
Date: Tue, 15 Mar 1994 21:34:06 --100
Message-id: <9403152025.AA07313@dxmint.cern.ch>
Reply-To: m.koster@nexor.co.uk
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Martijn Koster <m.koster@nexor.co.uk>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: The future of meta-indices/libraries? 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 2717


> I think the WWW community should have addressed this long ago.  This
> is the main area in which we are well behind the gopher community.

I think this is an example of what the lack of a Working Group costs us.
It is really easy to discuss problems and come up with solutions,
but even if solutions are proven to work there is no mechanism
for standardising them. As a result the same problems keep arising,
and people keep coming up with the same solutions.

In this case the problem has been addressed by ALIWEB. Have a look at
http://web.nexor.co.uk/aliweb/doc/aliweb.html
 
> In my opinion, one of the most important design criteria should be to
> eliminate the need for indexers (of whom there will likely be many) to
> walk the entire server tree.  This can be annoying and in the worst
> cases disruptive.

I couldn't agree more. This is why I don't welcome the Robot trend,
and hope to help keep an eye on robots by gathering information on the
Robot page (http://web.nexor.co.uk/mak/doc/robots/robots.html).

> A second important criterion would be giving the maintainer control
> over what is indexed.

> I would argue for a very simple document ....

ALIWEB does that.

> As a server writer I would implement this by having my server create
> this document on the fly when it is first requested and then cache
> it for later use until it expires.  Subsequent requests would get
> the cached version until its expiration after which a new version 
> would be created and cached.  The maintainer would set the expiration
> period and could mark any part (or all) of his tree as not to be 
> indexed.  The cached file would be extremely useful for features local
> to the server also.  For example, a search of all titles on the server
> or WAIS searches which return a menu of *titles* of hits (this is done
> now by WWWWais, for example, but it must search each document corresponding
> to a hit to extract its title)

I am not sure what you mean here. I doubt it is sensible to index all
the titles on a server and search only those, attractive as it sounds:
you do need to retain the context of the titles.
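
For concreteness, the build-on-first-request, cache-until-expiry scheme
quoted above might look roughly like the following. This is only a
minimal sketch under my own assumptions: the class name CachedIndex, the
build callback that walks the local tree, and the injectable clock are
all hypothetical and are not taken from any existing server.

```python
import time
from typing import Callable, Dict, Optional


class CachedIndex:
    """Build an index document on first request and reuse it until it expires."""

    def __init__(self, build: Callable[[], Dict[str, str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._build = build        # hypothetical callback that walks the local tree
        self._ttl = ttl_seconds    # expiration period chosen by the maintainer
        self._clock = clock        # injectable clock, so expiry can be exercised
        self._cached: Optional[Dict[str, str]] = None
        self._expires_at = 0.0

    def get(self) -> Dict[str, str]:
        """Return the cached index, rebuilding it if missing or expired."""
        now = self._clock()
        if self._cached is None or now >= self._expires_at:
            # First request, or the cached copy has expired: rebuild and re-cache.
            self._cached = self._build()
            self._expires_at = now + self._ttl
        return self._cached
```

A server would create one CachedIndex per tree and call get() for each
request for the index document; only the first request after an expiry
pays the cost of walking the tree.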

You mention marking part of a tree not to be indexed. Although it is
not quite what you mean, you may be interested in a proposal on the
Robots page to introduce a voluntary mechanism for excluding parts of
a tree from robots. I agree robots are the wrong solution to the
resource discovery problem, but they are going to be around, and it
makes sense to reduce the problems they cause.
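
To illustrate the general idea of excluding part of a tree, here is a
rough sketch of the check a well-behaved robot could make before
fetching a path. It is not the format or wording of the proposal on the
Robots page; the function name is_excluded and the notion of a plain
list of excluded path prefixes are assumptions made for illustration.

```python
from typing import Iterable


def is_excluded(path: str, excluded_prefixes: Iterable[str]) -> bool:
    """Return True if `path` lies in a subtree the maintainer has excluded."""
    for prefix in excluded_prefixes:
        base = prefix.rstrip("/")
        # Match the subtree root itself or anything below it, but not
        # siblings that merely share a name prefix (e.g. /private2).
        if path == base or path.startswith(base + "/"):
            return True
    return False
```

Being voluntary, a check like this only helps if the robot author
chooses to call it before each request; the server cannot enforce it.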

-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html