Re: Resource discovery, replication (WWW Announcements archives?)

Tony Sanders <sanders@BSDI.COM>
Date: Wed, 4 May 1994 17:35:30 +0200
Message-id: <199405041530.KAA15922@austin.BSDI.COM>
Reply-To: sanders@BSDI.COM
Precedence: bulk
From: Tony Sanders <sanders@BSDI.COM>
To: Multiple recipients of list <>
Subject: Re: Resource discovery, replication (WWW Announcements archives?) 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Organization: Berkeley Software Design, Inc.

"Daniel W. Connolly" writes:
> Each of these is an all-or-nothing proposition: in the first
> case, I have to locate an ALIWEB server with all the data in the
> world on it (scalability test says: BZZZZT). Or I can copy
> all the data to my machine (BZZZZT). Or I can get "the list of
> hosts" (BZZZT) and do it myself.
There is nothing about ALIWEB that says you *MUST* gather the information
via robot retrieval.  I assure you that when the database gets too big
to handle by hand, other distribution schemes like your USENET
suggestion will get implemented.  An indexing scheme that can't support
multiple data sources isn't going to fly.

The real problem is that implementing the USENET scheme will take a fair
amount of work and effort, while writing a simple 10-line shell/perl script
to go grab the data on command is very easy.  Who is going to implement
this?  If you do it, then I'm sure it will get used.
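For the record, such a script really is barely more than a loop.  Here is a
minimal sketch, assuming a plain-text list of hosts (one per line) and the
ALIWEB convention that each site's index lives at /site.idx; the file names
and the commented-out fetch command are illustrative, not part of ALIWEB:

```shell
#!/bin/sh
# Hypothetical "grab the data on command" script: read a host list and
# visit each site's ALIWEB index file (/site.idx by convention).
fetch_indexes() {
    # $1 = file containing one hostname per line (an assumed layout)
    while read host; do
        [ -z "$host" ] && continue
        echo "fetching http://$host/site.idx"
        # the real retrieval would go here, e.g.:
        # some_http_get "http://$host/site.idx" > "idx/$host.idx"
    done < "$1"
}
```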

Also, with your USENET scheme WWW server admins must *know* to post this
message every so often.  That in itself is an amazing barrier.  A solution
might be for the server itself to send email to remind the admin :-)
Maybe even offer to do it automatically.
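The reminder itself could be a trivial periodic job the server installs for
itself.  A hypothetical sketch, where the message text, the admin address,
and how it gets scheduled (cron, say) are all assumptions:

```shell
#!/bin/sh
# Hypothetical reminder: compose the nag message a server could mail to
# its admin on a schedule.  Here it just prints the message; a real
# server would pipe it into its local mailer instead.
remind_admin() {
    # $1 = admin address (assumption: taken from the server's config)
    printf 'To: %s\n' "$1"
    printf 'Subject: Time to repost your server announcement\n\n'
    printf 'Please repost your announcement and refresh /site.idx.\n'
}
```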

ALIWEB itself suffers from this same problem: admins must generate the index
entry for their site, and until there is a tool that will do that
semi-automatically, is distributed with every server, and *forces* the
admin to do it, it isn't going to be as widespread as it needs to be.
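Such a tool needn't be fancy, either.  ALIWEB index entries are IAFA-style
templates of "Field: value" lines, so a first cut could just take the values
and emit one record.  A sketch, with the exact field set an assumption (a
real site's template may need more fields):

```shell
#!/bin/sh
# Hypothetical generator for a single ALIWEB (IAFA-style) index entry.
make_entry() {
    # $1=title  $2=URI  $3=description  $4=keywords
    printf 'Template-Type: SERVICE\n'
    printf 'Title: %s\n' "$1"
    printf 'URI: %s\n' "$2"
    printf 'Description: %s\n' "$3"
    printf 'Keywords: %s\n' "$4"
}
```

Usage would be something like:
    make_entry "My Server" /index.html "Local docs" "docs,www" >> site.idx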