Re: Resource discovery, replication (WWW Announcements archives?)

"Daniel W. Connolly" <connolly@hal.com>
Errors-To: listmaster@www0.cern.ch
Date: Wed, 4 May 1994 14:32:06 +0200
Message-id: <9405041228.AA24223@ulua.hal.com>
Reply-To: connolly@hal.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Resource discovery, replication (WWW Announcements archives?) 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Type: text/plain; charset="us-ascii"
Mime-Version: 1.0
In message <9405041212.AA11697@hal.com>, Martijn Koster writes:
>
>I didn't say it was the most efficient thing to do, just that it was
>possible.

And I started by saying that it was "too centralized." I think
my point is made...

>> With my broadcast strategy, I just set up a process that gathers new
>> articles and expires old ones. ...
>> And it's scalable: everybody has access to everything without anybody
>> having to do everything.
>
>Regarding the scalability, you still have "all the data in the world"
>in a single machine, namely in the News spool area and in our
>database, and you still have in effect "copied all data across", with
>NNTP instead of HTTP. So your first two bzzzzt's bite your own approach
>too.

Well, the data is all copied across, but in a scalable way (not N^2).
But I thought I made the point that each site can filter the data,
thus eliminating the need to store everything on any one machine.
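A minimal sketch of the broadcast-plus-filter idea, under assumptions of my own: sites are linked by two-way feeds, an article is offered to a neighbor only if that neighbor hasn't seen its Message-ID (a stand-in for an NNTP IHAVE exchange), and each site keeps the article on disk only if its own hypothetical `keep` filter wants it. Every site still remembers the IDs it has seen, much like a news history file, but not the articles themselves. `Article`, `Site`, `link`, `posted_at` and `max_age` are names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Article:
    message_id: str
    subject: str
    posted_at: int  # hypothetical clock (e.g. days); not from the original post


@dataclass
class Site:
    name: str
    keep: Callable[[Article], bool]  # hypothetical per-site filter policy
    seen: Set[str] = field(default_factory=set)              # Message-IDs handled so far
    spool: Dict[str, Article] = field(default_factory=dict)  # articles kept on local disk
    neighbors: List["Site"] = field(default_factory=list, repr=False)

    def receive(self, article: Article) -> int:
        """Take in an article, keep it only if the local filter wants it,
        and pass it along. Returns how many transfers this caused downstream."""
        self.seen.add(article.message_id)
        if self.keep(article):
            self.spool[article.message_id] = article
        transfers = 0
        for peer in self.neighbors:
            # Stand-in for an NNTP IHAVE offer: send only what the peer lacks.
            if article.message_id not in peer.seen:
                transfers += 1 + peer.receive(article)
        return transfers

    def expire(self, now: int, max_age: int) -> None:
        """Drop locally kept articles older than max_age."""
        self.spool = {mid: a for mid, a in self.spool.items()
                      if now - a.posted_at <= max_age}


def link(a: Site, b: Site) -> None:
    """Set up a two-way news feed between two sites."""
    a.neighbors.append(b)
    b.neighbors.append(a)
```

In this model an article posted once crosses exactly N - 1 feeds to reach N connected sites, whatever the topology, whereas having every site fetch every other site's index costs on the order of N^2 transfers. Each site's disk usage depends only on its own filter and expiry settings.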

I guess it's the old time-versus-storage tradeoff: in my proposed
system, you can query the whole database using only enough disk
space for one message -- but it will take you a month or so to conduct
your query. And you can configure your system for anything from that
to storing the whole database, thus allowing a global query in a
single local transaction.
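To put rough numbers on the time-versus-storage trade, here is a toy model, and it is only an assumption: the whole database is rebroadcast in a fixed rotation, one record per tick, and a site with room for `capacity` records keeps a fixed block of them. A global query is finished once every record the site does *not* hold has gone by on the feed. `ticks_to_answer` and `average_ticks` are hypothetical helpers, not part of any real news software.

```python
def ticks_to_answer(db_size: int, capacity: int, start: int) -> int:
    """Ticks of feed-watching needed before a global query is complete.

    Hypothetical model: the whole database is rebroadcast in a fixed rotation,
    one record per tick, and the feed is at position `start` when the query
    begins. The site keeps records 0 .. capacity-1, searchable at no cost.
    """
    missing = set(range(capacity, db_size))  # records not on local disk
    ticks, pos = 0, start
    while missing:
        missing.discard(pos)  # this record goes by on the feed
        pos = (pos + 1) % db_size
        ticks += 1
    return ticks


def average_ticks(db_size: int, capacity: int) -> float:
    """Average wait over every possible feed position at query time."""
    return sum(ticks_to_answer(db_size, capacity, s)
               for s in range(db_size)) / db_size


if __name__ == "__main__":
    for cap in (1, 5, 9, 10):
        print(cap, average_ticks(10, cap))
```

For a 10-record database, capacities of 1, 5, 9 and 10 records give average waits of 9.9, 8.5, 5.5 and 0 ticks: holding everything turns the query into one local transaction, and holding nearly nothing means waiting out most of a rotation.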

In any case, my point about scalability is: each site makes the
trade between speed and storage independently, and no single site
is burdened with any global responsibilities.

[Boring stuff where you agree with me deleted... :-]

>Security is a problem though, at least with ALIWEB I know that the
>index comes from the server I just pulled it off,

You may believe that, and you may be right most of the time.
But there is no mechanism to ensure it. DNS can be spoofed. TCP/IP
is not inherently secure. In short, security-minded folks won't be
satisfied.

>> Plain text messages containing URLs should be deprecated in favor of
>> articles that use MIME to indicate text/html,
>> application/wais-source, message/external-body, etc. body parts.
>
>As you are going to end up with a large database, you do need some sort
>of attribute-value schema so that you can search on something sensible.
>I suggest that IAFA-like templates do that quite nicely.

Sure... that's pretty much what I had in mind in the first place...
as long as they're explicitly tagged as such.

Dan