Re: Caching Servers Considered Harmful (was: Re: Finger URL)

Daniel W. Connolly (connolly@hal.com)
Mon, 22 Aug 1994 19:19:37 +0200

In message <Pine.3.85.9408221141.A462-0100000@enews>, "Rob Raisch, The Internet
Company" writes:
>
>[Putting his publisher hat on]
>
>Because anyone running a caching server runs the dual risk of presenting
>out-of-date information to their users and can be in direct violation of
>international copyright law.

Yes, these are risks. This is why I believe it is necessary to have:
  (1) a formal model of computation to decide, in the abstract,
      what the "correct" answer to a user's query is, and
  (2) widely deployed fault detection and tolerance mechanisms to
      increase the reliability of actual computations, to detect
      errors, and to generally promote confidence in digital
      communications.

(see http://www.hal.com/%7Econnolly/drafts/formalism.html for
an attempt at such a formalism)

>You can provide no guarantee that the versions that you present to your
>users are accurate or timely.

This is extremely misleading, if not just plain incorrect. While
there are no guarantees (and no fault detection mechanisms...), the
HTTP protocol (as specified, and as implemented by NCSA HTTP and CERN
HTTP, as far as I know) ensures that caching proxy servers do not
serve up out-of-date information. When a client does a GET, the
caching proxy is bound by the Expires: information associated with
its cached representation of an object. If there is none, it must
make a round trip to the server of origin. This round trip is often
optimized with the If-Modified-Since mechanism.
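
To make that rule concrete, here is a minimal sketch of the
cache-validation logic, written in modern Python purely for
illustration; the CacheEntry record, the cache dictionary, and the
serve() function are hypothetical names of mine, and only the
Expires:, Last-Modified:, and If-Modified-Since: headers come from
HTTP itself:

    # Sketch only: a proxy's decision procedure under the Expires: /
    # If-Modified-Since rules described above. Not production code.
    from dataclasses import dataclass
    from datetime import datetime, timezone
    from email.utils import parsedate_to_datetime
    from typing import Optional
    import urllib.error
    import urllib.request

    @dataclass
    class CacheEntry:                   # hypothetical cache record
        body: bytes
        expires: Optional[str]          # value of the Expires: header, if any
        last_modified: Optional[str]    # value of the Last-Modified: header, if any

    def serve(url: str, cache: dict) -> bytes:
        entry = cache.get(url)
        now = datetime.now(timezone.utc)

        # Fresh copy: the proxy is bound by the Expires: information
        # associated with its cached representation of the object.
        if entry and entry.expires and parsedate_to_datetime(entry.expires) > now:
            return entry.body

        # No usable Expires: information -- round trip to the server
        # of origin, optimized with If-Modified-Since when possible.
        req = urllib.request.Request(url)
        if entry and entry.last_modified:
            req.add_header("If-Modified-Since", entry.last_modified)
        try:
            resp = urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            if err.code == 304 and entry:   # 304 Not Modified: cache still good
                return entry.body
            raise
        body = resp.read()
        cache[url] = CacheEntry(body, resp.headers.get("Expires"),
                                resp.headers.get("Last-Modified"))
        return body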

> Further, I have no idea of the number of
>consumers who view my content through your cache or what they view, how
>and when.

Do you have a right to know this? There was a lot of talk at the WWW
conference in Geneva about a "Bill of Rights" for the information age.
This is an interesting issue to add to the list.

>Of course, I can mark my information as being uncacheable, but will you
>honor that request? Your interest is to provide content to your users
>with as little impact on your communications resources as possible. I
>believe that your goals and mine are not compatible.

I suspect these issues can generally be addressed the conventional
way: whining to the administrators until the policies are enforced.
If your data is so valuable that this level of reliability is
unacceptable, you will have to use more complex protocols, presumably
employing cryptographic techniques to provide authenticity,
accountability, etc. Any information distributed under terms more
restrictive than, for example, the MIT X11 license has no business
travelling over conventional HTTP.
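
As an aside, the "mark it uncacheable" option Rob mentions is just
the Expires: mechanism again: a server or CGI script can declare a
document expired the moment it is generated, so a proxy honoring the
protocol must revalidate on every request. A sketch, again in modern
Python for illustration only; the emit_headers() function is a
hypothetical name:

    # Sketch: marking a response uncacheable by declaring it already
    # expired under the Expires: mechanism discussed above.
    from email.utils import formatdate

    def emit_headers() -> str:          # hypothetical helper
        now = formatdate(usegmt=True)   # current time, in RFC 822 format
        return ("Content-Type: text/html\r\n"
                f"Expires: {now}\r\n"   # already expired: revalidate every time
                "\r\n")

Whether a given cache actually honors that declaration is, of course,
exactly the policy question Rob raises.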

>I expect that most professional publishers will not serve content to any
>site which caches unless they can enter into a business relationship with
>that site. Unfortunately, this presents a very interesting N by N
>problem, as publishers and caching servers proliferate.

This is true. Most folks who put info on the web are either academics
or marketing types, and both of these camps gain, rather than lose,
when their info is redistributed without their knowledge. Folks who
have something to lose from the redistribution of their information
typically use physically secure networks today.

But this will change with the emergence of cryptographic security
techniques...

Dan