Re: Caching Servers Considered Harmful (was: Re: Finger URL)

Brian Behlendorf (brian@wired.com)
Mon, 22 Aug 1994 23:01:47 +0200

On Mon, 22 Aug 1994, Sarr Blumson wrote:
> Rob Raisch, The Internet Company, says:
>
> [Putting his publisher hat on]
>
> Let's see how this goes if we substitute "book store" for "caching server"

Analogies are like papier-mâché - you can make anything out of them. I
don't think contrary arguments to Rob's post are served by comparing
caching servers to book stores. Book stores are still limited by stock
on hand, and they always provide accountability for the number of items
sold (discounting fraud). Caching servers, on the other hand, provide no
such accountability on their own. In fact, from the provider's
perspective accesses from caching servers are almost indistinguishable
from regular accesses (ignoring the fact that I can find them through the
HTTP_USER_AGENT CGI variable).

We're not on the charge-for-access model either, and until secure
transaction protocols become standard we won't even think about it.
Thus, given the choice between

1) a user not getting the page, or
2) a user getting the page without our knowledge

I'd choose the latter.

Now, as to the problem of keeping info up-to-date: I disagree that it's
in the user's or cache manager's interest to ignore Expires: headers, or
to purposefully provide out-of-date information. If I were going through
a cache and relying on up-to-date stock quotes or weather information
for my job, any cache manager who did that to me would probably see
themselves out of a job pretty quickly.
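
Just to make that concrete, here's a rough sketch (in Python, assuming the
cache keeps each response's headers in a dictionary - the names here are
purely illustrative, not any real cache's code) of what honoring an Expires:
header amounts to:

    from datetime import datetime, timezone
    from email.utils import parsedate_to_datetime

    def is_fresh(cached_headers):
        """True if the cached copy can still be served without going back
        to the origin server."""
        expires = cached_headers.get("Expires")
        if expires is None:
            return False  # no Expires: header, so check with the server
        try:
            expiry = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return False  # unparseable date: treat as already expired
        return datetime.now(timezone.utc) < expiry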

Now, this doesn't mean I'm happy with the way caches work. And like GNN and
Rob, I am definitely interested in how many people are looking at our pages.
I've mentioned a couple of times in various forums that I'd be happy if the
caching server sent, at a minimum, a HEAD request for the object being
obtained. If its Last-Modified date were more recent than when the cache
grabbed it, it'd fetch the whole file (preferably doing this in one HTTP
connection); otherwise it'd just serve the cached version. I can count the
HEADs (removing the subsequent GETs for the same object from the same server)
as full accesses, thus satisfying my interest as a provider in how popular my
pages are, and I'm guaranteed that readers are getting the most recent version
of whatever I put up. If our server is overloaded or the link is down and
the HEAD request doesn't get through, then the cache should serve up its
local copy - this fulfills my wish that the user sees the page, even without
my knowledge, if the alternative is that it's unviewable.
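
As a rough sketch of that logic (again in Python; fetch_head and fetch_full
are hypothetical helpers standing in for the actual HTTP requests, not part
of any existing cache):

    from email.utils import parsedate_to_datetime

    def serve(url, cache, fetch_head, fetch_full):
        """Serve url from the cache, revalidating with a HEAD request first.

        cache maps each URL to a (cached_at, body) pair, where cached_at is
        an aware datetime recording when the copy was fetched.
        """
        cached_at, body = cache[url]
        try:
            headers = fetch_head(url)   # the provider gets to count this access
        except OSError:
            return body                 # server overloaded or link down: serve the local copy
        last_modified = parsedate_to_datetime(headers["Last-Modified"])
        if last_modified > cached_at:
            body = fetch_full(url)      # a newer version exists, so fetch the whole file
            cache[url] = (last_modified, body)
        return body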

I'm really not interested in playing cat and mouse with people who
unwittingly put our stuff up for copying. It's the people who wittingly do
it - running "Wired Areas" on BBSs, mirroring our site on a
public-access-unix system, etc. - that we are willing to spend our time on
(both of which have happened). The next logical step for online
publishers, who see their site getting overloaded and whose distant
customers experience a huge amount of lag, is to set up mirror sites
as GNN has (we're looking into that, too).

Brian