Re: Caching Servers Considered Harmful (was: Re: Finger URL)

John Labovitz (johnl@ora.com)
Mon, 22 Aug 1994 18:50:27 +0200

[Rob Raisch:]
> You can provide no guarantee that the versions that you present to your
> users are accurate or timely. Further, I have no idea of the number of
> consumers who view my content through your cache or what they view, how
> and when.
> [...]
> Of course, I could be wrong. I have only been peripherally associated
> with publishers. Anyone from O'Reilly wish to comment?

Sure. Note that I don't know much about the mechanics
of caching servers, so if I'm off base in some way,
please let me know.

Our issue with caching servers has to do with accounting
of use in GNN (Global Network Navigator). To make GNN
freely available, we sell advertising. Before advertisers
will consider that a worthwhile investment, they want to
know how many people are reading their content. We can
determine the number of `hits' on a given part of GNN,
but only if we have access to usage logs. If people are
accessing GNN through a caching server, our logs show only
the first hit on each document, plus one more hit each
time the cache entry expires, however many people read
the cached copy.
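
To make the undercount concrete, here is a rough sketch in
Python. It assumes a single cache that refetches a document
only once a fixed expiry period has passed; the function
name and the traffic figures are made up for illustration
and are not taken from any real log.

```python
def origin_hits(request_times, ttl):
    """Count the requests that reach the origin server when a cache
    keeps each fetched copy for `ttl` time units."""
    hits = 0
    expires_at = None  # nothing cached yet
    for t in sorted(request_times):
        if expires_at is None or t >= expires_at:
            hits += 1             # cache miss: the origin logs this one
            expires_at = t + ttl  # copy is served locally until then
    return hits

# Hypothetical traffic: 10 readers a day for 6 days, times in hours.
requests = [day * 24 + hour for day in range(6) for hour in range(10)]

direct = len(requests)                     # hits logged without a cache
via_cache = origin_hits(requests, ttl=48)  # two-day expiry

print(direct, via_cache)
```

Under these assumptions the origin server logs 3 hits for
60 actual reads.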

In Neil Smith's paper `What can Archives offer the World
Wide Web,' there's a table (fig. 7) that lists `the most
popular remote sites accessed via the UNIX HENSA cache.'
Our main GNN site, nearnet.gnn.com, is up there at the
top, with approximately 4000 accesses. I haven't gone
through our logs to check specifically for accesses
from the HENSA caching server, but I would guess that
the number is substantially less than 4000. (From the
paper, the HENSA server will expire GNN non-GIF files
after two days, and GIF files after two weeks. Here's
a real-life ramification of caching: for those using
the HENSA server, our daily Dilbert comic strip is
updated only once every two weeks.)
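
The arithmetic behind that can be sketched as follows. This
assumes the strip is served from one fixed URL (the path
below is hypothetical), that the cache first fetched it on
day 0, and that someone asks the cache for it every day.
The two expiry periods are the ones quoted from the paper;
everything else is illustrative.

```python
GIF_TTL_DAYS = 14   # from the paper: GIF files expire after two weeks
OTHER_TTL_DAYS = 2  # from the paper: other files expire after two days

def ttl_days(url):
    """Pick the expiry period from the file extension."""
    return GIF_TTL_DAYS if url.lower().endswith(".gif") else OTHER_TTL_DAYS

def strip_seen(day, ttl):
    """Publication day of the strip a cache user sees on `day`,
    given a first fetch on day 0 and a request every day."""
    return day - (day % ttl)

url = "http://nearnet.gnn.com/dilbert.gif"  # hypothetical path
seen = [strip_seen(d, ttl_days(url)) for d in range(30)]

print(sorted(set(seen)))
```

Over 30 days, a reader behind such a cache would see only
the strips from days 0, 14, and 28.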

One solution would be for caching servers to generate
a summary of hits on URLs `belonging' to particular
servers, and to email that summary to a standard
email address at those servers. So even though we
at GNN may not receive the level of detail that we
get from our own logs (timestamps, hostnames, URLs),
we could at least receive from the caching servers
an approximation which we could integrate into our
reports back to our advertisers.
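
As a rough sketch of what such a summary could look like,
the Python below groups a cache's served URLs by origin
host and builds one report message per host. The mailbox
name `cache-stats', the sender address, the URL paths, and
the report layout are all assumptions for illustration; no
such convention exists that I know of. The code only
builds the message and does not send it.

```python
from collections import Counter, defaultdict
from email.message import EmailMessage
from urllib.parse import urlsplit

def summarize(cache_log):
    """Group served URLs by origin host, counting hits per URL."""
    per_host = defaultdict(Counter)
    for url in cache_log:
        host = urlsplit(url).hostname
        if host:  # skip entries without a usable host
            per_host[host][url] += 1
    return per_host

def build_report(host, counts, sender, mailbox="cache-stats"):
    """Build (not send) a hit summary for one origin server.
    The mailbox name is a hypothetical convention."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{mailbox}@{host}"
    msg["Subject"] = f"Cache hit summary for {host}"
    lines = [f"{n:6d}  {url}" for url, n in counts.most_common()]
    msg.set_content("\n".join(lines) + "\n")
    return msg

# Hypothetical entries a cache served from its local copies.
log = [
    "http://nearnet.gnn.com/a.html",
    "http://nearnet.gnn.com/b.gif",
    "http://nearnet.gnn.com/a.html",
    "http://www.example.org/index.html",
]

summary = summarize(log)
report = build_report("nearnet.gnn.com", summary["nearnet.gnn.com"],
                      sender="cache-admin@cache.example.org")
print(report)
```

A cache could run something like this once a day and hand
each message to its local mailer. The origin site would
still not learn who the readers were, only how many times
each URL was served.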

Comments?

--
John Labovitz
Global Network Navigator <http://gnn.com/>
O'Reilly & Associates, Sebastopol, California, USA (+1 707 829 0515)