Re: Caching Servers Considered Harmful (was: Re: Finger URL)

Chris Lilley, Computer Graphics Unit (lilley@v5.cgu.mcc.ac.uk)
Mon, 22 Aug 1994 21:08:06 +0200

In message <Pine.3.85.9408221141.A462-0100000@enews> Rob Raisch said:

> The publisher holds complete responsibility over their product, in
> content, presentation, timeliness and distribution.

Is that a statement of current (traditional, non-digital) practice or a wish?

In the 'real world' publishers would probably like to have this sort of absolute
control, but they do not. Their distributors 'cache' stocks (by arrangement with
the publishers). The retailers cache stocks, and may carry old editions until
they are sold. The second-hand market gives the publishers no control whatsoever
over content (annotations, missing pages), presentation (brown paper covers,
sellotape, stains, etc.), timeliness (all editions back to the first) or
distribution.

> By running a caching
> server on my content, you are robbing me of any control I might have over
> the timeliness and distribution.

Not necessarily. Individual users saving useful pages to local disk is a far
worse problem, because then there is no mechanism whatever for updates. Proxy
caching gives you expiry dates, last-modified dates and so on. This is clearly
an improvement on ad-hoc saving by individual users.

> You can provide no guarantee that the versions that you present to your
> users are accurate or timely

Sigh. This comes up again and again.

Accurate: you could put an MD5 checksum on it, and/or a digital signature or
some such. Again, in the absence of this you have no guarantee that individual
users haven't modified files either, copies of which could be shown to
colleagues and so on. By making local response times quicker, caches reduce the
incentive for users to keep their own saved copies, and so actively reduce the
risk of such tampering.
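
To make the checksum idea concrete, here is a rough sketch (modern Python, my
own illustration rather than anything any server currently does; the filename is
just a placeholder). The publisher computes a digest of the document and
publishes it alongside, so anyone holding a cached or saved copy can check it
has not been altered:

    import hashlib

    def md5_digest(path):
        # Read the file in chunks and return its MD5 digest as hex.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    # Compare this against the digest the publisher advertises.
    print(md5_digest("article.html"))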

Timely: use Expires. Have the proxy do a conditional GET on your server (I
thought this was always done, but it seems some proxies only do this if the
cached document has expired).
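
For anyone who has not seen the revalidation step, it amounts to sending an
If-Modified-Since header with the GET. A small sketch (modern Python, and the
host and path below are only placeholders taken from my own pages):

    import http.client

    # Revalidate a cached page: if the origin server answers 304 Not Modified,
    # the cached copy is still current and no document body is transferred.
    conn = http.client.HTTPConnection("info.mcc.ac.uk")
    conn.request("GET", "/CGU/index.html", headers={
        "If-Modified-Since": "Mon, 22 Aug 1994 20:00:00 GMT",
    })
    resp = conn.getresponse()
    if resp.status == 304:
        print("cached copy is still fresh")
    else:
        body = resp.read()  # updated document; replace the cached copy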

> Further, I have no idea of the number of
> consumers who view my content through your cache [...]

I accept that this _is_ a problem.

Particularly for sites which must, for various reasons, keep track of access:
to justify funding, to give evidence of an outreach programme, and so on.

Once server-to-server communication gets worked out, it would be nice if servers
could indicate that the document they are serving is one for which access
statistics are desired. True clients (end-user browsers) just throw this
information away. Proxy caches note requests for such a file in an access log
and periodically (perhaps at a time interval suggested by the original server)
forward the relevant extract of the access log back to the original server,
possibly by a means other than HTTP, possibly through PUT. Either way, it goes
into an incoming area for later merging with data from the server's own access
log, to produce whatever usage reports are required.
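
Something along these lines, purely as a sketch (the function names, log format
and incoming path below are my invention, not part of any existing protocol):

    import http.client
    import time

    access_log = []   # (timestamp, url) hits on documents flagged for statistics

    def record_hit(url):
        access_log.append((time.time(), url))

    def forward_log(origin_host, incoming_path="/incoming/proxy-stats"):
        # Periodically ship the extract back to the origin server via PUT,
        # into an incoming area for later merging with its own access log.
        body = "\n".join("%.0f %s" % (ts, url) for ts, url in access_log)
        conn = http.client.HTTPConnection(origin_host)
        conn.request("PUT", incoming_path, body=body,
                     headers={"Content-Type": "text/plain"})
        resp = conn.getresponse()
        if resp.status in (200, 201, 204):
            access_log.clear()   # origin accepted the extract; start afresh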

> Even assuming that you are acting in good faith [...]

Lastly, I think you need to distinguish between caching proxies such as the CERN
one, implementing published algorithms, and a rogue mischievous cache that
someone could in principle write to mess up your files. Yes, in theory someone
could write a proxy that ignores Pragma: no-cache, that never expires anything,
and changes all occurrences of raisch@internet.com to lilley@mcc.ac.uk. In
principle, similarly, someone could write a browser that did similar devious
manipulations, or write network router box software that faked packets coming
from your server...

> The copyright issue is the more difficult one. In light of the previous
> argument, you are archiving an original work.

Perhaps you could examine how this issue is already handled in store-and-forward
systems such as email, news, and all uucp-type networking; intermediate stores
are used there. Consider also how many network buffers, bridge and router boxes,
repeaters, and internal buffers in the browser software hold a complete or
partial copy of this original work. As you say,

> (I'm ignoring any arguments that copyright law must be redesigned in light
> of digital distribution. I don't think anyone would disagree with this.

Certainly not me. Hope these thoughts are of some use.

--
Chris Lilley
+--------------------------------------------------------------------------+
|Technical Author, ITTI Computer Graphics & Visualisation Training Project |
+--------------------------------------------------------------------------+
| Computer Graphics Unit,        |  Internet: C.C.Lilley@mcc.ac.uk         |
| Manchester Computing Centre,   |     Janet: C.C.Lilley@uk.ac.mcc         |
| Oxford Road,                   |     Voice: +44 61 275 6045              |
| Manchester, UK.  M13 9PL       |       Fax: +44 61 275 6040              |
| X400: /I=c/S=lilley/O=manchester-computing-centre/PRMD=UK.AC/ADMD= /C=GB/|
| <A HREF="http://info.mcc.ac.uk/CGU/staff/lilley/lilley.html">my page</A> | 
+--------------------------------------------------------------------------+