Re: Customer pull on HTTP2

Date: Fri, 8 Jan 1993 14:32:21 +0000 (GMT)
From: Kevin Hoadley <K.Hoadley@directory.rl.ac.uk>
Sender: K.Hoadley@directory.rl.ac.uk
Reply-To: K.Hoadley@directory.rl.ac.uk
Subject: Re: Customer pull on HTTP2
To: www-talk@nxoc01.cern.ch
Cc: dsr@hplb.hpl.hp.com
Message-id: <Ximap.726510755.7590.khoadley@danton>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dave Raggett raised some interesting issues in his message. In
particular:

> Caching
>-------
>
> It will be desirable to avoid overloading servers with popular documents by
> supporting a caching scheme at local servers (or even at browsers?).
          As well as caching, replication would be nice. But this is
      only practical if resource identifiers do not contain location
      information (otherwise replication is only possible by making
      all the peer servers appear to be one machine, as in the
      DNS CNAME suggestion I made some time ago).
          But if resource identifiers do not contain host information
      then you need an external means of determining how to reach
      the resource. This is analogous to routing protocols (an address
      is not a route ...)
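          To make the analogy concrete, resolution might look
      something like the sketch below (the identifier syntax, the
      names and the replica table are all inventions of mine, not
      a proposal):

        REPLICAS = {
            "doc:www-project/overview": [
                "info.cern.ch", "mirror.rl.ac.uk",
            ],
        }

        def resolve(resource_id):
            # The identifier names the document, not a machine; the
            # mapping to hosts lives outside it, like a routing table.
            hosts = REPLICAS.get(resource_id)
            if not hosts:
                raise KeyError("no known replica for " + resource_id)
            return hosts

        for host in resolve("doc:www-project/overview"):
            print("would try host", host)  # stop at the first that answers
            break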
          Such a system is probably over-ambitious for now. Anyway,
      back to caching ...

> Servers need to be able to work out what documents to trash from
> their caches.
> A simple approach is to compare the date the document was received with the
> date it was originally created or last modified. Say it works out that when
> you got the document it was already one week old. 
> Then one rough rule of thumb
> is to trash it after another week. You can be rather smarter if there is a
> valid expiry date included with the document:

          I think this is silly. I haven't changed a document for
      six months, therefore it is safe to say that it won't be
      changed for the next six months ...
          This also depends on hosts agreeing on the date. To quote
      RFC 1128, talking about a 1988 survey of the time/date on
      Internet hosts, "... a few had errors as much as two years".
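          For concreteness, the rule of thumb amounts to something
      like this (a sketch only; the function and names are mine,
      not Dave's):

        from datetime import datetime

        def expiry_time(fetched_at, last_modified, expires=None):
            # An explicit expiry date, when present, wins outright.
            if expires is not None:
                return expires
            # Otherwise: a copy that was already a week old when
            # fetched is kept for another week, then trashed.
            age_at_fetch = fetched_at - last_modified
            return fetched_at + age_at_fetch

        fetched = datetime(1993, 1, 8)
        modified = datetime(1993, 1, 1)
        print(expiry_time(fetched, modified))  # 1993-01-15 00:00:00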

> I think that we need to provide an operation in which the server returns a
> document only if it is later than a date/time supplied with the request.

          This would be useful as part of a replication system,
       as long as both ends exchanged timestamps initially so
       that the dates can be synchronised.
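           For what it's worth, the request might look something
       like the sketch below (the header name and the offset scheme
       are assumptions of mine, nothing agreed):

        from datetime import datetime, timedelta

        def conditional_request(resource, cached_at, clock_offset):
            # clock_offset is (server clock - our clock), learned by
            # exchanging timestamps when the two ends first spoke, so
            # the date we send means the same thing at both ends.
            since = cached_at + clock_offset
            return ("GET %s\nIf-Modified-Since: %s\n"
                    % (resource, since.strftime("%d %b %Y %H:%M:%S GMT")))

        offset = timedelta(seconds=-42)  # say the server runs 42s slow
        print(conditional_request("/hypertext/WWW/TheProject.html",
                                  datetime(1993, 1, 1, 12, 0), offset))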

> Note that servers shouldn't cache documents with restricted readership since
> each server doesn't know the restrictions to apply. This requires a further
> header to identify such documents as being unsuitable for general caching:

and also ...

> What happens if a copyright protected document is saved in the cache of a
> local server? We have got to ensure that the rightful owners get paid for
> access even when the document is obtained from a local server's cache.

         It may be stating the obvious, but once you allow a
    user to access your data such that they can save it, there is
    no technical way you can prevent them from publicly
    redistributing your data. This is a social/legal problem,
    not a technical one.
         Accepting that nothing can be done to stop deliberate
    abuse of licensed information, there is a need to prevent
    accidental abuse. Probably the simplest way to do this is
    to mark the document as one which should NOT be cached.
          
         Perhaps this is leading towards a very simple-minded
    caching scheme a la DNS, where information is returned
    together with an indication of its "time to live" (TTL),
    ie how long it can reasonably be cached. Setting a default
    TTL for a server gives an idea of the "volatility" of the
    information contained therein.
         Unless a document is exported with world read access,
    it should always have a TTL of 0.
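         Roughly, a cache honouring such TTLs might behave like
    this sketch (the default value and all the names are inventions
    of mine):

        import time

        SERVER_DEFAULT_TTL = 86400            # one day, say

        def ttl_for(world_readable, ttl=None):
            if not world_readable:
                return 0                      # restricted: never cache
            return SERVER_DEFAULT_TTL if ttl is None else ttl

        cache = {}                            # resource -> (document, expiry)

        def store(resource, document, ttl):
            if ttl > 0:
                cache[resource] = (document, time.time() + ttl)

        def lookup(resource):
            entry = cache.get(resource)
            if entry is None:
                return None
            document, expires = entry
            if time.time() >= expires:        # TTL ran out: trash the copy
                del cache[resource]
                return None
            return document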
 

Kevin Hoadley, Rutherford Appleton Laboratory, khoadley@directory.rl.ac.uk