Re: forwarding cache requests

stumpf@informatik.tu-muenchen.de (Markus Stumpf)
Errors-To: listmaster@www0.cern.ch
Date: Tue, 22 Mar 1994 00:33:46 --100
Message-id: <2mlan4$2bc@hpsystem1.informatik.tu-muenchen.de>
Reply-To: stumpf@informatik.tu-muenchen.de
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: stumpf@informatik.tu-muenchen.de (Markus Stumpf)
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: forwarding cache requests
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Organization: Technische Universitaet Muenchen, Germany
Content-Length: 4048
reinpost@info.win.tue.nl (Reinier Post) writes:
[ sorry, I've restructured the text a little ]

>Cache-date: <date>
>    the time the document was served from the cache in answer
>    to the present request

So this is the same as "Date:" ???

>Cache-last-refreshed: <date>
>    the time the document was last fetched into the cache,

By "fetched", do you mean "checked to be valid" ???

>Cache-last-modified: <date>
>    the time it was last fetched and found to be different from the
>    previous version,

This is the only one where I am currently sure what you mean :/

>Cache-via: <url> [, <url>]*

Would an approach like the one SMTP mailers use, adding Received:
lines, be sufficient here? I.e. allow an unlimited number of those tags,
and have each cache/proxy that gates the document add an "identifier"
that it can recognize again and thus detect loops.
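
A minimal sketch of that Received:-style loop check, assuming one Cache-via:
tag per hop and an opaque identifier per cache. The names add_via,
LoopDetected and the identifiers used below are made up for illustration;
nothing here is an agreed header format.

```python
class LoopDetected(Exception):
    """Raised when this cache finds its own identifier in the request."""


def add_via(header_lines, my_id):
    """Return a new header list with one more Cache-via: tag appended.

    header_lines is a list of (name, value) pairs, so a tag may appear
    any number of times, like Received: lines in mail.
    my_id is a hypothetical identifier this cache can recognize again.
    """
    seen = [value.strip() for name, value in header_lines
            if name.lower() == "cache-via"]
    if my_id in seen:
        # Our own identifier came back to us: the request went in a circle.
        raise LoopDetected(my_id)
    return header_lines + [("Cache-via", my_id)]
```

Each hop calls add_via() before forwarding; a cache that sees its own
identifier refuses the request instead of forwarding it again.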

Okay, let me explain my thoughts on this topic.
I have a working proxy/cache server running based on ncsa httpd-1.1.
I did the proxy module, Guenther Fischer from Chemnitz made the cache
module. The approach Guenther uses is as follows:

If you don't have the document in the cache, fetch it and put it in the cache.
If you have the document in the cache,
   check with a stat() the last modification time of the file.
   if this is longer ago than a certain timeout
      send a HEAD request and check if the file has changed.
      if it has changed, update the cache
         else update the last modification time of the file (utime()).
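
The steps above could look roughly like this in Python. This is only a
sketch, not Guenther's actual code: fetch() and has_changed() are
hypothetical callbacks standing in for the GET and the HEAD check, and the
timeout value is an assumption.

```python
import os
import time

TIMEOUT_SECONDS = 3600  # assumed; the text only says "a certain timeout"


def _store(path, data, now):
    """Write data to the cache file and stamp it with the check time."""
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, (now, now))


def serve_from_cache(path, fetch, has_changed, now=None,
                     timeout=TIMEOUT_SECONDS):
    """Return the document cached at path, refreshing it when stale.

    fetch() -> bytes      hypothetical callback doing the full GET
    has_changed() -> bool hypothetical callback doing the HEAD check
    """
    now = time.time() if now is None else now
    if not os.path.exists(path):
        # Not in the cache yet: fetch it and put it in the cache.
        data = fetch()
        _store(path, data, now)
        return data
    if now - os.stat(path).st_mtime > timeout:
        if has_changed():
            # Stale and changed at the origin: update the cache.
            data = fetch()
            _store(path, data, now)
            return data
        # Stale but unchanged: only record that we checked (utime()).
        os.utime(path, (now, now))
    with open(path, "rb") as f:
        return f.read()
```

The file's modification time doubles as "time of last check", which is why
an unchanged document only gets its timestamp touched.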

I currently don't know the strategy of the CERN server, but I think it's
rather similar.

What do we need for "good" caches?

1) "forwarders": if you want to reduce e.g. national and international
   traffic, one could imagine a big national cache that acts as a proxy
   to international sites and that local proxy or cache servers
   could use.
2) we should be able to have read/write and read-only caches.
   We could have a "master" that writes the cache and "slaves" that
   have only read-only access to the cache and ask the master to
   update the cache if necessary. This would allow distributing
   the load of fetching documents to some machines accessing the same
   cache over e.g. NFS, while leaving the burden of updating the cache,
   which is IMHO rare, as most of the documents are rather static,
   to one master.

What is needed for inter-cache communication:

o   what I think is REALLY URGENTLY NEEDED is another way to handle
    GET requests. I asked about that before but got no answer. I don't
    see any problems in requesting more than one document within
    one server connection. BUT currently all servers close the connection
    after the last byte is sent, EVEN if there is a Content-length: field.
      I'd like to propose that, if there is such a field, the client
    either closes the connection when it doesn't want more documents from
    that server, or is able to send another GET or whatever!
    (but maybe this should be discussed under another subject).

o   one great idea is the conditional GET via If-Modified-Since:
    The only problem I currently see is: how do I, as forwarder or
    client, determine whether the other side supports it? If I send
    a conditional GET and the server on the other side does not
    support it, it will send the document, and this is currently worse
    than the possible overhead of sending a HEAD followed by a GET.

o   With the approach Guenther Fischer uses, a tag like
    Cache-last-modified: would be sufficient, as the client or forwarder
    could see from this date when the cache server last checked the
    accuracy of the document in the cache. What would probably also be
    informative is a Cache-Update-Interval: tag (in minutes) for this
    specific document, to tell the forwarder when it is useful to
    check for this document at this cache again, or that it would
    not get a newer version from that cache within the next n minutes
    anyway. (of course the Cache-via: is useful and needed!)
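
To make the last two points a bit more concrete, here is a rough sketch of
what a forwarder could do with such information. Everything in it is an
assumption: the function names are invented, the dates are taken to be
already parsed, and the guess about If-Modified-Since support is just one
possible heuristic (a 304 reply means "supported"; a full reply for a
document that has not changed since the given date suggests the header
was ignored).

```python
from datetime import datetime, timedelta


def next_useful_check(last_checked, update_interval_minutes):
    """Earliest time at which this cache could hold a newer version.

    last_checked is the date from the (proposed) Cache-last-modified: tag,
    update_interval_minutes the value of the (proposed)
    Cache-Update-Interval: tag.
    """
    return last_checked + timedelta(minutes=update_interval_minutes)


def worth_asking_cache(now, last_checked, update_interval_minutes):
    """True once the cache's own update interval has elapsed."""
    return now >= next_useful_check(last_checked, update_interval_minutes)


def supports_conditional_get(status, if_modified_since, last_modified):
    """Guess from one reply whether the peer honoured If-Modified-Since.

    Returns True, False, or None when the reply tells us nothing.
    """
    if status == 304:
        return True
    if (status == 200 and last_modified is not None
            and last_modified <= if_modified_since):
        # The whole document came back although it had not changed.
        return False
    return None
```

A forwarder could remember the True/False answer per server and fall back
to HEAD followed by GET for servers known not to support the conditional
form.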

Is this sufficient?
Comments? Ideas?

	\Maex