Re: pragma no-cache -- Can we make it more useful?

karl@cavebear.com
Wed, 13 Jul 1994 21:48:43 +0200

> > I've been reading the http protocol specification and
> > came across the request function pragma no-cache.
>
> > I'm glad to see that there is a way for a client to express
> > its need for the latest-and-greatest version of the referenced
> > document
>
> Ah. I think perhaps you misunderstand.
>
> When using a caching proxy server, you *do* always get the latest and greatest
> version - unless the original server is unreachable, in which case you get what
> was in the cache.
>
> When the proxy gets a request, it sends a HEAD request to the original server to
> check that the cached file has not expired and has not changed. If all is OK,
> the cached copy is sent to the client.

I've just fired up LANWatch and watched what happened. I'm running the CERN pre-3
httpd with caching. I just fetched some stuff from the cache. No packets went
to the outside world.

> > For example, the client might want to say, in effect, "give me the
> > document referenced by this URL. I am willing to accept a copy that
> > might be 2 hours old."
>
> My comments above notwithstanding, this may be useful (but not in the context of
> pragma nocache). BUT, how would you specify this at the user interface level? It
> will change from document to document, so would have to be set by the user. For
> example, a copy of Shakespeare's sonnets that is 1 year old is likely to be fine,
> whereas a video grab of a coffee pot that is 5 minutes old might be worthless ;-)

I'm not sure how best to present it to a user. However, I'm sure
imaginative GUI designers can do it. My own personal experience is
illustrative. I've been (slowly) reading the WWW documents and I tend
to flip back/forth over a period of days, so I don't keep a copy of my
reader running continuously. I have been greatly helped by the
reduction in access delay due to the cache, and because I'm learning
rather than researching, I don't care if I have the absolutely latest
draft document. However, on the other hand, I do look at some
government reports that are revised daily. It is very important to me that
I can skip the version that is in my cache.

Stock reports are a good example. During the day I want really fresh
copies, but at night, when the market is closed, I'm happy to live
with the snapshot taken when the market closed.

> Being prompted for the acceptable age with *each* GET would be tiresome. It
> would also only matter if you were using a proxy, not if you were talking to the
> original server. As the whole point is to make proxy caching transparent to
> the user - no URLs to edit, it all works as before but faster - on balance I do
> not see the merit of your proposal. It would save network time at the expense of
> user time, which is the wrong way round in my book.

A reasonable defaulting scheme can be set up. To me how it is presented to
the user is a GUI issue.

And I'm not sure that all the consumers of WWW data need GUIs...
robots will want to gather stuff, and depending on their jobs they may
require up-to-date documents or may get by with aged ones.

> > (When I say "might be X hours old", I'm referring not to the age of
> > the document since it last changed, but the time since it may have
> > been copied from the authoritative server.)
>
> I see why you say this, but there is no need, as described above. The age of the
> document since it last changed is readily available information.

To use the technique that I think you are suggesting, the viewer would have to
send a HEAD request to get the document change date. Then the viewer would have
to look at the response and:

    if (response_is_from_an_uncached_server())
    {
        get_the_document();
    }
    else    /* Response is from a cache */
    {
        if (is_response_new_enough())
        {
            get_the_document();
        }
        else
        {
            get_the_document_but_say_pragma_no_cache();
        }
    }

The question is what the semantics of is_response_new_enough() should
be. If you look only at the origination date/time of the document as
it exists in the cache, you don't have enough information to know
whether a newer revision has appeared since the cache snapshot was
made. If you know how long it has been since the snapshot was made,
then you can at least give the user a chance to decide whether
he/she/it ought to take the risk that the document in the cache is
new enough according to their own criteria.
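
One possible reading of is_response_new_enough() is sketched below in C,
assuming the cache could report when it took its snapshot. The struct,
its field names, and the max_age_seconds parameter are all hypothetical;
nothing in the current protocol supplies them.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical description of a response served from a cache. */
struct cached_response {
    time_t last_modified;  /* document change date, as a HEAD would report it */
    time_t snapshot_time;  /* when the cache copied it from the authoritative server */
};

/*
 * Returns true if the cached copy was snapshotted no more than
 * max_age_seconds before `now`.  Note that last_modified plays no part:
 * it says when the document changed, not how stale the cache may be.
 * A snapshot time in the future (clock skew) is treated as untrustworthy.
 */
bool is_response_new_enough(const struct cached_response *r,
                            time_t now, double max_age_seconds)
{
    assert(r != NULL);
    double age = difftime(now, r->snapshot_time);
    return age >= 0.0 && age <= max_age_seconds;
}
```

With a limit of two hours (7200 seconds), a copy snapshotted one hour
ago passes and one snapshotted three hours ago does not, whatever the
document's own change date says.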

What is suggesting itself to me as I type is an alternative mechanism.

I've seen a proposed header line that says that a document has been delivered
by a caching server.

What might be useful, in lieu of my notion about modifying pragma
no-cache, is to have a header line which indicates the date/time at
which the copy in the cache was snapshotted from the authoritative
server.

In a situation in which there is a cascade of caches, this date/time would
be the same for all.
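
A rough C sketch of how a cache in such a cascade might handle the
snapshot date/time: stamp it only when the response arrives without
one, otherwise pass it along unchanged. The struct and the function
name are hypothetical, and no such header line exists today.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical metadata carried with a response. */
struct response_meta {
    bool   has_snapshot_time;  /* false when the reply came straight from the
                                  authoritative server */
    time_t snapshot_time;      /* when the copy left the authoritative server */
};

/*
 * Called by a cache when it stores a response fetched from upstream at
 * fetched_at.  Only the first cache in the chain sets the time; every
 * later cache keeps the value it was given.
 */
struct response_meta stamp_for_storage(struct response_meta upstream,
                                       time_t fetched_at)
{
    assert(fetched_at != (time_t)-1);  /* time() reports failure as -1 */
    struct response_meta stored = upstream;
    if (!upstream.has_snapshot_time) {
        stored.has_snapshot_time = true;
        stored.snapshot_time = fetched_at;
    }
    return stored;
}
```

A second-level cache that fetches from a first-level cache therefore
reports the first cache's snapshot time, not the time of its own fetch.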

--karl--