Re: pragma no-cache -- Can we make it more useful?

Ken Fox (fox@pt0204.pto.ford.com)
Mon, 18 Jul 1994 16:27:44 +0200

> > client GETs document, but this is routed via proxy cache
> >
> > if expired(cached_document)
> > then
> >     proxy GETs/If-Modified "/cgi-bin/changes" at original server
> >
> >     if (original_mod_date > cached_mod_date)
> >     then
> >         proxy GETs/If-Modified document at original server
> >         put it in cache
> >     endif
> > endif
> >
> > return cached_document
> >
> > This algorithm works against all servers.
>
> It doesn't always work.

It *will* always work.

> 1. The proxy/cache may be a firewall and the original server is not reachable.
> (or the original server may be down because it crashed under the load ;-)

If the document is not expired, then the proxy just returns it; the
original server does not need to be available. Non-proxy servers will never
have an expired document (obviously!) so this applies to them as well.

If the original server is down or unreachable then the GET will fail, the
cache will not be updated and the proxy will simply return the cached
document (if it exists). The "return cached_document" is *outside* the
conditional.
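
Here is a rough sketch of that behavior in Python, just to make the control
flow concrete. It is not taken from any real proxy. The names Entry,
OriginUnavailable, proxy_get, changes_date and fetch are all made up for
illustration, and the two callbacks stand in for the conditional GETs of
"/cgi-bin/changes" and of the document itself. Handling a cache miss the same
way as an expired entry is my assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Entry:
    """One cached document (hypothetical structure)."""
    body: str
    mod_date: float  # modification date of the cached copy
    expires: float   # time after which the copy must be revalidated


class OriginUnavailable(Exception):
    """Raised by the callbacks when the original server is down or unreachable."""


def proxy_get(
    url: str,
    now: float,
    cache: Dict[str, Entry],
    changes_date: Callable[[str], float],  # stands in for GET "/cgi-bin/changes"
    fetch: Callable[[str], Entry],         # stands in for GET of the document
) -> Optional[Entry]:
    entry = cache.get(url)
    if entry is None or entry.expires <= now:
        try:
            original_mod_date = changes_date(url)
            if entry is None or original_mod_date > entry.mod_date:
                cache[url] = fetch(url)
        except OriginUnavailable:
            # The GET failed, so the cache is simply left as it was.
            pass
    # Outside the conditional: return whatever the cache holds, if anything.
    return cache.get(url)
```

Note that the origin is contacted only inside the "expired" branch, and a
failure there never prevents the final return from handing back the stale
copy.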

If the proxy can *never* get to the original server, then the cache will
never be loaded and the document is never reachable. In this case, you must
load the "proxy" through some other mechanism --- i.e. not HTTP. Therefore,
the whole idea of proxy/firewall servers won't work for you and none of this
discussion applies.

> 2. It is inefficient, unnecessarily bothering the original server when all the
> user wants is recent, but not necessarily the absolute latest version.

The original server is only "bothered" when the cache expires. And it will
only be "bothered" once to validate all documents that may be cached on the
proxy. If the idea of dynamically updating the proxy cache is undesirable,
then simply add code to the expiration logic to restrict cache updates by
some other criterion. If you want some way of having a client explicitly
tell the server: "Give me whatever you have right now without checking
anything first" then extra information must be passed along with the GET
request. Anybody know how this can be done?

Someone recently used a stock market as an example of a type of document
that should have different expirations. This is an easy thing to solve:

During the day, the quotes are updated every 5 (or some small n)
minutes. So set the expiration for 5 minutes.

The last quote of the day won't expire until 5 minutes after the
opening of the market on the next business day. So set the
expiration for then.
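
As a sketch of how a server might compute that expiration, here is one way
to do it in Python. The trading hours (09:30 to 16:00), the Monday-to-Friday
calendar with no holidays, and the names OPEN, CLOSE, INTERVAL,
is_business_day and quote_expiration are all assumptions of mine, not
anything a real server does.

```python
from datetime import date, datetime, time, timedelta

OPEN = time(9, 30)                # assumed opening time
CLOSE = time(16, 0)               # assumed closing time
INTERVAL = timedelta(minutes=5)   # quote update interval ("some small n")


def is_business_day(d: date) -> bool:
    # Assumption: Monday through Friday, holidays ignored.
    return d.weekday() < 5


def quote_expiration(now: datetime) -> datetime:
    """Return the time at which a quote served at `now` should expire."""
    today = now.date()
    if is_business_day(today) and OPEN <= now.time() < CLOSE:
        # Market is open: quotes are refreshed every INTERVAL.
        return now + INTERVAL

    # Market is closed: find the next opening.
    day = today
    if not (is_business_day(day) and now.time() < OPEN):
        # Today's session is over (or today is not a business day).
        day += timedelta(days=1)
        while not is_business_day(day):
            day += timedelta(days=1)
    return datetime.combine(day, OPEN) + INTERVAL
```

For example, a quote served on a Friday evening would get an expiration of
Monday 09:35 under these assumptions, so the proxy never bothers the
original server over the weekend.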

Another example of classic literature was posted. This is also easy:

For static documents (i.e. the author has no intention of ever
changing them) set the expiration to infinity.

Do we have a notation for infinity?
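
Whatever the notation on the wire turns out to be, inside a cache it could
be represented by a sentinel value. A minimal Python sketch, where
NEVER_EXPIRES and expired are hypothetical names and expiration times are
assumed to be plain numeric timestamps:

```python
import math

# Sentinel meaning "this document never expires" (an assumption of this sketch).
NEVER_EXPIRES = math.inf


def expired(expires_at: float, now: float) -> bool:
    # math.inf compares greater than every finite timestamp,
    # so a document marked NEVER_EXPIRES is never reported as expired.
    return now >= expires_at
```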

The idea in the previous examples is to show that a server can often be
"smarter" than a person in deciding when the document cache should be
refreshed. Why should the system require a reader to understand when the
information is old? Pure performance hints are something else... but there
I would still like to see the reader put limits on the cost of the transaction
instead of trying to explicitly tell the server how to do something.

You could use this same cache update algorithm to selectively mirror/update
a server. You don't *have* to do it via a usage-driven proxy... but that's
probably the easiest/best way to do it. I can think of only two reasons not
to do it that way: (1) you have a slow/congested/unreliable link and want to
schedule (i.e. batch) cache updates; (2) you have a security mechanism that
requires hand auditing/verification of the contents of the cache.

- Ken

-- 
Ken Fox, fox@pt0204.pto.ford.com, (313)59-44794
-------------------------------------------------------------------------
Ford Motor Company, Powertrain | "Is this some sort of trick question
CAD/CAM/CAE Process Integration | or what?" -- Calvin
AP Environment Section |