Caching and Expires (was: Initializing HTTP headers from HTML)

"Roy T. Fielding" <fielding@simplon.ICS.UCI.EDU>
To: ellson@hotsand.att.com
Cc: dsr@hplb.hpl.hp.com, www-talk@www0.cern.ch
Subject: Caching and Expires (was: Initializing HTTP headers from HTML)
In-reply-to: Your message of "Thu, 06 Jan 1994 15:57:41 EST."
             <9401062057.AA13697@hotsand.dacsand> 
Date: Fri, 07 Jan 1994 06:29:37 -0800
From: "Roy T. Fielding" <fielding@simplon.ICS.UCI.EDU>
Message-id: <9401070629.aa10138@paris.ics.uci.edu>
Content-Length: 4414
John Ellson writes:
> 
> I thought you were addressing a cache coherency problem:
> 
> 	before serving a document a cache server should check if 
> 	the original document still exists and if it is unchanged.
> 
> 	if it no longer exists then no document should be served
> 	and the cache copy should be purged.
> 
> 	if it has changed then the cache should be refreshed and the
> 	new version served.
> 
> 	if the original site cannot be contacted then the document
> 	should be served with a warning that the validity of the
> 	document could not be verified.
> 
> 	if the document still exists unchanged then the cache copy can
> 	be served.
> 
> It seems that the Expires mechanism is a little more hand-offish than
> this.  Perhaps Expires provides sufficient coherency control?

I really don't think the Expires header should indicate anything other
than a hint to the reader (and possibly the cache manager) that the document
should not be considered "current" after a certain date.  As such, it
doesn't provide any coherency control, as one would expect of an optional
header.

**************************** A PROPOSAL *****************************

The cache coherency problem outlined (very well) above is a separate issue
because it requires a special request to the original server to determine
the status of the actual document.  Although some people have suggested that
the HEAD request is sufficient for this purpose, I find it entirely too
inefficient for a caching mechanism (because of the server overhead from
connecting twice and finding the file twice).

The solution is to implement a conditional GET request -- one that includes
a date to be checked against the Last-modified date of the information object.
Someone else (I didn't save the message) suggested a solution along these
lines in which the normal GET request was followed by a Last-modified:
header similar to the current Content-type, Authorization, etc.

Formally, if the server receives the request:
----------------------------------------------
GET /ICShome.html HTTP/1.0
Last-modified: Thu, 06 Jan 1994 15:57:41 GMT

----------------------------------------------
Then the server would respond:

(a) If the object /ICShome.html is inaccessible (for whatever reason), 
    then the server should return a 4XX message just like it does now.

(b) If /ICShome.html no longer exists, the server should return a
    404 Not Found response (i.e. same as now).

(c) If /ICShome.html is accessible but its last modification date is
    earlier than (less than) the date passed (Thu, 06 Jan 1994 15:57:41 GMT),
    the server should return a 304 Use Local Copy message (with no body).

(d) If /ICShome.html is accessible and its last modification date is
    later than or equal to the date passed (Thu, 06 Jan 1994 15:57:41 GMT),
    the server should return a 200 OK message (i.e. same as now) with body.
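
As a rough illustration of cases (a) through (d), here is a minimal
Python sketch of the decision a server might make.  The function name,
its parameters, and the choice of 403 for the "inaccessible" case are
assumptions made for the example; they are not part of the proposal or
of any existing server.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(object_modified: Optional[datetime],
                           request_date_header: Optional[str],
                           accessible: bool = True) -> int:
    """Pick a status code for the proposed conditional GET.

    object_modified is the object's last modification time (timezone-aware),
    or None if the object no longer exists.  request_date_header is the
    value of the Last-modified header sent with the request, or None for
    a plain GET from an old client.
    """
    if object_modified is None:
        return 404  # case (b): the object no longer exists
    if not accessible:
        return 403  # case (a): some 4XX; 403 is only an illustrative choice
    if request_date_header is None:
        return 200  # no date passed: behave exactly as servers do now

    request_date = parsedate_to_datetime(request_date_header)
    if request_date.tzinfo is None:
        # Assumption: treat a date without a usable zone as GMT.
        request_date = request_date.replace(tzinfo=timezone.utc)

    if object_modified < request_date:
        return 304  # case (c): Use Local Copy, no body
    return 200      # case (d): later than or equal, send the body
```

Note that a modification date exactly equal to the date passed falls
under (d) and gets a full 200 response, as in the wording above.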

In this way, a cache manager would just send a GET request with the
Last-modified date equal to the date on which it originally requested
the local copy of the object it has in its cache.
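
The cache manager's side could be sketched along these lines.  This is
only an illustration: the cache layout, the function names, and the
decision to purge on a 404 (taken from John's description quoted at the
top, not from the proposal itself) are all assumptions.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional, Tuple

# Hypothetical cache layout: path -> (time the copy was requested, body).
Cache = Dict[str, Tuple[datetime, bytes]]


def build_conditional_get(path: str, requested_at: datetime) -> str:
    """Format the proposed request; requested_at must be timezone-aware."""
    stamp = format_datetime(requested_at.astimezone(timezone.utc), usegmt=True)
    return f"GET {path} HTTP/1.0\r\nLast-modified: {stamp}\r\n\r\n"


def apply_response(cache: Cache, path: str, status: int,
                   body: Optional[bytes], now: datetime) -> Optional[bytes]:
    """Update the cache from the server's status; return the body to serve."""
    if status == 304:
        # Use Local Copy: the cached body is still good.
        return cache[path][1]
    if status == 200 and body is not None:
        # The object changed (or was never cached): refresh and serve it.
        cache[path] = (now, body)
        return body
    if status == 404:
        # Assumption: purge the copy when the original no longer exists.
        cache.pop(path, None)
    # Any other status: serve nothing and leave the cache untouched.
    return None
```

For a copy requested at 15:57:41 GMT on 6 January 1994,
build_conditional_get("/ICShome.html", ...) produces the same two
request lines shown in the example request above.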

Note that implementing this protocol would have no effect whatsoever
on existing servers and clients.  Old clients (and any without caches)
would just continue making requests without Last-modified headers.
Old servers (at least the NCSA httpd 1.0 that I use) will already accept
a message of the above format and just ignore the Last-modified header.

How's that for a proposal?

*********************************************************************

> If Expires is an optional attribute provided by the author then what
> prevents stale copies staying indefinitely in cache servers?

Nothing -- if cache managers want to keep stale copies, there
should be nothing to stop them (nor do I think it is possible to stop them).

> If it turns out that we agree that something is required to make cache
> servers operate correctly, then should the additional mechanism be
> user visible, or should it be built into the client-server protocols?

Because the Web is unencrypted, everything is user visible one way or another.
I would establish a client-server protocol and leave it to the clients
as to what information should be passed on to the user.  I think the Expires
date is useful regardless of whether a cache is present or not.


....Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                   (fielding@ics.uci.edu)