Re: pragma no-cache -- Can we make it more useful?

Martijn Koster (m.koster@nexor.co.uk)
Mon, 18 Jul 1994 14:54:43 +0200

As this discussion is all about proxy-related stuff, shouldn't it
have been started on www-proxy?

> I'd also like to see a "standard" document that a proxy can request
> that will return a list of all documents modified since some date.
> Sites that implement it would see a lot less traffic from proxy
> servers. Sites that don't aren't penalized.

(Incidentally, remote robots would also love such a facility)

This is quite difficult unless you use a local robot; sorting out the
local virtual paths and the virtual URL spaces provided by cgi scripts
is impossible to do just from the configuration files.

> This proposal is very similar to the standard "/robots.txt" document
> that robots/spiders/mirrors/etc. use to behave nicely.

For the Robot Exclusion this is less of a problem, as you generally
shut out entire URL trees, not every individual page in that tree. And
you don't care about modification dates.
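
To illustrate the tree-level granularity, here is a minimal sketch using
Python's standard urllib.robotparser module. The /private/ and /public/
paths are hypothetical and chosen only for the example: one Disallow line
covers every page under a tree, and no dates are involved.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical /robots.txt content: a single rule shuts out a whole tree.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def build_parser(text):
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = build_parser(ROBOTS_TXT)
blocked = rules.can_fetch("*", "/private/reports/1994/july.html")
allowed = rules.can_fetch("*", "/public/index.html")
```

Every URL below /private/ is refused by that one line, however many pages
the tree holds, which is why the server side stays cheap to maintain.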

> The document could be named "/changes.txt" or maybe
> "/cgi-bin/changes". It would probably be computed on-the-fly from a
> document database. I wouldn't expect anybody to do it with a file
> system traversal, but that is certainly possible.

I doubt this would be efficient on-the-fly, unless you have some mechanism
whereby anybody wanting to change a document has to somehow flag this
pro-actively -- which is a lot of administrative overhead.

If you use an unbounded local robot for web maintenance, then you
might as well have it look at and store "Last-Modified"s in an ls-lR
type file. But I wonder how many people would want to do this, and how
regularly.
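
As a rough sketch of such an "ls-lR type file", assuming the documents are
plain files under a single root directory: build_listing and the line
layout are hypothetical, not an existing tool or format. Note that a
filesystem walk like this sees neither virtual paths nor URLs generated by
cgi scripts, which is exactly the limitation described above.

```python
import os
from email.utils import formatdate

def build_listing(root):
    """Return 'HTTP-date<TAB>/path' lines for every file under root."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Express the path relative to the root, with URL-style slashes.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            # Format the file's mtime the way a Last-Modified header would.
            stamp = formatdate(os.path.getmtime(full), usegmt=True)
            lines.append(f"{stamp}\t/{rel}")
    # Sort by path so successive listings are easy to compare.
    return sorted(lines, key=lambda line: line.split("\t", 1)[1])
```

Calling build_listing on the document root and writing the result to a
file would give something a remote robot or proxy could fetch in one
request, at the cost of a full traversal each time it is regenerated.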

> Assuming all of these optimizations:
>
>   client GETs document, but this is routed via proxy cache
>
>   if expired(cached_document)
>   then
>       proxy GETs/If-Modified "/cgi-bin/changes" at original server
>
>       if (original_mod_date > cached_mod_date)
>       then
>           proxy GETs/If-Modified document at original server
>           put it in cache
>       endif
>   endif
>
>   return cached_document

/cgi-bin/changes would probably change faster and more substantially than
the particular documents you're interested in, so the proxy would have to
re-fetch it almost every time. You might well end up negating any positive
effect, even without the server-side problems.
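
A back-of-the-envelope sketch of that trade-off, under stated assumptions:
the Doc class, the one-line-per-document listing format and both functions
are hypothetical, header bytes are ignored, and the change list is assumed
to have been modified since the proxy last fetched it.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    body: bytes
    modified: int  # seconds since the epoch

def changes_listing(site):
    """Render a hypothetical change list: one 'path mtime' line per document."""
    rows = (f"{path} {doc.modified}\n" for path, doc in sorted(site.items()))
    return "".join(rows).encode()

def bytes_via_changes(site, path, cached_modified):
    """Body bytes moved when the proxy consults the change list first."""
    transferred = len(changes_listing(site))  # list assumed stale, sent in full
    if site[path].modified > cached_modified:
        transferred += len(site[path].body)
    return transferred

def bytes_direct(site, path, cached_modified):
    """Body bytes moved by a plain conditional GET on the document itself."""
    doc = site[path]
    return len(doc.body) if doc.modified > cached_modified else 0

# A made-up site: 1000 small documents, all last modified at t=100.
site = {f"/doc{i}.html": Doc(b"x" * 200, 100) for i in range(1000)}
unchanged = (bytes_via_changes(site, "/doc7.html", 100),
             bytes_direct(site, "/doc7.html", 100))
changed = (bytes_via_changes(site, "/doc7.html", 50),
           bytes_direct(site, "/doc7.html", 50))
```

In both cases the route via the change list moves more bytes than asking
for the one document directly, because the list grows with the size of the
site while the proxy only wanted a single page.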

-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html