Re: Re caching of frequently used pages

guenther.fischer@hrz.tu-chemnitz.de (Guenther Fischer)
From: guenther.fischer@hrz.tu-chemnitz.de (Guenther Fischer)
Message-id: <9312151743.AA25564@etzel.hrz.tu-chemnitz.de>
Subject: Re: Re caching of frequently used pages
To: J.Larmouth@iti.salford.ac.uk
Date: Wed, 15 Dec 1993 18:43:48 +0100 (MET)
Cc: www-talk@www0.cern.ch
In-reply-to: <9312151513.AA29473@dxmint.cern.ch> from "J.Larmouth@iti.salford.ac.uk" at Dec 15, 93 03:01:00 pm
Reply-To: guenther.fischer@hrz.tu-chemnitz.de
X-Mailer: ELM [version 2.4 PL21]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 1900
> E-mail from: Prof J Larmouth              J.Larmouth @ ITI.SALFORD.AC.UK
> 
> Subject:      Re caching of frequently used pages
> 
> ...

> 
> Of course,  any greater use of caching must raise the issue of an
> "expected frequency of update" value to be held with each document,  in
> order to assist the LAN-cache server in knowing how long to keep things.
> 

I have an experimental cache server running, based on Tony Sanders' Plexus.
Users are connected to this server by default through the
WWW_http_GATEWAY environment variable for Mosaic (Unix). Mosaic for Windows
and Lynx will follow. (Soon, I hope ... - we need it.)

setenv WWW_http_GATEWAY www.tu-chemnitz.de:8002

Mosaic then passes every http://host:port/... URL to www.tu-chemnitz.de:8002
in the form

GET /host:port/...
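
As a rough illustration of the rewriting described above, here is a minimal
Python sketch. The function name gateway_request_line and the example path
are made up for illustration; the only thing taken from the description is
that http://host:port/... becomes GET /host:port/...

```python
def gateway_request_line(url: str) -> str:
    """Turn http://host:port/path into the gateway form 'GET /host:port/path'.

    Hypothetical helper: it only mirrors the mapping described in the
    message and does not claim to reproduce what Mosaic does internally.
    """
    prefix = "http://"
    if not url.startswith(prefix):
        raise ValueError("only http URLs are passed to the gateway")
    # Drop the scheme and keep host:port/path behind a single slash.
    return "GET /" + url[len(prefix):]


if __name__ == "__main__":
    print(gateway_request_line("http://www.ncsa.uiuc.edu:80/index.html"))
```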

"My" caching rests on the following:
- the origin server speaks HTTP/1.0 and
- sends a Last-Modified (L-M) value in its header.
- Queries are not handled.
- For servers that are not to be cached, it runs as a simple gateway.

I can configure the following in the cache server:
- list of servers to cache
- TIME_SHORT for html/text
- TIME_LONG for other (gifs etc.)
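
A small sketch of how these three settings might look in Python. The two
time values are pure assumptions (the real values are not given here), the
function names are hypothetical, and "html/text" is read as "any text/*
content type".

```python
# Servers whose documents are cached; the two names are the ones mentioned
# in this message. Everything else would simply be passed through.
CACHED_SERVERS = {"www.novell.com", "www.ncsa.uiuc.edu"}

TIME_SHORT = 60 * 60          # assumed value: one hour, for html/text
TIME_LONG = 24 * 60 * 60      # assumed value: one day, for gifs etc.


def should_cache(host: str) -> bool:
    """True if the host is on the list of servers to cache."""
    return host.lower() in CACHED_SERVERS


def freshness_window(content_type: str) -> int:
    """Pick TIME_SHORT for text documents, TIME_LONG for everything else."""
    main_type = content_type.split(";")[0].strip().lower()
    return TIME_SHORT if main_type.startswith("text/") else TIME_LONG
```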

The algorithm currently implemented:

- If the request is not in the cache: get it -> give it to the client and
  cache it (I store it with the full header)
- If the request is in the cache:
  - if NOW - mtime of the cache file < TIME_SHORT
      send it to the client
    else
      send it to the client      and
      get the head from the server
      if L-M of cache == L-M from server
         utime the cache file to now
      else
         unlink the file and get it again

Files governed by TIME_LONG are handled in the same manner.
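
The steps above can be sketched in Python roughly as follows. This is not
the Plexus-based code, only an illustration under stated assumptions: the
names handle and last_modified are made up, get and head are caller-supplied
functions standing in for the GET and HEAD requests to the origin server,
and header lines are assumed to end in CRLF.

```python
import os
import time
from typing import Callable, Optional


def last_modified(response: bytes) -> Optional[str]:
    """Return the Last-Modified value from a raw response header, if any."""
    header = response.split(b"\r\n\r\n", 1)[0]
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"last-modified":
            return value.strip().decode("ascii", "replace")
    return None


def handle(cache_file: str, ttl: float,
           get: Callable[[], bytes], head: Callable[[], bytes]) -> bytes:
    """Return the bytes sent to the client; ttl is TIME_SHORT or TIME_LONG."""
    # Not in cache: fetch, store with the full header, give it to the client.
    if not os.path.exists(cache_file):
        response = get()
        with open(cache_file, "wb") as f:
            f.write(response)
        return response

    with open(cache_file, "rb") as f:
        cached = f.read()

    age = time.time() - os.path.getmtime(cache_file)
    if age >= ttl:
        # The client still receives the cached copy; the check below only
        # changes what later requests will see.
        if last_modified(head()) == last_modified(cached):
            os.utime(cache_file, None)      # unchanged: set file time to now
        else:
            os.unlink(cache_file)           # changed: drop it and get it again
            with open(cache_file, "wb") as f:
                f.write(get())
    return cached
```

Note that, as in the description, a client asking for an expired entry gets
the old copy once more; only the next request sees the refreshed file. In
this sketch two responses that both lack Last-Modified compare as equal.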

I've also started a student project to clean and refresh the cache
without any client involved (at night or on weekends).

You can try my cache server:
www.novell.com and www.ncsa.uiuc.edu are on the list of cached servers.

	~Guenther
-- 
Name:      Guenther Fischer
Institute: TU Chemnitz, Universitaetsrechenzentrum
Phone:     0371 668 361
mail:      fischer@hrz.tu-chemnitz.de