Why the Web needs to change

Piglet <ee01th@surrey.ac.uk>
Errors-To: listmaster@www0.cern.ch
Date: Wed, 30 Mar 1994 12:11:32 --100
Message-id: <9403301105.aa14038@ainur.ee.surrey.ac.uk>
Reply-To: ee01th@surrey.ac.uk
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Piglet <ee01th@surrey.ac.uk>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Why the Web needs to change
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 5390

Being fairly new to this discussion group, I really don't know
how new the ideas I am presenting here are.  Comments welcome!
The following can also be seen on:
How the Web must change.

Consider first the library

Here's a little story...(by the way, this is completely
fictional - as if you couldn't guess anyway!) 

One day, Timothy reads an article in his weekly magazine, Lego
Engineering, about many Civil Engineering disasters resulting
from the misuse of Lego in their construction. At the end of the
article, in its references, it mentions a book, "The Tacoma
Narrows Bridge: Lego caused its failure". 

Now, Timothy, being interested in this topic, decides to see if his
local library has a copy. He pops down, and asks the Librarian.
She apologises profusely as they do not have a copy, but she is
willing to obtain a copy through inter-library loans. A week or so
later, Timothy has the book in his hands, and reads it through.
He then returns it to his local library, and it is returned to the
library it was borrowed from. 

Shortly after this, Timothy mentions this article to his friend
Duncan, who borrows the magazine to find out more. He, too,
wants to borrow the book mentioned and goes to the same
Library. Again, the Librarian obligingly gets the book through
inter-library loan. Duncan reads and returns the book, just like
Timothy before him. 

At work, Duncan mentions the article to his colleague, Bevis.
Bevis borrows the magazine, and he too wants the book. He goes
off to the library, and asks the Librarian. The Librarian, being
wise (as librarians are) had noticed that this book was becoming
quite popular, so had ordered a copy for the library. Bevis,
therefore, borrows the library's own copy, which is much faster
as he doesn't have to wait for it to arrive. In time, he reads, and
returns the book. 

Let's now look at the World Wide Web

With the current design of the Web, there is a problem
concerning traffic (as the people at NCSA will tell you!). 

At present, a document is referenced by its protocol and location.
In order to reduce network traffic, some method of storing the
documents locally must be found. 

It is easy for a web administrator to bring down a copy of a
document (which I shall call DocA) and store it locally. However,
if the documents that DocA references are also to be stored locally, then the
references in DocA must be changed to point to the local copies
rather than the original ones. It also means that any document
that points to DocA will actually point to the original version,
rather than the local copy, which rather defeats the object.
Another problem is that if the original copy of DocA is updated,
the change will not be reflected in the local copy. 

We therefore propose a change:

Each document should have a unique identifier (as indeed it
already does) which would be generated by the site creating the
document, much in the same way that publishers generate

When a client requests a document, rather than going to the
source of the document (the publisher), it asks the local server 
(the library) if it has a copy. If it does not, the local server fetches
it from the original site and sends it to the client for display. If
that document is requested frequently, the local server makes a
copy of it, and when asked for it again, simply asks the document
source if the copy it already has is up to date and only retrieves it
again from the source if necessary. 
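
To make the procedure above concrete, here is a minimal sketch of it in
Python. It is only an illustration under assumptions of ours: the names
Origin, LocalServer and copy_after are hypothetical, "requested
frequently" is taken to mean "requested at least copy_after times", and
the up-to-date check is modelled as comparing a simple timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class Origin:
    """Hypothetical stand-in for the document source (the publisher)."""
    docs: dict = field(default_factory=dict)  # doc_id -> (timestamp, body)

    def fetch(self, doc_id):
        return self.docs[doc_id]

    def timestamp(self, doc_id):
        return self.docs[doc_id][0]


@dataclass
class LocalServer:
    """Hypothetical local server (the library)."""
    origin: Origin
    copy_after: int = 2  # assumed threshold for "requested frequently"
    requests: dict = field(default_factory=dict)  # doc_id -> request count
    copies: dict = field(default_factory=dict)    # doc_id -> (timestamp, body)
    full_fetches: int = 0                         # full transfers from source

    def get(self, doc_id):
        self.requests[doc_id] = self.requests.get(doc_id, 0) + 1
        held = self.copies.get(doc_id)
        # If we hold a copy, only ask the source whether it is up to date.
        if held is not None and held[0] == self.origin.timestamp(doc_id):
            return held[1]
        # No copy, or a stale one: retrieve the document from the source.
        stamp, body = self.origin.fetch(doc_id)
        self.full_fetches += 1
        # Keep a copy once the document has proved popular enough.
        if held is not None or self.requests[doc_id] >= self.copy_after:
            self.copies[doc_id] = (stamp, body)
        return body
```

With copy_after set to 2 this mirrors the library story: the first two
readers cause a transfer from the source, and the third is served from
the local copy after a timestamp check.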

This whole procedure could be done hierarchically, for example
based on internet domains. So, if the surrey.ac.uk server doesn't
have the document, it asks some generic ac.uk super-server if it
has it, which in turn passes the request on to the uk super-server,
which if necessary downloads the document from source. As
before, the time-stamping checks should be done hierarchically.
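
The domain-based lookup order can be sketched as follows. This is a
rough illustration, not a protocol definition: lookup_chain, resolve and
the holdings table are hypothetical names, and each server is reduced to
a dictionary of the documents it currently holds.

```python
def lookup_chain(server_domain):
    """List the servers to ask, from the local one up to the top level.

    "surrey.ac.uk" gives ["surrey.ac.uk", "ac.uk", "uk"].
    """
    labels = server_domain.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def resolve(doc_id, server_domain, holdings, fetch_from_source):
    """Ask each level in turn; go to the source only if nobody has a copy.

    holdings maps a domain to the {doc_id: body} copies held there
    (an assumed, simplified model of each server's store).
    """
    for domain in lookup_chain(server_domain):
        if doc_id in holdings.get(domain, {}):
            return domain, holdings[domain][doc_id]
    return "source", fetch_from_source(doc_id)
```

A request from surrey.ac.uk for a document held only by the ac.uk
super-server is answered at the ac.uk level and never reaches the
source.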

The advantage of a hierarchic scheme is that there could be some
documents that lots of people in the UK want to read, but these
people are all from different sites (e.g. documentation on how to
set up a server). 

If the protocol is set up correctly, the super-servers themselves
need not keep copies of the most looked at documents. If a server
in the hierarchy already has a copy, it simply asks for a
datestamp to check its copy is up to date. If a super-server
receives such a request, it simply passes it on to the next level up,
not bothering to keep a copy for itself. 
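
A sketch of this pass-through behaviour, under our own assumptions: the
Node class and its keeps_copies flag are hypothetical, datestamps are
plain integers, and a node with no parent stands for the original
source.

```python
class Node:
    """One server in the hierarchy; parent=None marks the original source."""

    def __init__(self, name, parent=None, keeps_copies=True, docs=None):
        self.name = name
        self.parent = parent
        self.keeps_copies = keeps_copies
        self.docs = dict(docs or {})  # doc_id -> (datestamp, body)
        self.log = []                 # requests this node has received

    def datestamp(self, doc_id):
        self.log.append("datestamp")
        if self.parent is None:
            return self.docs[doc_id][0]
        # A server in the middle simply passes the request up a level.
        return self.parent.datestamp(doc_id)

    def fetch(self, doc_id):
        self.log.append("fetch")
        if self.parent is None:
            return self.docs[doc_id]
        held = self.docs.get(doc_id)
        # A server with a copy only asks for a datestamp.
        if held is not None and held[0] == self.parent.datestamp(doc_id):
            return held
        fresh = self.parent.fetch(doc_id)
        if self.keeps_copies:
            self.docs[doc_id] = fresh
        return fresh


source = Node("source", docs={"doc-a": (1, "v1")})
uk = Node("uk", parent=source, keeps_copies=False)
ac_uk = Node("ac.uk", parent=uk, keeps_copies=False)
surrey = Node("surrey.ac.uk", parent=ac_uk)
surrey.fetch("doc-a")  # full transfer, relayed through both super-servers
surrey.fetch("doc-a")  # only a datestamp request travels up the hierarchy
```

In this run the super-servers end up holding nothing, and the source
sees one full fetch followed by one datestamp request.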

Also, if a document is not requested for a long time, the relevant
server in the hierarchy deletes its copy (unless, of course, it is the
original source of the document!) 
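
The clean-up rule might look something like this. The function name
evict_idle, the max_idle limit and the use of plain numbers for times
are all assumptions made for the sake of the example; "a long time" is
whatever the administrator decides.

```python
def evict_idle(copies, last_requested, now, max_idle, originals=frozenset()):
    """Delete copies not requested within max_idle time units.

    copies         -- doc_id -> body held by this server
    last_requested -- doc_id -> time of the most recent request
    originals      -- doc_ids this server is the original source of;
                      these are never deleted
    Returns the list of doc_ids that were removed.
    """
    evicted = []
    for doc_id in list(copies):
        if doc_id in originals:
            continue
        # A copy with no recorded request counts as idle forever.
        idle = now - last_requested.get(doc_id, float("-inf"))
        if idle > max_idle:
            del copies[doc_id]
            last_requested.pop(doc_id, None)
            evicted.append(doc_id)
    return evicted
```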

The important point we are trying to make is that the reference
which uniquely identifies the document should be just that--a
document identifier, rather than where an up to date copy of the
document can be found. 

In fact (as a complete afterthought) those pages that the client
caches during a session could be checked against this unique
reference, so if the user doesn't use the Back option or the 
Window History that Mosaic provides, it still calls up the
cached version rather than reloading it from the local server. 
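
As a rough sketch of that afterthought: if the client keys its session
cache on the document identifier, any route to the same document finds
the cached page. SessionCache and its fetch argument are hypothetical
names; this says nothing about how Mosaic itself is implemented.

```python
class SessionCache:
    """Per-session page cache keyed by document identifier."""

    def __init__(self, fetch_from_local_server):
        self._fetch = fetch_from_local_server  # callable: doc_id -> body
        self._pages = {}                       # doc_id -> body

    def open(self, doc_id):
        # However the user arrives at the document (a link, Back, or the
        # history window), the identifier is the same, so the page seen
        # earlier in the session is reused.
        if doc_id not in self._pages:
            self._pages[doc_id] = self._fetch(doc_id)
        return self._pages[doc_id]
```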

Comments and questions to either of the people below are welcome.

T.Hunt@ee.surrey.ac.uk (http://www.ee.surrey.ac.uk/People/T.Hunt.html)
D.White@ee.surrey.ac.uk (http://www.ee.surrey.ac.uk/People/D.White.html)
30th March 1994