Re: Usenet news and WWW

Tim Berners-Lee <timbl@www3.cern.ch>
Date: Mon, 18 Jan 93 14:03:35 +0100
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-id: <9301181303.AA02388@www3.cern.ch>
To: www-talk@nxoc01.cern.ch
Subject: Re: Usenet news and WWW
Reply-To: timbl@nxoc01.cern.ch

> Date: Tue, 12 Jan 93 0:06:00 CST
> From: Karl Lehenbauer <karl@one.neosoft.com>
>
> Many of the issues that people seem to be grappling with are already
> handled by news.

Yes ... but on the other hand there are things which are already
handled by WWW.

> For example, we are talking about caching nodes.  News has highly evolved
> caching capabilities -- I mean, caching is what it is all about -- both for
> TCP/IP and UUCP-based links.

I agree.  There are some snags with news, though, for the ultimate
retrieval tool. One trouble is, news caches documents very simply.
The distribution scheme is a flood broadcast.  This is OK for real
news (short-lived articles), although many sites sag under the load of
a lot of stuff they never read.  There are strict limits on what
anyone posts because of the incredible worldwide total system load
and disk space usage per message.  There is no well-defined algorithm
for picking up archived news.  The message ID of an article is not
enough: you need to know at least one of its newsgroups, its date,
and be able to deduce the archive node name and the organisation of
the archive.
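
To make that concrete, here is a rough sketch in Python (the archive
host and directory layout are invented for illustration; no real
convention exists) of how much you need to know to find an archived
article, compared with a self-contained document address:

    import datetime

    # Invented archive layout: host + group path + year-month + message ID.
    def archived_article_path(message_id, newsgroup, posting_date, archive_host):
        year_month = posting_date.strftime("%Y-%m")
        group_dir = newsgroup.replace(".", "/")
        return "%s:/archive/%s/%s/%s" % (archive_host, group_dir,
                                         year_month, message_id)

    print(archived_article_path("<9301181303.AA02388@www3.cern.ch>",
                                "comp.infosystems",
                                datetime.date(1993, 1, 18),
                                "archive.example.org"))

A retrieval-oriented address such as
http://info.cern.ch/hypertext/WWW/TheProject.html carries everything
the client needs in one string.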

The conventions of posting FAQ lists and other "periodic postings" are
in fact an abuse of the protocol, and would be better served by a
retrieval protocol rather than a broadcast protocol.

I know that the NNTP WG is looking at this sort of area, and maybe we  
should all get together.

In a nutshell, if you take all the data everywhere available online  
and put it into news, the news system will die. The use of newsgroup  
names and lists negotiated by system managers to control what  
documents are visible and cached where is too crude, too inflexible  
-- it doesn't scale well.  The caching has to be automatic.

All this said, obviously news and retrieval are coming together,
which is why we have at all times tried to look for analogies (see
previous messages to this list) between news articles and groups on
one side and hypertext documents and lists on the other.

> Someone mentioned the issue of caching and node names, apparently
> node names would have to be rewritten by the cacher or need to be made
> machine-independent in some way (?).

Don't worry about that... I think you are referring to a discussion of
complete vs. partial UILs.  Let's keep that separate...

> Article IDs are guaranteed unique
> and are server-independent.  The mechanism for translating article
> IDs to filenames is fast and pretty highly evolved.
> 

> Oh, ugh, "Supercedes:" doesn't cut it unless the article superceding
> the old one replaces its article ID, which would probably be Bad.

Certainly there is a  case for having the "current" version of an  
article and a given "fixed" version of an article each explicitly  
addressable. See  
http://info.cern.ch/hypertext/WWW/DesignIssues/Versioning.html
and linked things for an old discussion of these issues.
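
Purely as an illustration (the address forms below are hypothetical,
not taken from that design note), the idea is that the unversioned
"current" address always resolves to the latest fixed version, while
each fixed version keeps an address of its own:

    # Hypothetical sketch of "current" vs "fixed" version addressing.
    versions = {
        "/hypertext/WWW/DesignIssues/Versioning;v=1": "first draft",
        "/hypertext/WWW/DesignIssues/Versioning;v=2": "revised draft",
    }
    current = "/hypertext/WWW/DesignIssues/Versioning"   # no version part

    def resolve(address):
        if address == current:
            address = max(versions)        # newest fixed version wins here
        return versions[address]

    print(resolve(current))                # -> "revised draft"
    print(resolve(current + ";v=1"))       # -> "first draft"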

> Expiration dates can be set with "Expires:",

Exactly.  If you read the provisional HTTP2 spec there is
an explicit link to rfc850 under "Expires". (See
/hypertext/WWW/Protocols/HTTP/Object_Headers.html#z5)
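
For instance, a server might stamp an object like this (just a sketch;
the header layout follows the provisional spec rather than anything
final, and the Python below is only to show the rfc850 date shape):

    import time

    # rfc850 dates look like "Monday, 18-Jan-93 14:03:35 GMT".
    def expires_header(seconds_from_now):
        expiry = time.gmtime(time.time() + seconds_from_now)
        return "Expires: " + time.strftime("%A, %d-%b-%y %H:%M:%S GMT", expiry)

    print(expires_header(7 * 24 * 3600))   # cacheable for a week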

> and sites that archive certain groups already do special things on
> "Archive-Name:".


Really?  Tell me more.  Is that in an RFC somewhere?  A reference?
An example?

> Plus news is already ultra-portable.
> 

> Is the brief-connection-per-document approach of HTTP still necessary
> when the data is widely replicated?

As I said above, the mass of data will not be widely replicated.
You don't want a copy of all the data in the phone book, you just
want access to it, plus a cache (which you may currently keep in your
diary).  When you're talking about all the phone books in the world,
this is still more the case!

So there will in the end be a directory system not unlike X.500 which
will allow you to find who has the nearest copy of a document you
want, in a fairly sophisticated way. And you will pick it up from
that place.  Then you will click again and pick up a reference from
somewhere else.

An important feature of HTTP is that the document is returned with
the minimum number of round trips. (Sorry to all the people who have
heard this before.)  Connection-oriented protocols like WAIS and NNTP
have an introductory dialogue which slows down the first fetch by n
times the distance/speed of light.
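
Roughly (a sketch only; the hostnames are placeholders and the NNTP
side is trimmed to its shape rather than being a full implementation):

    import socket

    def http_fetch(host, path, port=80):
        s = socket.create_connection((host, port))
        s.sendall(("GET %s\r\n" % path).encode())   # one request...
        data = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk                            # ...and the document comes back
        s.close()
        return data

    def nntp_fetch(host, group, number, port=119):
        s = socket.create_connection((host, port))
        f = s.makefile("rwb")
        f.readline()                                 # greeting banner: extra round trip
        f.write(("GROUP %s\r\n" % group).encode()); f.flush()
        f.readline()                                 # group reply: another round trip
        f.write(("ARTICLE %d\r\n" % number).encode()); f.flush()
        f.readline()                                 # article status line
        lines = []
        while True:                                  # only now does the text flow
            line = f.readline()
            if line.strip() == b".":
                break
            lines.append(line)
        return b"".join(lines)

The HTTP case gets the document in the first exchange; the
connection-oriented case pays for the greeting and the commands before
any text arrives.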

We probably need horses for courses -- there is nothing wrong with
keeping a few protocols around optimised for different access
profiles.

(BTW I think there is a need for a point-to-point low-bandwidth
protocol designed for beating the hell out of a phone line. One that
will keep the phone line occupied in a very intelligent way with
look-ahead fetches of related documents and lists or parts of them so
that a home user with a big disk can explore with optimised ease when
he is paying by the minute. Another good student project.)
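
Something like this, very roughly (fetch_document and extract_links
are placeholder stubs, not a real API; the point is only the shape of
the look-ahead loop):

    from collections import deque

    def fetch_document(address):
        return "<dummy document for %s>" % address    # stand-in for the real fetch

    def extract_links(document):
        return []                                     # stand-in for real link parsing

    def explore(start, cache, limit=50):
        queue = deque([start])
        while queue and len(cache) < limit:           # stop when the disk budget is spent
            address = queue.popleft()
            if address in cache:
                continue
            cache[address] = fetch_document(address)  # look-ahead fetch over the slow line
            queue.extend(extract_links(cache[address]))
        return cache

    local_disk = {}
    explore("http://info.cern.ch/hypertext/WWW/TheProject.html", local_disk)

While the user reads one node, the loop keeps the line busy pulling in
whatever that node points to.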

> It would be painful to go reap all the references that
> point to expired articles, although if a user traversed to an expired
> article, perhaps it could be pulled off of tape or an NNTP superserver
> somewhere.

> Clearly the authors of WWW think news is important because WWW has
> nice capabilities for accessing NNTP servers.  What, then, is the
> motivation for HTTP as opposed to, say, using news with HTML article
> bodies?

I hope I've shown that broadcast data can't cover the NIR world. But
I also hope that we can allow the models to converge and create a
supermodel which encompasses them.  This is the end goal of HTTP2 --
or should we call it NNTP3?