Date: Thu, 1 Jul 93 11:38:34 +0200
From: Tim Berners-Lee <email@example.com>
To: Nathan Torkington <Nathan.Torkington@vuw.ac.nz>
Subject: Re: Searchable Indexes: LISTEN NOW!
Summarizing, we need
1. A standard URL like http://<host>/Catalogue.html
2. A convention for saying "no" which could be just an existing
but void file.
3. A standard format for the file.
4. A PD program distributed with the server to generate
the file so that many people will do it weekly.
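Points 1 and 2 together mean a robot can probe any server with one GET. A rough sketch of that check, assuming the /Catalogue.html path and the "missing or void file means no" convention from the list above (the function names are mine, and get() stands in for a real HTTP fetch):

```python
# Sketch of a robot probing a server for the proposed catalogue.
# get() is a stand-in for an HTTP GET returning the body, or None
# if the file does not exist.

def has_catalogue(host, get):
    """True if host publishes a catalogue that actually lists documents."""
    body = get("http://%s/Catalogue.html" % host)
    if body is None:                  # file absent: nothing to index
        return False
    return "<a href" in body.lower()  # a void file (no links) also means "no"
```

A catalogue holding only a polite message and no links counts as "no" under this check.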
The format could be a set of
links where the content was the title of the document.
<a href="/docs/overview">Overview of our documentation</a>
which would have the advantage of human readability.
I agree it would be useful to have some depth information
or at least a weight.
It could alternatively be
<LINK HREF="/docs/overview" TITLE="Overview of our documentation" WEIGHT=1>
where we could argue for hours about the meaning of
WEIGHT. (WEIGHT is an extra, but the rest is standard.)
In either case, a "no" catalogue could contain
a polite message, and no list.
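The same scanner could accept both candidate formats. A rough sketch, assuming the anchor form shown earlier and HREF/TITLE attribute spellings for the LINK form (my assumption, as is the tolerance for either attribute order):

```python
import re

def parse_catalogue(html):
    """Return (href, title) pairs from either candidate catalogue format."""
    entries = []
    # Format 1: <a href="...">title</a> -- the title is the element content.
    for href, title in re.findall(r'<a\s+href="([^"]+)"[^>]*>([^<]*)</a>',
                                  html, re.I):
        entries.append((href, title.strip()))
    # Format 2: <LINK ...> with HREF and TITLE attributes, any order.
    for attrs in re.findall(r'<link\b([^>]*)>', html, re.I):
        href = re.search(r'href="([^"]+)"', attrs, re.I)
        title = re.search(r'title="([^"]+)"', attrs, re.I)
        if href and title:
            entries.append((href.group(1), title.group(1)))
    return entries
```

An index builder would then only need this one parser regardless of which format wins the argument.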
A FEW NUMBERS
From time to time I run a breadth-first traversal of the web from the
list of registered servers. Yesterday, counting unique
hostname:port pairs (without checking for CNAME aliases), I found:
95 registered servers (level 0)
99 servers referred to by level 0 (level 1)
172 servers referred to by level 1 (level 2)
174 distinct servers in levels 0-2.
Going this deep takes long enough (an hour or so). I use a filter at
each stage to cut out known slow sites (typically Eastern Europe) or
known buggy servers. I also have to clean the links quite a bit for
references with local hostnames only (not FQDN) and a small
amount of junk. [Obviously mailing the webmaster at sites
with bad links would be a possibility.]
This is only for interest. I don't generate an index. The engine is
just a bunch of scripts using www -listrefs.
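The level-by-level walk could be sketched like this, with list_refs() standing in for `www -listrefs` and a skip set for the slow or buggy sites filtered at each stage (all names here are mine, not the actual scripts):

```python
def crawl_levels(seeds, list_refs, depth=2, skip=frozenset()):
    """Breadth-first walk over servers.

    Level 0 is the seed list; each later level is the set of hosts
    newly referred to by the previous level. list_refs(host) returns
    the hosts that host links to; skip holds hosts to ignore.
    Returns one sorted list of hosts per level.
    """
    seen = set(seeds) - set(skip)
    levels = [sorted(seen)]
    frontier = seen.copy()
    for _ in range(depth):
        nxt = set()
        for host in frontier:
            for ref in list_refs(host):
                if ref not in seen and ref not in skip:
                    nxt.add(ref)
        seen |= nxt
        levels.append(sorted(nxt))
        frontier = nxt
    return levels
```

The total count of distinct servers is then just the sum of the level sizes, since each host is reported only at the first level where it appears.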