Date: Thu, 14 Jan 93 09:41:13 +0100
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-id: <9301140841.AA00242@www3.cern.ch>
To: Thomas A. Fine <fine@cis.ohio-state.edu>
Subject: Re: web roaming robot (was: strategy for HTML spec?)
Cc: www-talk@nxoc01.cern.ch
Reply-To: timbl@nxoc01.cern.ch
Tom,
Great idea, LOTS of applications. Traversing a tree to a given depth
makes a book. Tony's WWWVeronica is a great idea -- particularly as
it can pick up WAIS indexes and Gopher and telnet sites all together
and make a megaIndex of the whole scene!
Implementation ideas: The WWW library anchor object actually keeps
track of every anchor visited. It uses a hash table to speed up the
generation of new names. The HTAnchor_findParent or somesuch routine
finds it if it exists, otherwise creates it. You just need a return
code to tell you whether you have a new one or not, i.e. whether to
truncate the search there.
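A minimal sketch of that find-or-create idea (hypothetical names in
Python, not the actual libwww HTAnchor API): the lookup returns a flag
saying whether the anchor is new, which is exactly what the robot
needs to decide whether to keep going or truncate the search.

```python
# Sketch of an anchor table: find-or-create lookup whose return value
# says whether the anchor was seen before. A dict plays the role of
# the hash table that speeds up generation of new names.
# (AnchorTable and find_or_create are illustrative names, not libwww.)

class AnchorTable:
    def __init__(self):
        self._anchors = {}          # address -> anchor record

    def find_or_create(self, address):
        """Return (anchor, is_new); is_new is True on first sighting."""
        anchor = self._anchors.get(address)
        if anchor is not None:
            return anchor, False    # already visited: truncate here
        anchor = {"address": address, "links": []}
        self._anchors[address] = anchor
        return anchor, True         # new anchor: worth traversing
```

Calling find_or_create twice with the same address yields is_new True
the first time and False the second, so the traversal never loops.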
You could use the HTRules translation table for controlling the
domain of the search. Useful things to do with the search are:
1. list all references OUT of the domain from within it
For example, find all telnet sites listed in Hytelnet.
List all WAIS indexes mentioned in the Web's WAIS catalogue.
2. list all references within, depth-first, with depth indication
This can drive a book-making script.
3. Apply command to everything within the domain
for checking
etc etc
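The three uses above can be sketched in one depth-limited, depth-first
walk (Python pseudocode with hypothetical callbacks, not libwww): an
in_domain predicate stands in for the HTRules translation table, a
visit callback is the "command applied to everything", and references
leaving the domain are reported but not followed.

```python
# Depth-first traversal of a link graph with depth indication.
# get_links(address) -> list of addresses linked from that document;
# in_domain(address) -> whether the search rules allow following it;
# visit(address, depth) -> the command applied to each object found.
# All three callbacks are assumptions for this sketch.

def traverse(start, get_links, in_domain, max_depth, visit):
    seen = set()

    def walk(address, depth):
        if address in seen or depth > max_depth:
            return
        seen.add(address)
        visit(address, depth)       # e.g. print indented by depth
        if not in_domain(address):
            return                  # a reference OUT of the domain:
                                    # listed above, but not followed
        for link in get_links(address):
            walk(link, depth + 1)

    walk(start, 0)
```

A visit that indents each title by its depth gives the book-making
listing; one that fetches and checks each document gives link
checking.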
I even wondered about putting the traverse code into the library so
that arbitrary browsers could use it. It would generate a hypertext
list of all objects found, for example.
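That hypertext list could be as simple as this sketch (the
(address, title) input pairs are hypothetical; the markup follows the
HTML of the day):

```python
# Turn the objects found by a traversal into a hypertext list that
# any browser could display: one anchor per object, in a UL.

def hypertext_list(found):
    """found: list of (address, title) pairs."""
    lines = ["<UL>"]
    for address, title in found:
        lines.append('<LI><A HREF="%s">%s</A>' % (address, title))
    lines.append("</UL>")
    return "\n".join(lines)
```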
Tim