Re: web roaming robot (was: strategy for HTML spec?)

Tim Berners-Lee <timbl@www3.cern.ch>

Mail folder: WWW Talk Jan-Mar 1993 Archives

Date: Thu, 14 Jan 93 09:41:13 +0100
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-id: <9301140841.AA00242@www3.cern.ch>
To: Thomas A. Fine <fine@cis.ohio-state.edu>
Subject: Re: web roaming robot (was: strategy for HTML spec?)
Cc: www-talk@nxoc01.cern.ch
Reply-To: timbl@nxoc01.cern.ch

Tom,

Great idea, LOTS of applications.  Traversing a tree to a given depth
makes a book.  Tony's WWWVeronica is a great idea -- particularly as  
it can pick up WAIS indexes and Gopher and telnet sites all together
and make a megaIndex of the whole scene!

Implementation ideas:   The WWW library anchor object actually keeps  
track of every anchor visited.   It uses a hash table to speed up the  
generation of new names.  The HTAnchor_findParent or somesuch routine
finds it if it exists, otherwise creates it.  You just need a return
code to tell you whether you have a new one or not, i.e. whether to
truncate the search there.
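A minimal Python sketch of the find-or-create idea, assuming only what is described above: a hash table keyed by address, plus a lookup that also reports whether the anchor is new, which is exactly the signal a traversal needs to decide where to stop. The names `AnchorRegistry` and `find_or_create` are hypothetical illustrations, not the libwww API (the real routine is only recalled above as "HTAnchor_findParent or somesuch").

```python
# Hypothetical sketch, not libwww code: an anchor registry backed by a
# hash table (a Python dict). The lookup also says whether the anchor
# was newly created, so a traversal knows where to truncate.

class Anchor:
    def __init__(self, address):
        self.address = address


class AnchorRegistry:
    def __init__(self):
        self._by_address = {}  # address -> Anchor

    def find_or_create(self, address):
        """Return (anchor, is_new); is_new is False if already known."""
        anchor = self._by_address.get(address)
        if anchor is not None:
            return anchor, False
        anchor = Anchor(address)
        self._by_address[address] = anchor
        return anchor, True


registry = AnchorRegistry()
first, is_new = registry.find_or_create("http://example.org/top")
again, seen_new = registry.find_or_create("http://example.org/top")
print(is_new, seen_new, first is again)  # prints: True False True
```

A walker would follow links from an anchor only when `is_new` is true; a second arrival at the same address returns the existing object and stops the search there.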

You could use the HTRules translation table for controlling the
domain of the search.  Useful things to do with the search are:

1. list all references OUT of the domain from within it
	For example, find all telnet sites listed in Hytelnet.
	List all WAIS indexes mentioned in the Web's WAIS catalogue.
2. list all references within, depth-first, with depth indication
	This can drive a book-making script
3. Apply a command to everything within the domain
	for checking, etc.
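To make the three uses concrete, here is a hedged Python sketch of a depth-limited, depth-first walk. It assumes an in-memory link graph (`links`) standing in for fetched documents, and a simplified first-match rule list with a trailing `*` wildcard standing in for the rules table. That rule format and all names here (`in_domain`, `walk`) are illustrative assumptions, not the HTRules file syntax or any libwww call.

```python
# Hypothetical sketch: depth-limited depth-first walk over an in-memory
# link graph. The (pattern, verdict) rule list is an assumed stand-in
# for a rules table, not the real HTRules syntax.

def in_domain(address, rules):
    """First matching rule wins; a trailing '*' matches any suffix."""
    for pattern, verdict in rules:
        if pattern.endswith("*"):
            if address.startswith(pattern[:-1]):
                return verdict == "pass"
        elif address == pattern:
            return verdict == "pass"
    return False  # no rule matched: treat as outside the domain


def walk(start, links, rules, max_depth, visit=None):
    """Walk from start (assumed to be in the domain).
    Returns (outline, outside): outline is a list of (depth, address)
    in depth-first order; outside is the sorted list of references
    that lead out of the domain."""
    seen = set()
    outline = []
    outside = set()

    def step(address, depth):
        if address in seen:          # already visited: truncate here
            return
        seen.add(address)
        outline.append((depth, address))
        if visit is not None:
            visit(address)           # use 3: apply a command to each node
        if depth == max_depth:
            return
        for target in links.get(address, []):
            if in_domain(target, rules):
                step(target, depth + 1)
            else:
                outside.add(target)  # use 1: references out of the domain

    step(start, 0)
    return outline, sorted(outside)


links = {
    "http://example.org/top": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/b", "telnet://library.example.edu"],
    "http://example.org/b": ["wais://wais.example.com/index"],
}
rules = [("http://example.org/*", "pass")]
checked = []
outline, outside = walk("http://example.org/top", links, rules,
                        max_depth=3, visit=checked.append)
for depth, address in outline:       # use 2: indented listing for a book
    print("  " * depth + address)
print(outside)
```

The `seen` set plays the role of the "new or not" return code: a node reached a second time is not expanded again. Links found at `max_depth` are not followed or recorded, so raising the limit is how the walk is widened.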

I even wondered about putting the traverse code into the library so
that arbitrary browsers could use it.  It would generate a hypertext
list of all objects found, for example.
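One possible shape for that hypertext list, sketched in Python under the assumption that the traversal yields (depth, address) pairs in depth-first order, with depth growing by at most one between consecutive entries. The function name `hypertext_list` is hypothetical; it simply nests HTML `<UL>` lists to mirror the depth.

```python
from html import escape

# Hypothetical sketch: render a depth-first outline as nested HTML lists.
# Assumes depth increases by at most 1 from one entry to the next.

def hypertext_list(outline):
    lines = []
    current = -1  # depth of the innermost open <UL>; -1 means none open
    for depth, address in outline:
        while current < depth:       # open lists down to this depth
            lines.append("<UL>")
            current += 1
        while current > depth:       # close lists back up to this depth
            lines.append("</UL>")
            current -= 1
        href = escape(address, quote=True)
        lines.append(f'<LI><A HREF="{href}">{escape(address)}</A>')
    while current >= 0:              # close whatever is still open
        lines.append("</UL>")
        current -= 1
    return "\n".join(lines)


print(hypertext_list([(0, "http://example.org/top"),
                      (1, "http://example.org/a")]))
```

Because `</LI>` is optional in HTML, each nested `<UL>` falls inside the preceding list item, so the page reads as an outline a browser can follow link by link.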

Tim