Re: using WWW to follow gopher links

timbl (Tim Berners-Lee)

Mail folder: WWW Talk 1992 Archives

Date: Tue, 4 Feb 92 08:44:25 GMT+0100
From: timbl (Tim Berners-Lee)
Message-id: <9202040744.AA24645@nxoc01.cern.ch>
To: emv@cic.net
Subject: Re: using WWW to follow gopher links 
Cc: www-talk@nxoc01.cern.ch, gopher@boombox.micro.umn.edu,
        wais-talk@quake.think.com

Ed,

All good stuff -- the world is coming together.

What do you think is the most useful www option for tracing what's out there?
I have two suggestions. One is a -list option (or something) which makes
www return only a list of related documents, one on each line.
Another is one which will recursively run down a tree. The trouble with
the latter is telling it where to stop. Depth isn't really good enough,
as you probably also want to constrain it to only gopher files, for example.
Perhaps the most flexible would be just the first option, with a perl (or
similar) script around it for flexibility. I'd like to see, for example, lists
of all telnet sites referenced by gopher or www links, and a wais server for
www documents and gopher nodes. My guess is that one index could handle the
lot so long as one trimmed off the few places where people have gatewayed in
the entire ftp world, etc. Then I'd like to see a www server for that index so
that one could jump straight to the document wherever it came from.... I have
to write an article today; maybe tomorrow I'll put in www -list.
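
To make the "first option plus a script around it" idea concrete, here is a
rough sketch in sh rather than perl. It assumes some command that prints the
related documents of a URL, one per line; "www -list" is only the proposed
option and does not exist yet, so LIST_CMD, SEEN and the walk function are
all illustrative names. The walk stops on a depth limit and on a URL prefix
(so it can be held to gopher only), and it remembers what it has visited so
that loops of links terminate.

```shell
#!/bin/sh
# Sketch only, not part of www.
# LIST_CMD is an assumption: any command that prints the related
# documents of a URL, one per line (e.g. the proposed "www -list").
LIST_CMD=${LIST_CMD:-www -list}
# SEEN is a hypothetical scratch file recording URLs already visited.
SEEN=${SEEN:-${TMPDIR:-/tmp}/walk-seen.$$}

# walk URL DEPTH PREFIX: print URL and, recursively, everything it links
# to, at most DEPTH levels down, keeping only URLs that start with PREFIX.
# The body runs in a subshell so each level has its own variables.
walk() (
    url=$1; depth=$2; prefix=$3
    # Constrain the walk, e.g. to "gopher:" links only.
    case $url in "$prefix"*) ;; *) exit 0 ;; esac
    # Skip anything already visited, so cycles of links terminate.
    grep -qxF -- "$url" "$SEEN" 2>/dev/null && exit 0
    printf '%s\n' "$url" | tee -a "$SEEN"
    # Depth limit: do not expand links below this level.
    [ "$depth" -gt 0 ] || exit 0
    $LIST_CMD "$url" | while IFS= read -r next; do
        [ -n "$next" ] || continue
        # </dev/null keeps the inner command from eating the outer list.
        walk "$next" $((depth - 1)) "$prefix" </dev/null
    done
)

# Usage (empty the scratch file first):
#   : > "$SEEN"; walk gopher://mermaid.micro.umn.edu:150/ 2 gopher:
```

The output is one URL per line, so it can be fed to whatever builds the index.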

KUTGW
	Tim

[PS: I assume you meant -p rather than -np in the www command. Perhaps we
should put in -np if it is more intuitive than -p for no paging.
I'll look at the CR problem.]

__________________original message follows
Tim,

Some more results of wais/www/gopher collaboration.

I have a new WAIS server running at wais.cic.net, called
"midwest-weather".  It's fed by loading in a bunch of weather reports
from a gopher at Minnesota every hour.  That system gets them from the
"weather underground" at Michigan using some hairy expect scripts; I
figured it'd be easier to get things out of gopher instead.

The script looks like:

WEATHER=gopher://mermaid.micro.umn.edu:150/00/Weather
www -n -np ${WEATHER}/Indiana/Fort%20Wayne | sed -e 's/.$//' > fort-wayne.in
www -n -np ${WEATHER}/Indiana/Indianapolis | sed -e 's/.$//' > indianapolis.in
www -n -np ${WEATHER}/Indiana/South%20Bend | sed -e 's/.$//' > south-bend.in
[...]

For some reason the gopher files are coming out of www with extra ^M's
on the end, as if they were DOS files; so the sed thing gets rid of them.

I don't see a way to do this with just one invocation of www, so
instead it runs once for each file.
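
A loop at least keeps the script to one line per city, even though www still
runs once for each file. The following is a minimal sketch that assumes the
same -n -np flags as the script above. The fetch_all and out_name functions
and the WWW variable are hypothetical names, and tr -d '\r' is swapped in
for the sed expression so that only carriage returns are removed, rather
than the last character of every line.

```shell
#!/bin/sh
# Sketch only: fetch one weather report per city, one www run per file.
WEATHER=gopher://mermaid.micro.umn.edu:150/00/Weather
WWW=${WWW:-www}   # assumption: lets a stand-in command be substituted

# out_name CITY: hypothetical helper, "Fort%20Wayne" -> "fort-wayne.in"
out_name() {
    printf '%s\n' "$1" | sed -e 's/%20/-/g' | tr 'A-Z' 'a-z' | sed -e 's/$/.in/'
}

# fetch_all STATE CITY...: hypothetical wrapper around the commands above.
fetch_all() {
    state=$1; shift
    for city in "$@"; do
        # tr -d '\r' drops the trailing ^M without touching other characters.
        $WWW -n -np "${WEATHER}/${state}/${city}" | tr -d '\r' > "$(out_name "$city")"
    done
}

# Usage: fetch_all Indiana Fort%20Wayne Indianapolis South%20Bend
```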

Neither gopher nor WWW has the notion of a "recursive directory
listing", either some complete overview of the structure of the system
or some skeleton outline.  (I realize it's arbitrarily hard to do so
since any link could point off anywhere else.)  That makes it tougher
to do an archie-style catalog.  I think it wouldn't be that hard to
build a tree-walker for gopher that prints out a list of the
directories on every system that it can find and also the text of all
of the stuff that's in the ".about" directories.  At the very least
I'm doing some of that by hand now (just a script like the one above)
& waising it so I have some clue what all is out there.  *not* a
replacement for the per-site indexes, but a cross-section.

--Ed