Re: New service: The Unified CS TR Index

Rik Harris <rik@daneel.rdt.monash.edu.au>

Mail folder: WWW Talk Apr-Jun 1993 Archives
In-reply-to: Marc VanHeyningen: "New service: The Unified CS TR Index"

Message-id: <9305210038.AA25269@daneel.rdt.monash.edu.au>
To: Marc VanHeyningen <mvanheyn@cs.indiana.edu>
Cc: www-talk@nxoc01.cern.ch
Subject: Re: New service: The Unified CS TR Index 
In-reply-to: Your message of "20 May 93 13:21:53 EST."
             <454.737922113@moose.cs.indiana.edu> 
Date: Fri, 21 May 93 10:38:22 +1100
From: Rik Harris <rik@daneel.rdt.monash.edu.au>
X-Mts: smtp

> WHAT IT IS
> 
> It's pretty simple, really.  A daemon runs and pulls index files from
> many various FTP sites which archive tech reports (and similar
> material.)  At present, 39 FTP sites are included in the index, with
> over 1,400 reports included; both of these numbers are growing rather
> rapidly.  This information is then converted into entries for each
> tech report with hypertext anchors to the TR itself, producing a
> really big file.  This file is then searchable for keywords by a
> Simple Index Keyword Search (SIKS).  I believe it represents a
> potentially nicer general interface to this informational resources
> than existing methods (e.g. WAIS pointers to ftp sites).  It certainly
> is not the ultimate information browsing tool, but I hope it may push
> the migration towards such a little.

You might like to check out:

http://www.vifp.monash.edu.au/techreports/sitelist.html

It is a first pass at converting my list of technical report archives
(posted regularly to comp.doc.techreports, news.answers, etc.) to HTML.
It's also available as a WAIS database (cs-techreport-archives).
Ideally, I'd like to maintain it in HTML and convert it to text for
posting, but I never seem to find the time.  It lists about 140 FTP
sites I've collected that appear to archive technical reports.

I also maintain a WAIS database of abstracts from technical reports
(cs-techreport-abstracts).  Several sites now use my format, so their
data gets into the database very quickly (daily automatic checks).
The rest of the abstracts I have either formatted manually or
converted with a Perl script (where the conversion looks like it will
be useful in the future), and I lean more and more towards writing
scripts (who cares if I've got 40 scripts lying around that are never
used again?  Besides, I'm a Perlaholic :-).  For information on the
format, see:

ftp://daneel.rdt.monash.edu.au/pub/techreports/sites/README

and on the whole project in:

ftp://daneel.rdt.monash.edu.au/pub/techreports/README

The database contains nearly 7000 reports, with over 2000 abstracts
from about 70 universities and research organisations.

My grand plan has been to make the entire database searchable, with a
search returning a group of abstracts, each carrying a hypertext link
to the paper itself or, if the paper is not available via ftp, a
"mailto:" link that lets the reader email the paper's "contact" to
request a copy.  This will all be done in W3; I've been converted :-)
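
To make the ftp-or-mailto idea concrete, here is a minimal sketch in
Python.  It is only an illustration under stated assumptions, not the
database's actual code: the function name paper_link, its parameters,
and the example.org addresses are all hypothetical.

```python
from html import escape


def paper_link(title, ftp_url=None, contact=None):
    """Build the HTML link for one report (hypothetical helper).

    Prefer a direct link to the paper when an FTP URL is known;
    otherwise fall back to a mailto: link for the report's contact.
    """
    label = escape(title)
    if ftp_url:
        # Paper is retrievable: link straight to it.
        return '<a href="%s">%s</a>' % (escape(ftp_url), label)
    if contact:
        # Not on ftp: let the reader email the contact for a copy.
        return '<a href="mailto:%s">%s (request by email)</a>' % (
            escape(contact), label)
    # Neither is known: show the title as plain text.
    return label


if __name__ == "__main__":
    print(paper_link("TR 93/1", ftp_url="ftp://example.org/pub/tr931.ps"))
    print(paper_link("TR 93/2", contact="author@example.org"))
```

A search front end would call something like this once per abstract it
returns, so every hit ends in either a direct link or a request link.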

I can see some common ground here, so perhaps we could talk about
combining our efforts, Marc (in private email, of course).

have fun,
rik.
--
Rik Harris - rik.harris@fcit.monash.edu.au
+61 3 560-3265 (AH & ans.mach)      +61 3 565-3227 (BH)
Faculty of Computing and Information Technology,
Clayton Campus, Monash University, Australia