Announcement: Indexing extension

neuss@igd.fhg.de


From: neuss@igd.fhg.de
X-Mailer-Igd: ## IGD.FHG.DE ## Fri, 26 Nov 93 12:45:17 +0100
Date: Fri, 26 Nov 93 12:44:25 +0100
Message-id: <9311261144.AA01606@wildturkey.igd.fhg.de>
To: www-talk@nxoc01.cern.ch
Subject: Announcement: Indexing extension

Dear fellow webbers,

We have finally finished a first version of our indexing extension
for HTTP servers. Feedback is very welcome.

Here's some info on it:
Chris
--
/*
 *  Christian Neuss  %  neuss@igd.fhg.de  %  ..in the humdrum
 */
================================ SNIP ==================================

Fraunhofer IGD proudly presents:

HTTP Index Server Extension
===========================

 Many thanks to Ari Luotonen from CERN for his contribution of
 the extract-title command, and to Stefanie Hoefling for many hours
 of debugging. -- Chris

What it is:
The HTTP Index Server Extension allows free-text queries on
hierarchies of HTML files. Its functionality is quite close to that
of a WAISINDEX interface, but the package is much smaller and more
portable. Basically, a cron job creates an index file at regular
intervals, and the server consults this index file whenever a client
issues an index query. The supported query syntax allows combining
keywords with AND and OR, so a query could look like "server and
script". The result of this query is a list of all HTML files
containing both words, which is sent back to the client. Each file
in the list is annotated with a relevance rating and is a clickable
hyperlink to the file itself.
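The announcement does not describe the package's internals, so the
following is only a minimal Python sketch of the scheme described
above, not the extension's own code. It assumes: an inverted index
mapping each word to the files that contain it (with counts), saved as
a JSON index file; a query that is a flat list of words joined by
"and"/"or", with AND binding tighter than OR and no parentheses; and a
relevance rating that is simply the summed word counts. All function
names (collect_html, build_index, search, results_html, ...) are
hypothetical.

```python
import json
import os
import re
from collections import Counter, defaultdict
from html import escape

TAG_RE = re.compile(r"<[^>]*>")
WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(html):
    """Strip HTML tags and return the lowercase words of the text."""
    return WORD_RE.findall(TAG_RE.sub(" ", html).lower())


def collect_html(root):
    """Read every .html file below root (what the periodic cron job indexes)."""
    docs = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".html"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="latin-1") as f:
                    docs[path] = f.read()
    return docs


def build_index(docs):
    """Inverted index: word -> {path: number of occurrences}."""
    index = defaultdict(dict)
    for path, html in docs.items():
        for word, count in Counter(tokenize(html)).items():
            index[word][path] = count
    return dict(index)


def save_index(index, filename):
    """Write the index file that later queries read (hypothetical JSON format)."""
    with open(filename, "w") as f:
        json.dump(index, f)


def load_index(filename):
    with open(filename) as f:
        return json.load(f)


def parse_query(query):
    """'a and b or c' -> [['a', 'b'], ['c']] (AND binds tighter, no parens)."""
    groups, current = [], []
    for word in query.lower().split():
        if word == "or":
            if current:
                groups.append(current)
            current = []
        elif word != "and":
            current.append(word)
    if current:
        groups.append(current)
    return groups


def search(index, query):
    """Return (path, score) pairs, highest score first, ties by path."""
    scores = {}
    for terms in parse_query(query):
        postings = [index.get(t, {}) for t in terms]
        # AND: the file must contain every term of this group.
        paths = set(postings[0]).intersection(*postings[1:])
        for path in paths:
            score = sum(p[path] for p in postings)
            # OR: keep the best score among the alternative groups.
            scores[path] = max(scores.get(path, 0), score)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def results_html(results):
    """Render the hits as clickable links annotated with their scores."""
    items = "\n".join(
        f'<LI><A HREF="{escape(path)}">{escape(path)}</A> (score {score})'
        for path, score in results
    )
    return f"<UL>\n{items}\n</UL>"
```

In this sketch, cron would run collect_html, build_index and
save_index, while the query handler would call load_index, search and
results_html. Summed word counts are just the simplest possible
relevance rating; a real ranking could weight rare words more heavily.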

Another feature is the ability to use a thesaurus for conceptual
searches: entering "{picture}" as a query will retrieve not only
files containing the word "picture", but also files containing
related concepts such as "image". The thesaurus format we support is
the ANSI standard Thesaurus Image Format (TIF). Thesaurus
information is available from many sources, but probably the most
important feature is the ability to create specialized technical
thesauri for whatever is stored in your HTML text database.
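How the package performs the expansion is not described here, so the
following Python sketch only illustrates the idea under assumptions: a
plain dict (hypothetical, standing in for a parsed thesaurus file; TIF
parsing is not shown) maps a term to its related concepts, expansion
goes only one level deep, and a "{concept}" term is rewritten into an
ordinary AND/OR query string that any keyword search can then run.

```python
import itertools
import re

# Hypothetical stand-in for a parsed thesaurus file: term -> related concepts.
# Reading the actual TIF format is not shown here.
THESAURUS = {
    "picture": ["image", "photo", "illustration"],
    "image": ["picture", "bitmap"],
}

CONCEPT_RE = re.compile(r"^\{(\w+)\}$")


def related(term, thesaurus):
    """The term itself followed by its related concepts, without duplicates."""
    return list(dict.fromkeys([term] + thesaurus.get(term, [])))


def expand_query(query, thesaurus=THESAURUS):
    """Rewrite {concept} terms into a plain AND/OR query string.

    A {concept} may match any related word, so AND is distributed over the
    alternatives: '{picture} and server' becomes
    'picture and server or image and server or ...'.
    Only one level of the thesaurus is consulted.
    """
    alternatives = []
    for part in re.split(r"\s+or\s+", query.strip().lower()):
        terms = [w for w in part.split() if w != "and"]
        if not terms:
            continue
        choices = []
        for term in terms:
            match = CONCEPT_RE.match(term)
            choices.append(related(match.group(1), thesaurus) if match else [term])
        for combo in itertools.product(*choices):
            alternatives.append(" and ".join(combo))
    return " or ".join(alternatives)
```

Rewriting into a plain query keeps the search code unchanged, at the
cost of a query that grows multiplicatively when several concepts
appear in the same AND group.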

How to get it:
Access it from
  ftp://ftp.igd.fhg.de/incoming/ICE-1.01a.tar.Z
  ftp://info.cern.ch/pub/www/src/ICE-1.0b.tar.Z
in order of preference. The version on the CERN server is slightly
older, but I'll send them an update, and they will probably put up
the 1.01a version soon.

Bugs:
Probably too numerous to mention :-) 

This is a very early version, and will be improved in the future.
The index extension will probably become part of the CERN httpd
server, and perhaps Rob McCool will also include it in the NCSA
distribution. The version I make available is mostly for those
of you who need indexing badly, and don't want to wait for
future server releases.


Please contact me if you have bug-reports or suggestions:

Christian Neuss
c/o Fraunhofer IGD
Wilhelminenstr. 7
64283 Darmstadt
GERMANY
Fax: (+49)6151 155-199
email:  neuss@igd.fhg.de

Have fun :-),
-- Chris