From: john@math.nwu.edu (John Franks)
Message-id: <9311012331.AA07908@hopf.math.nwu.edu>
Subject: Wais2html
To: www-talk@nxoc01.cern.ch
Date: Mon, 1 Nov 1993 17:31:40 -0600 (CST)
X-Mailer: ELM [version 2.4 PL23]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3488      

I've been working on WAIS index support for the gn gopher/http server.
As a by-product I wrote a short program which may be of use to people
using other servers to allow them to use WAIS indexes.  It's nothing
fancy and hasn't been well tested but might be a start for someone
wanting to implement WAIS indexing.

Available from  ftp://ftp.acns.nwu.edu/pub/gn/wais2html.tar.Z

Here's the README:

Wais2html is a small C program which is linked against the WAIS
libraries.  It is run with arguments including a WAIS index name and
search terms, and it produces on stdout an html document containing a
list of URLs to the files which contain a match for the search terms.
The intent is that this can be used with an http server to provide
<INDEX> html documents which will search a collection of files indexed
with WAIS.  It only works with full text indexes of a collection of
files, i.e. there is no support for things like mail or news format
where a single message, as opposed to a whole file, is considered a
document.  This would be easy to add, but requires cooperation from
the server to return only part of a file.
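
To give a rough idea of the kind of output described above, here is a
short Python sketch (not the C code in wais2html itself) that turns a
list of matching file paths into an html document of links.  The
function name, the exact markup, and the http://host:port/path form of
the URLs are assumptions made for illustration; the real program's
output may well differ.

```python
import html
from posixpath import normpath
from urllib.parse import quote


def hits_to_html(title, root_dir, host, port, paths):
    """Return an html page with one link per matching file.

    Assumption: a file's URL is its path relative to root_dir, served
    from http://host:port/.  Files outside root_dir are skipped.
    """
    prefix = normpath(root_dir).rstrip("/") + "/"
    items = []
    for path in paths:
        full = normpath(path)
        if not full.startswith(prefix):
            continue  # not under the server root, so it has no URL
        rel = full[len(prefix):]
        url = "http://%s:%d/%s" % (host, port, quote(rel))
        items.append('<LI><A HREF="%s">%s</A>' % (url, html.escape(rel)))
    lines = ["<TITLE>%s</TITLE>" % html.escape(title),
             "<H1>%s</H1>" % html.escape(title),
             "<UL>"] + items + ["</UL>"]
    return "\n".join(lines) + "\n"
```

Something like hits_to_html("Search results", "/usr/local/www",
"www.example.edu", 80, list_of_paths) would then give the page to send
back; the host name and paths here are made up.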

WARNING: This is a quick effort with almost no testing done.  There
is no support -- you're on your own.

Here is what you need to do.

1. Get the WAIS software.  You can use either freeWAIS from
and build WAIS per the instructions.

2. In the wais2html src directory make symbolic links to the directories
"bin" and "ir" in the main WAIS source directory.

3. Run make in the wais2html directory, producing the wais2html binary.

4. Index your files.  This is done with the program "waisindex", which
is in the bin subdirectory of the main WAIS source directory.  I suggest
doing this by making a directory, say "waisindex", in which the index
files will reside.  Then cd to that directory and use one of the commands
	waisindex -t filename /complete/path/to/files...
	waisindex -t first_line /complete/path/to/files...
where "files..." is replaced by a list of all the files you want to
index.  The difference between these two commands is that the html
document which wais2html produces will refer to the matching documents
either by the name of the file containing a match or by the first
line of the contents of the file containing a match.  Note that in
the first form the argument is literally the string "filename"; that
string is not replaced with the name of a file.

[This step will be different if your server will run chrooted, because
the complete paths of the files are embedded in the index.  I haven't
tried it, but it should work to do the indexing chrooted,
i.e. /etc/chroot newroot waisindex]

5. Test your setup with this index by running the command
	wais2html index root_dir title host port words...
where index is "/path/to/waisindex/index", root_dir is the root
relative to which your server calculates URLs, title is a quoted
string for the title of the html document, host is your host name,
and words... is a list of search terms.  Note that the "index"
argument is slightly strange.  It is the path to the waisindex
directory you created with "/index" tacked on.  There is no file by
this name, only a set of files of the form index.*.

6. If all works well, set up your server to handle an INDEX query by
running the program wais2html with arguments as in step 5 and
returning the document this program produces on stdout.
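
How a server hooks this in will depend on the server, but the idea in
steps 5 and 6 might be sketched in Python roughly as follows.  The
function names and the location of the wais2html binary are
hypothetical, and I am assuming the search words arrive joined by "+"
as in a URL like /search?gopher+server, and that port is the port your
http server uses.  The check for index.* files reflects the note in
step 5 that the "index" argument names a prefix, not a file.

```python
import glob
import subprocess
from urllib.parse import unquote


def index_files_exist(index_arg):
    """True if files of the form index.* exist for this index argument."""
    return bool(glob.glob(glob.escape(index_arg) + ".*"))


def build_command(binary, index_arg, root_dir, title, host, port, query):
    """Build the argument list for wais2html in the order shown in step 5.

    Assumption: query holds the search words joined by "+".
    """
    words = [unquote(w) for w in query.split("+") if w]
    return [binary, index_arg, root_dir, title, host, str(port)] + words


def run_search(binary, index_arg, root_dir, title, host, port, query):
    """Run wais2html and return the html document it writes to stdout."""
    if not index_files_exist(index_arg):
        raise FileNotFoundError("no files matching %s.*" % index_arg)
    argv = build_command(binary, index_arg, root_dir, title, host, port, query)
    # Passing a list (no shell) keeps search terms from being treated
    # as shell syntax.
    result = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    return result.stdout
```

The server would send whatever run_search returns back to the client
as the html reply to the query.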