Re: Getting searching to work

Tim Berners-Lee <timbl@www3.cern.ch>
Date: Thu, 28 Jan 93 08:14:39 +0100
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-id: <9301280714.AA03115@www3.cern.ch>
To: pflynn@curia.ucc.ie (Peter Flynn)
Subject: Re: Getting searching to work
Cc: www-talk@nxoc01.cern.ch
Reply-To: timbl@nxoc01.cern.ch

>  Date: Wed, 27 Jan 93 16:33:38 GMT
>  From: pflynn@curia.ucc.ie (Peter Flynn)
>  

>  How is it best to add a simple search facility to a httpd server?
>  Or, better put, if a user wends hir way into one of my html menus,
>  is there a simple addition I can make that will add a search capability?

Not as simple as you'd like, I suspect.  The httpd daemon doesn't  
support a search directly.  What you can do is (hack httpd or) run  
another daemon which is completely written in sh or csh or perl (pick  
your favorite).  There are some examples on the web. Then you have
something like

  <dt><a name=something
   href="http://curia.ucc.ie:9000/keywordsearchsearch/joyce">Search</a>
  <dd>the above texts for a name or keyword

When the user follows the link, the special server on port 9000 gets a
	GET /keywordsearchsearch/joyce
request, and returns a search panel document:

  <head>
    <isindex>
  </head>
  <body>
  Give keywords, or words from the title, to find books
  by James Joyce which match all keywords given.
  </body>

The <isindex> tag tells the www program that the document is a
search panel and enables the FIND command. (On smarter browsers it
enables the search text input field.)
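
As a rough sketch of that first reply, here is how a server script
written in sh might print the search panel. The function name
search_panel and its argument are made up for illustration; the markup
is just the panel shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the search panel for one author.
# $1 is the author's name as it should appear in the panel text.
search_panel() {
    cat <<EOF
<head>
  <isindex>
</head>
<body>
Give keywords, or words from the title, to find books
by $1 which match all keywords given.
</body>
EOF
}
```

Calling search_panel "James Joyce" writes the panel to stdout, which is
where the client will be listening once the script runs as a server.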

When the user types a keyword, that same special server gets a
different request:

	GET /keywordsearchsearch/joyce?portrait+young

Your script reads that request line from stdin and must write the
result back to stdout. For example:

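	#! /bin/csh
	set noglob                      # keep ? and * in the request literal
	set request = ( `echo "$<"` )   # read the request line, split into words

	# request[2] is the address; request.sed turns it into a command line
	`echo "$request[2]" | sed -f request.sed` | sed -f reply.sed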
	
where request.sed is something like 

	s|^/\([^/]*\)/\([^?]*\)?\(.*\)| pat -\1 -cat \2 \3|
	s|+| |g

(I know nothing of pat, so that is all made up. Notice that I used
parts of the address of the search panel to specify options to pat.)
The output is formatted into a hypertext file, in the example by
reply.sed, which has to generate a hypertext document with valid
references to the found documents, using their addresses on your
original httpd server.
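
The two halves of that job can also be sketched in plain sh, without
sed. Everything here is an assumption made for illustration: the
function names, and above all the idea that the search program prints
one "file title" pair per line (I don't know what pat really prints).

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from pat or httpd.

# Print the keywords of an address such as
# /keywordsearchsearch/joyce?portrait+young, with + turned into spaces.
# Prints an empty line when the address has no ?keywords part.
address_keywords() {
    case $1 in
        *[?]*) printf '%s\n' "${1#*[?]}" | tr '+' ' ' ;;
        *)     printf '\n' ;;
    esac
}

# Read lines of "file title..." on stdin (an assumed result format) and
# write a hypertext list whose links point back at the main server.
# $1 is the base address of that server.
format_reply() {
    base=$1
    printf '<title>Search results</title>\n'
    printf '<dl>\n'
    while read -r file title; do
        printf '<dt><a href="%s/%s">%s</a>\n' "$base" "$file" "$title"
    done
    printf '</dl>\n'
}
```

For example, piping the line "ulysses.html Ulysses" into
format_reply http://curia.ucc.ie gives a one-entry list whose link
points at the copy of Ulysses on the ordinary httpd server.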

All of which is in fact simpler than it looks, largely because the
thing is just a program running from stdin to stdout, which you
can test on a terminal.  When you run it under inetd (just like
httpd is run, but on port 9000) it is inetd which takes care of
attaching stdin and stdout to the client.
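
To make the terminal test concrete, here is a minimal sketch of such a
program in sh. The name handle_request and the text of its replies are
invented for illustration; a real script would call the search program
where this one just echoes the address.

```shell
#!/bin/sh
# Hypothetical request handler: reads one request line from stdin and
# writes a document to stdout. It needs nothing else from its caller.
handle_request() {
    read -r method address rest
    # Clients may end the line with CR LF; drop the CR if it is there.
    address=$(printf '%s' "$address" | tr -d '\r')
    case $method in
        GET) printf '<body>\nYou asked for %s\n</body>\n' "$address" ;;
        *)   printf '<body>\nUnsupported request\n</body>\n' ;;
    esac
}
```

If a script (say search.sh, a made-up name) ends with a call to
handle_request, you can try it by hand with

	printf 'GET /keywordsearchsearch/joyce?portrait+young\r\n' | sh search.sh

and, assuming the usual inetd.conf layout and a made-up service name
w3search registered for 9000/tcp in /etc/services, hand it to inetd
with a line along the lines of

	w3search stream tcp nowait nobody /usr/local/etc/search.sh search.sh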

As pat sounds like a serious piece of retrieval machinery, it
would certainly be worth wrapping it up as a W3 server to make it
available on the web.

A couple of hints:

1. Put lots of parameters into the address of the search panels, so
   that you can put in pointers to all kinds of different pat features
   if you need them.
2. In the search panel document which your port 9000 server script
   generates, put pointers to related search panels, help pages etc.
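
As a sketch of the first hint, the address can be cut into segments and
each segment turned into an option for the search program. The layout
/SEARCHTYPE/CATEGORY and the option names are as made up here as they
were in the sed script, and the function name is mine.

```shell
#!/bin/sh
# Hypothetical mapping from address segments to search options:
#   /SEARCHTYPE/CATEGORY[?keywords]  ->  "-SEARCHTYPE -cat CATEGORY"
# Assumes the address has exactly two segments.
address_options() {
    path=${1%%[?]*}        # ignore any ?keywords part
    path=${path#/}         # drop the leading slash
    kind=${path%%/*}       # first segment
    category=${path#*/}    # everything after the first slash
    printf '%s\n' "-$kind -cat $category"
}
```

With more segments in the address you would add more variables in the
same way, so that each search panel can select a different feature just
by being linked under a different address.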

>  What I'd like is something like:
>  

>  <dl>
>  <dt><a name=dub href="dubliners.html">Dubliners<dd>by James Joyce
>  <dt><a name=ulysses href="ulysses.html">Ulysses<dd>by James Joyce
>  [etc]
>  <dt><a name=something href=somepointer>Search<dd>the above texts for
>  a name or keyword
>  </dl>

This is basically what I have described.  When the guy follows the
link, he gets back a micro-document which tells him about the search
he can do.  This is the WWW model. The Gopher model, which Dan  
Connolly prefers, is that he should immediately get prompted for
keywords with a default search panel. I'll discuss that in a separate  
message.

Tim