Date: Mon, 19 Jul 1993 21:12:54 +1200
From: Nathan Torkington <Nathan.Torkington@vuw.ac.nz>
Message-id: <199307190912.AA15968@kauri.vuw.ac.nz>
To: warnock@hypatia.gsfc.nasa.gov
Cc: www-bug@nxoc01.cern.ch, www-talk@nxoc01.cern.ch
Subject: WAIS and WWW patches
I've just finished some rough and ready code to implement the
following behaviour:
-- waisindex can cope with documents of type URL, in so far as it
sets the headline to be the URL of the document
-- HTWAIS.c in the CERN library knows about files of type HTML and
delivers them as such
-- HTWAIS.c in the CERN library knows about files of type URL, and
formats the results of a WAIS search accordingly.
This is pretty ugly behaviour, but it works (try it on
http://www.vuw.ac.nz/home.html, which is searchable -- a search there
goes through a WAIS database using the patches I described previously).
In the future, a smart HTML-aware part of waisindex should be written
to pull out the <TITLE>...</TITLE> text and use that as the headline,
storing the URL in the DocID.
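Just to sketch that idea (none of this is in the patches -- the
function name, arguments and the assumption of upper-case tags are all
invented), the extractor might look something like:
---begin
/* Illustrative only: copy the text between <TITLE> and </TITLE>
 * in buf into headline, using at most max-1 characters.
 */
#include <string.h>

static void html_headline(const char *buf, char *headline, size_t max)
{
    const char *start, *end;
    size_t len;

    if (max == 0)
        return;
    headline[0] = '\0';
    start = strstr(buf, "<TITLE>");
    if (start == NULL)
        return;
    start += strlen("<TITLE>");
    end = strstr(start, "</TITLE>");
    if (end == NULL)
        return;
    len = (size_t)(end - start);
    if (len >= max)
        len = max - 1;
    memcpy(headline, start, len);
    headline[len] = '\0';
}
---end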
Anyway, add this to the irbuild.c file, in the section where all the
various document types are defined:
---begin
  else if (0 == strcmp("URL", next_argument)) {
    dataops.type = "URL";
    typename = next_argument;
    URL_trim = s_strdup(next_arg(&argc, &argv));
    URL_prefix = s_strdup(next_arg(&argc, &argv));
  }
---end
and add this to the help section:
---begin
fprintf(stderr," | URL what-to-trim what-to-add /* URL */\n");
---end
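With that in place, indexing would presumably be invoked along these
lines (database name, directory and host are only placeholders):

  waisindex -d webdocs -t URL /usr/local/www http://www.vuw.ac.nz *.html

so that everything indexed under /usr/local/www gets a headline
beginning with http://www.vuw.ac.nz.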
irtfiles.c now has this in index_text_file:
---begin
  /* Make the current filename accessible via global variables.
   * Increment current_filecount so routines can efficiently detect
   * changes in the current file.
   * -- Prentiss Riddle, Rice ONCS, riddle@rice.edu, 5/6/92
   */
  if (current_filename == NULL)
    current_filename = s_malloc(MAX_FILENAME_LEN + 1);
  if (URL_prefix &&
      !strncmp(filename, URL_trim, MIN(strlen(URL_trim), strlen(filename)))) {
    /* trim capable: filename starts with URL_trim, so rewrite it as a URL */
    strcpy(current_filename, URL_prefix);
    strcat(current_filename, filename + strlen(URL_trim));
  } else
    strncpy(current_filename, filename, MAX_FILENAME_LEN);
  current_filecount++;
---end
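In other words, when a filename starts with URL_trim, that leading part
is replaced by URL_prefix -- so indexing with what-to-trim /usr/local/www
and what-to-add http://www.vuw.ac.nz (paths picked purely for
illustration) turns /usr/local/www/home.html into
http://www.vuw.ac.nz/home.html. Files that don't match URL_trim keep the
old behaviour.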
and this where a separator is processed:
---begin
  /* we are processing a separator, therefore we should
   * finish off the last document, and start a new one
   */
  if (NULL != dataops->finish_header_function) {
    dataops->finish_header_function(header);
  }
  if (0 == strlen(header)) {
    char full_path[1000];
    char directory[1000];
    if (!URL_prefix) {
      truename(filename, full_path);
      sprintf(header, "%s %s", pathname_name(full_path),
              pathname_directory(full_path, directory));
    } else
      strncpy(header, current_filename, MAX_FILENAME_LEN);
  }
---end
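The effect is that for a URL-type database the headline of each document
becomes the rewritten URL held in current_filename, rather than the
usual name-plus-directory pair; that headline is what the HTWAIS.c
change below picks up.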
ircfiles.c gets this at the end:
---begin
char *URL_prefix=NULL;
char *URL_trim=NULL;
---end
and ircfiles.h gets this at the end:
---begin
extern char *URL_prefix;
extern char *URL_trim;
---end
HTWAIS.c now has this in display_search_response:
---begin
} else {                        /* Not archie */
  docname = WWW_from_WAIS(docid);
  if (docname) {
    char * dbname = HTEscape(database, URL_XPALPHAS);
    sprintf(line, "%s/%s/%d/%s",                /* W3 address */
            dbname,
            head->Types ? head->Types[0] : "TEXT",
            (int)(head->DocumentLength),
            docname);
    HTStartAnchor(target, NULL,
                  head->Types ? (!strcmp(head->Types[0], "URL")
                                 ? headline : line)
                              : line);
---end
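That is, when the database reports a document type of URL, the anchor
address is taken straight from the headline (the document's own URL as
stored by waisindex) instead of the usual dbname/type/length/docname
address built in line, so search hits point straight back at the
original Web document.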
Cheers,
Nat.