WAIS and WWW patches

Nathan Torkington <Nathan.Torkington@vuw.ac.nz>
Date: Mon, 19 Jul 1993 21:12:54 +1200
From: Nathan Torkington <Nathan.Torkington@vuw.ac.nz>
Message-id: <199307190912.AA15968@kauri.vuw.ac.nz>
To: warnock@hypatia.gsfc.nasa.gov
Cc: www-bug@nxoc01.cern.ch, www-talk@nxoc01.cern.ch
Subject: WAIS and WWW patches
Status: RO

I've just finished some rough and ready code to implement the
following behaviour:
 -- waisindex can cope with documents of type URL, in so far as it
    sets the headline to be the URL of the document
 -- HTWAIS.c in the CERN library knows about files of type HTML and
    delivers them as such
 -- HTWAIS.c in the CERN library knows about files of type URL, and
    formats the results of a WAIS search accordingly.

This is pretty ugly behaviour (test it out on
http://www.vuw.ac.nz/home.html which is searchable -- searching it
searches a WAIS database using the patches I described previously) but
it works.  In the future, a smart HTML-aware part of waisindex should
be written to suck out the <TITLE>...</TITLE> text and use that as the
headline, storing the URL in the DocID.
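
As a rough illustration of that future step, here is a minimal sketch
of pulling the <TITLE>...</TITLE> text out of a buffer of HTML. It is
not part of the patches; find_nocase and extract_html_title are
hypothetical names, and the sketch assumes the whole document is
already in memory and that the title tag carries no attributes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive search for needle in haystack; returns a pointer to
 * the first match, or NULL.  (Hypothetical helper, not from WAIS.) */
static const char *find_nocase(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);
    for (; *haystack; haystack++) {
        size_t i = 0;
        while (i < n && haystack[i] &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return haystack;
    }
    return NULL;
}

/* Copy the text between <TITLE> and </TITLE> into out (at most
 * outsize-1 characters, always NUL-terminated).  Returns 1 on success,
 * 0 if no complete title element was found. */
int extract_html_title(const char *html, char *out, size_t outsize)
{
    const char *start, *end;
    size_t len;

    assert(html != NULL && out != NULL && outsize > 0);
    out[0] = '\0';

    start = find_nocase(html, "<title>");
    if (start == NULL)
        return 0;
    start += strlen("<title>");

    end = find_nocase(start, "</title>");
    if (end == NULL)
        return 0;

    len = (size_t)(end - start);
    if (len >= outsize)
        len = outsize - 1;      /* truncate rather than overflow */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

The result would become the headline, with the URL kept in the DocID
as suggested above.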

Anyway, add this to the irbuild.c file, in the section where all the
various document types are defined:

      else if(0 == strcmp("URL", next_argument)) {
        dataops.type = "URL";
        typename = next_argument;
        URL_trim = s_strdup(next_arg(&argc, &argv));
        URL_prefix = s_strdup(next_arg(&argc, &argv));
      }

and add this to the help section:

  fprintf(stderr,"           | URL what-to-trim what-to-add /* URL */\n");

irtfiles.c now has, in index_text_file:

  /* Make the current filename accessible via global variables.
   * Increment current_filecount so routines can efficiently detect
   * changes in the current file.
   * -- Prentiss Riddle, Rice ONCS, riddle@rice.edu, 5/6/92
   */
  if(current_filename == NULL) current_filename = s_malloc(MAX_FILENAME_LEN+1);

  if (URL_prefix && !strncmp(filename, URL_trim,
                             MIN(strlen(URL_trim), strlen(filename)))) {
    /* trim capable */
    strcpy(current_filename, URL_prefix);
    strcat(current_filename, filename+strlen(URL_trim));
  } else
    strncpy(current_filename, filename, MAX_FILENAME_LEN);
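
Two caveats about the fragment above: the strcpy/strcat pair does not
check MAX_FILENAME_LEN, and because the comparison length is the
shorter of the two strings, a filename shorter than URL_trim can match
and then be indexed past its end. A bounded variant might look like
the following sketch. map_filename_to_url is a hypothetical name, and
snprintf is assumed to be available (it is C99, so it would not have
been in every 1993 libc).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the name stored for a file: if filename starts with trim,
 * replace that leading part with prefix; otherwise copy filename
 * unchanged.  The result is always NUL-terminated.  Returns 1 if it
 * fit in out, 0 if it had to be truncated. */
int map_filename_to_url(const char *filename, const char *trim,
                        const char *prefix, char *out, size_t outsize)
{
    size_t trimlen;
    int n;

    assert(filename != NULL && out != NULL && outsize > 0);

    if (trim != NULL && prefix != NULL) {
        trimlen = strlen(trim);
        /* Compare the whole of trim, so a filename shorter than trim
         * can never match. */
        if (strncmp(filename, trim, trimlen) == 0) {
            n = snprintf(out, outsize, "%s%s", prefix, filename + trimlen);
            return n >= 0 && (size_t)n < outsize;
        }
    }

    /* No trim/prefix configured, or no match: keep the plain filename. */
    n = snprintf(out, outsize, "%s", filename);
    return n >= 0 && (size_t)n < outsize;
}
```

With a trim of "/usr/www" and a prefix of "http://www.vuw.ac.nz" (both
made up for illustration), "/usr/www/home.html" would be stored as
"http://www.vuw.ac.nz/home.html".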



      /* we are processing a separator, therefore we should
       * finish off the last document, and start a new one
       */
      if(NULL != dataops->finish_header_function){
      if(0 == strlen(header)){
        char full_path[1000];
        char directory[1000];
        if (!URL_prefix) {
          truename(filename, full_path);
          sprintf(header, "%s   %s", pathname_name(full_path),
                  pathname_directory(full_path, directory));
        } else
          strncpy(header, current_filename, MAX_FILENAME_LEN);

ircfiles.c has at the end:

char *URL_prefix=NULL;
char *URL_trim=NULL;

and ircfiles.h has at the end:

extern char *URL_prefix;
extern char *URL_trim;

HTWAIS.c has in display_search_response:

        } else { /* Not archie */
            docname =  WWW_from_WAIS(docid);
            if (docname) {
                char * dbname = HTEscape(database, URL_XPALPHAS);
                sprintf(line, "%s/%s/%d/%s",            /* W3 address */
                    head->Types ? head->Types[0] : "TEXT",
                HTStartAnchor(target, NULL,
                    head->Types ? (!strcmp(head->Types[0], "URL") ? headline : line) : line);
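
The nested conditional in that last call is easier to follow when
pulled out on its own. The sketch below shows only the selection
logic: a hit whose first type is "URL" links straight to its headline
(which waisindex set to the document's URL), and anything else links
to the WAIS address built in line. anchor_target is a hypothetical
name, not a function in the CERN library.

```c
#include <assert.h>
#include <string.h>

/* Pick the anchor target for one search hit.
 * types        NULL-terminated list of document types, or NULL
 * headline     the hit's headline (the URL, for type "URL")
 * wais_address the W3 address constructed for the WAIS document */
const char *anchor_target(char *const *types, const char *headline,
                          const char *wais_address)
{
    assert(headline != NULL && wais_address != NULL);

    if (types != NULL && types[0] != NULL && strcmp(types[0], "URL") == 0)
        return headline;        /* document is reachable directly */
    return wais_address;        /* fetch it through the WAIS gateway */
}
```

Unlike the original expression, this also guards against an empty
type list before looking at its first entry.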