Re: httpd scripts for WAIS queries.

Tony Sanders <sanders@bsdi.com>
Errors-To: sanders@bsdi.com
Message-id: <9311120548.AA10216@austin.BSDI.COM>
To: George Phillips <phillips@cs.ubc.ca>
Cc: www-talk@nxoc01.cern.ch
Subject: Re: httpd scripts for WAIS queries. 
In-Reply-To: Your article in comp.infosystems.www  of Thu, 11 Nov 1993 19:59:41 PST.
Reply-To: sanders@bsdi.com
Organization: Berkeley Software Design, Inc.
Date: Thu, 11 Nov 1993 23:48:35 -0600
From: Tony Sanders <sanders@bsdi.com>
In comp.infosystems.www I mused over the fact that we need a unified format
for all servers to pass data to external gateway code so all servers
(Plexus, NCSA, CERN at least) could share them (there is hope that Plexus
gateways will soon/eventually be easy to use with NCSA's /htbin stuff).

George Phillips asked in email (spurring me to complete my thoughts :-):
> Anyhow, do you have a list of information the main plexus code
> determines?  If so, maybe we can propose a option standard for
> external servers.

You need to deal with methods like POST that have contents in addition to
the HTTP/1.0 headers.  My suggestion is to pass most information to the
script via STDIN due to command line length limitations (alternatively we
could stuff it in a file and pass a pointer but that seems like a bad idea
for a number of reasons).  However, some things are best passed via
the command line for ease of use.

So here is a summary of the values you need access to (obviously, not
all scripts need all of these):
    Protocol		; http (gopher, etc)
    Method		; GET/PUT/POST/...
    URL			; split into various pieces, including the query part
    Version		; HTTP/0.9 or HTTP/1.0
    Port		; port number this request came in on (for multiport)
    Peeraddr		; ip address/hostname of requesting host
    Authuser		; special case of request header, may be anonymous
                        ; this is the username of the authenticated user
    Content-type	; another special case of request header

    Request Headers	; probably passed via stdin
    Content		; if any -- probably passed via stdin

So we could go with:

    STDIN:
        Request headers followed by blank line (the standard mail header)
        Content
    Command line:
        command -p protocol -m method -u url/path -v version -i port
                -a peeraddr -n username -c content-type
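
To make the layout above concrete, here is a rough sketch of how a gateway
script might read it. It is written in Python purely for illustration; the
function name parse_gateway_request and the dictionary it returns are my
own inventions, not part of the proposal, and header continuation lines are
ignored for brevity.

```python
import argparse
import io


def parse_gateway_request(argv, stdin):
    """Read the proposed command-line flags and STDIN layout (hypothetical helper)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-p", dest="protocol")
    parser.add_argument("-m", dest="method")
    parser.add_argument("-u", dest="url")
    parser.add_argument("-v", dest="version")
    parser.add_argument("-i", dest="port", type=int)
    parser.add_argument("-a", dest="peeraddr")
    parser.add_argument("-n", dest="username")
    parser.add_argument("-c", dest="content_type")
    args = parser.parse_args(argv)

    # Request headers run up to the first blank line, mail-header style.
    headers = {}
    while True:
        line = stdin.readline()
        if line == "" or line.rstrip("\r\n") == "":
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Whatever follows the blank line is the content (e.g. a POST body).
    # Assumption: text content; binary data would need a byte stream instead.
    content = stdin.read()
    return {"args": vars(args), "headers": headers, "content": content}


# Example with an in-memory stream standing in for the script's STDIN.
demo = parse_gateway_request(
    ["-p", "http", "-m", "POST", "-u", "/wais/search", "-v", "HTTP/1.0",
     "-i", "80", "-a", "127.0.0.1", "-n", "anonymous", "-c", "text/plain"],
    io.StringIO("Accept: text/html\n\nquery text"),
)
```

A real script would pass sys.argv[1:] and sys.stdin instead of the
in-memory values used here.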

Another issue is who outputs what information (e.g., who writes the HTTP/1.0
header).  I vote that the gateway should be responsible for all output.
The reason for this is that it is then very easy for the server to step
in the middle and perform special processing (like format conversion),
without any special cases needed in the gateway code.
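
As a sketch of what "the gateway writes everything" could look like, the
Python below has the gateway emit the full response, status line included,
and shows how a server could split that output uniformly before doing any
conversion. The names build_response and split_response are hypothetical,
and the header set is just an example.

```python
def build_response(version, status, reason, content_type, body):
    """Gateway side: produce the complete response as bytes."""
    if version == "HTTP/0.9":
        # HTTP/0.9 responses carry no status line or headers.
        return body
    head = (
        f"{version} {status} {reason}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def split_response(raw):
    """Server side: separate status line, headers and body of gateway output."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    headers = dict(line.split(": ", 1) for line in lines[1:])
    return lines[0], headers, body


raw = build_response("HTTP/1.0", 200, "OK", "text/plain", b"hi")
status_line, headers, body = split_response(raw)
```

Because every gateway produces the same shape of output, the server can
inspect Content-Type here and convert the body without the gateway knowing.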

Further down the road perhaps we need to "type" the external code so
we can have internal conversions to various types/versions of them.
For example, we already have the /htbin standard, so the server could
continue to support existing /htbin scripts.  The above could become
/htbin_plus or something.  This way we can evolve the standards as
requirements change and not break anything along the way.
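
One possible way to "type" gateways is by URL prefix. The sketch below is
only an illustration of that idea: the table GATEWAY_TYPES, the type labels
and the function gateway_type_for are assumptions of mine, and /htbin_plus
is just the placeholder name used above.

```python
# Hypothetical mapping from URL prefix to gateway interface type.
GATEWAY_TYPES = {
    "/htbin/": "htbin",
    "/htbin_plus/": "htbin_plus",
}


def gateway_type_for(path):
    """Return the interface type for a request path, or None if not a gateway."""
    matches = [prefix for prefix in GATEWAY_TYPES if path.startswith(prefix)]
    if not matches:
        return None
    # Prefer the longest matching prefix so nested prefixes stay unambiguous.
    return GATEWAY_TYPES[max(matches, key=len)]
```

The server would then pick the argument-passing convention based on the
returned type, so old scripts keep working as new types are added.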

Comments/suggestions welcome.

This document is online:
    http://www.bsdi.com/HTTP:TNG/ext-gateway.txt

--sanders