CGP/1.0 specification

robm@ncsa.uiuc.edu (Rob McCool)
Message-id: <9311180039.AA06991@void.ncsa.uiuc.edu>
From: robm@ncsa.uiuc.edu (Rob McCool)
Date: Wed, 17 Nov 1993 18:39:28 -0600
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: www-talk@nxoc01.cern.ch
Subject: CGP/1.0 specification


I have spent some time writing up a formal specification for the
server-gateway interface. I invite any and all comments and
suggestions. I would like the discussion to occur on either c.i.w3 or
www-talk. This document is available online for reference at

http://hoohoo.ncsa.uiuc.edu/beta-docs/external-protocol.txt


--------------------------------------------------------------------------




This is a preliminary specification of the Common Gateway Protocol, or
CGP. The version defined by this spec will be CGP/1.0. Once this
specification is discussed and (hopefully) agreed upon, all revisions
must be backward compatible.

Most of this is fairly specific to HTTP, since it will be implemented
under HTTP servers. If anyone has other applications in mind, and
thinks these ideas would make those other applications difficult,
speak up.

*** Environment Variables

To pass most data from the server to the scripts, I propose we use
environment variables.  This way, we avoid command-line length
restrictions and spare short scripts the complication of parsing stdin.

Defined environment variables, which are not request-specific:

SERVER_PROTOCOL:       HTTP, gopher, etc.
SERVER_SOFTWARE:       NCSA/1.0 or whatever
SERVER_NAME:           Server hostname or IP address
GATEWAY_PROTOCOL:      The revision of this spec to which the server complies

Request-specific variables:

SERVER_PORT:           The port answering this request
PROTOCOL_REV:          HTTP/1.0, HTTP/0.9, etc.
PROTOCOL_METHOD:       GET, PUT, POST, etc. for HTTP
FULL_URL:              The argument to the HTTP method untouched
                       (like /htbin/foo/extra/path/info?foo.bar)
QUERY_STRING:          That which follows the ?, untouched
QUERY_DECODED:         The decoded string (see below)
PATH_INFO:             The extra path information, as given by client
PATH_TRANSLATED:       The extra path information, with any path
                       mapping done
REMOTE_HOST:           The client host making the request
REMOTE_USER:           The user the client has authenticated as
CONTENT_TYPE:          As given, applies to PUT and POST
CONTENT_LENGTH:        Length of content
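
As a rough illustration of how a script might consume these variables,
here is a minimal sketch in Python. The function name read_gateway_env
and the choice to treat an unset variable as an empty string are
assumptions made for the example, not part of this proposal.

```python
import os

# Variable names taken from the two lists above.
SERVER_VARS = ["SERVER_PROTOCOL", "SERVER_SOFTWARE", "SERVER_NAME",
               "GATEWAY_PROTOCOL"]
REQUEST_VARS = [
    "SERVER_PORT", "PROTOCOL_REV", "PROTOCOL_METHOD", "FULL_URL",
    "QUERY_STRING", "QUERY_DECODED", "PATH_INFO", "PATH_TRANSLATED",
    "REMOTE_HOST", "REMOTE_USER", "CONTENT_TYPE", "CONTENT_LENGTH",
]


def read_gateway_env(environ=None):
    """Collect the gateway variables into a dict.

    Assumption: a variable the server did not set is reported as "".
    """
    if environ is None:
        environ = os.environ
    return {name: environ.get(name, "") for name in SERVER_VARS + REQUEST_VARS}
```

A script would call read_gateway_env() with no arguments; passing a
dictionary instead makes the function easy to try without a server.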


The one item that is difficult to pass as an environment variable is
the decoded query string. If we pass it as a single string separated
by spaces, a space in the middle of an item will screw it up. We could
pass it as a list of quoted items, but then there can't be quotes in
the items.

Alternatively, we could escape backslashes and spaces in such a list
with \.
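
To make the backslash alternative concrete, here is a minimal sketch
in Python of one way the escaping could work. The function names
encode_items and decode_items are hypothetical, and the treatment of
an empty string as an empty list is an assumption.

```python
def encode_items(items):
    """Join items with spaces, escaping backslashes and spaces with a backslash."""
    escaped = []
    for item in items:
        # Escape backslashes first so the backslashes added for spaces
        # are not themselves escaped a second time.
        escaped.append(item.replace("\\", "\\\\").replace(" ", "\\ "))
    return " ".join(escaped)


def decode_items(encoded):
    """Inverse of encode_items: split on unescaped spaces and undo the escapes."""
    if encoded == "":
        return []  # assumption: an empty string means "no items"
    items, current, escaping = [], [], False
    for ch in encoded:
        if escaping:
            current.append(ch)      # character after a backslash is literal
            escaping = False
        elif ch == "\\":
            escaping = True
        elif ch == " ":
            items.append("".join(current))   # unescaped space ends an item
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return items
```

With this scheme, quotes need no special treatment, since only the
backslash and the space carry meaning in the encoded string.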


*** The Command Line

On the command line, the arguments should be laid out as follows:

argv[1] is always the path info, untouched. If there is no path info,
this is "".

argv[2...] is the decoded query info, split on spaces or ampersands,
one item per argument.

Example: /htbin/script/extra/path?foo=bar&bar=foo

Command line:

script /extra/path foo=bar bar=foo

Example: /htbin/script?foo+bar+foo

Command line:

script "" foo bar foo
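
The two examples above can be reproduced with a minimal sketch in
Python. The function name build_argv and its script_prefix parameter
are hypothetical; the sketch assumes the server already knows which
part of the URL names the script, and that decoding happens before
splitting, as the description of argv[2...] suggests.

```python
import re
from urllib.parse import unquote_plus


def build_argv(url, script_prefix):
    """Build the argument list for a request URL.

    script_prefix is the part of the URL that names the script, e.g.
    "/htbin/script" (assumption: the server has already resolved it).
    """
    path, _, query = url.partition("?")
    if not path.startswith(script_prefix):
        raise ValueError("URL does not refer to this script")
    script_name = script_prefix.rsplit("/", 1)[-1]
    path_info = path[len(script_prefix):]    # "" when there is no extra path
    args = []
    if query:
        decoded = unquote_plus(query)         # '+' -> ' ', %XX -> character
        # Split on spaces or ampersands; empty pieces are dropped
        # (an assumption, the text does not say what to do with them).
        args = [a for a in re.split(r"[ &]", decoded) if a]
    return [script_name, path_info] + args
```

Note that decoding first means an encoded space or ampersand inside
an item also splits it; a server could split first and decode each
piece afterwards if that is not wanted.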

If the server does not use popen() or system(), and instead uses
fork() and execl(), the command-line length limitation should not
apply. I haven't verified this.
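
For comparison, here is a minimal sketch in Python of launching a
script with an explicit argument vector and no shell in between. It
uses the standard subprocess module rather than fork() and execl()
directly; the names run_gateway_script and demo are hypothetical, and
the child program in demo is a stand-in for a real script.

```python
import subprocess
import sys


def run_gateway_script(argv, env=None):
    """Run a script with an explicit argument vector and no intermediate shell."""
    # Passing a list and leaving shell=False (the default) makes subprocess
    # execute the program directly, so the arguments are never re-parsed
    # by a shell.
    result = subprocess.run(argv, env=env, capture_output=True, check=True)
    return result.stdout


def demo():
    # Hypothetical child: a one-line Python program that echoes its arguments.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write('|'.join(sys.argv[1:]))"]
    # The empty path info and the embedded space and ampersand all
    # arrive as separate, unmodified arguments.
    return run_gateway_script(child + ["", "foo bar", "a&b"])
```

This only shows that arguments bypass the shell; it does not measure
any length limit.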


*** The script's STDIN

The server should pass the header for the request as given by the
client to the script as stdin. It should also, after the header, pass
the client's data stream.
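
From the script's side, reading this input might look like the
following minimal sketch in Python. The function name read_request is
hypothetical. The sketch assumes that header lines have the form
"Name: value", that an empty line separates the header from the data,
and that the request line itself is not included; none of this is
settled by the text above.

```python
import io


def read_request(stream):
    """Split a request into (headers, body) from a binary stream.

    Assumptions: "Name: value" header lines, terminated by an empty line.
    """
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"", b"\n", b"\r\n"):    # end of input or blank separator
            break
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    length = headers.get("content-length")
    # Read exactly content-length bytes when it is given, else read to EOF.
    body = stream.read(int(length)) if length is not None else stream.read()
    return headers, body
```

A real script would pass sys.stdin.buffer; an io.BytesIO object works
for trying the function out by hand.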

Note that, under the scheme above, the server must parse content-type
and content-length out of the header in order to put them in
environment variables. Should this be necessary, or should the server
pass the entire header without touching it, and make the script pull
content-type and content-length out?


*** The script's STDOUT

The script passes its output to the server through stdout. The output
consists of a header followed by a document. The header is parsed by
the server, and the following directives are valid:

Parse-header:

If the script returns a header line of "Parse-header: false", the
server will pass the rest of the output stream directly to the client.

Location:

If the argument to this is a URL, it is passed to the client as a 3xx
redirect.

If the argument to this is a path, the server will perform access
control checks on it and send the appropriate header and document to
the client. Should this be a virtual path, or a filesystem path? I
currently implement it as a filesystem path for flexibility.

Content-type:

This is the MIME type of what you're returning.
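
On the server side, handling these directives might look like the
following minimal sketch in Python. The function name
interpret_script_output and the action labels it returns are
hypothetical. The sketch assumes that an empty line ends the header
and that a Location value containing "://" is a URL while anything
else is a path.

```python
import io


def interpret_script_output(output):
    """Classify raw script output (bytes) as (action, detail, body)."""
    stream = io.BytesIO(output)
    content_type = None
    while True:
        line = stream.readline().decode("latin-1").strip()
        if not line:                         # blank line or EOF ends the header
            break
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "parse-header" and value.lower() == "false":
            # Everything after this line goes to the client untouched.
            return ("passthrough", None, stream.read())
        if name == "location":
            # Assumption: "://" marks a URL; anything else is a path.
            action = "redirect" if "://" in value else "serve-path"
            return (action, value, b"")
        if name == "content-type":
            content_type = value
    return ("document", content_type, stream.read())
```

The access control checks for the path case and the actual response
to the client are left out; the sketch only decides which of the
cases above applies.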


--Rob