Binary Gateway Interface - An API for extensible HTTP servers

Simon E Spero <ses@tipper.oit.unc.edu>
Errors-To: listmaster@www0.cern.ch
Date: Wed, 22 Jun 1994 21:35:11 +0200
Message-id: <9406221932.AA01950@tipper.oit.unc.edu>
Reply-To: ses@tipper.oit.unc.edu
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Simon E Spero <ses@tipper.oit.unc.edu>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Binary Gateway Interface - An API for extensible HTTP servers
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0"
Mime-Version: 1.0
------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"

Folks-

  Here's a draft of a paper describing the Binary interface used in the
High Performance HTTP. Any comments gratefully received.

Simon
p.s.
official pronunciation is "Boogie" :-)


------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-Description: Binary Gateway Interface

DRAFT	DRAFT	DRAFT	DRAFT	DRAFT	DRAFT	DRAFT 	DRAFT	DRAFT 	DRAFT 

		Binary Gateway Interface -
	 An API for dynamically extensible HTTP servers

		     June 22nd 1994

			Simon Spero
	University of North Carolina at Chapel Hill
			ses@unc.edu			
		


Abstract:

	Many HTTP servers currently support an interface protocol allowing
	them to pass requests on to external scripts. This protocol is known as
	CGI. This mechanism is extremely flexible, but is unsuited to 
	high performance applications. In this paper we discuss an alternative
	approach to server extensibility and propose an alternative interface
	protocol based on dynamically linked functions. We compare the two
	approaches and indicate some of the advantages and disadvantages of 
	each.


Introduction.
-------------
   The Common Gateway Interface (CGI)[McCool 93] is a standard way of
allowing the manager of an information server to add extra functionality to a
server without needing to modify the HTTP server itself. This functionality
is achieved by starting an external gateway process, and passing messages to
and from that process. CGI is not specific to the HTTP protocol.

    CGI communicates with the gateway process through a number of different 
mechanisms. Information about the request is passed through about 20 
environment variables. Information about queries is also passed via the 
command line. For requests that contain information in addition to the HTTP
header, the additional data will be made available on standard input. 
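
   To make the mechanisms above concrete, here is a minimal sketch of the
gateway side of CGI in C. REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH are
three of the environment variables CGI defines; the function names and the
1024 byte body limit are invented for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the value of a CGI environment variable, or "" if it is unset. */
const char *cgi_var(const char *name)
{
    const char *value = getenv(name);
    return value ? value : "";
}

/* Read the request body from `in`. The server announces its length in
 * CONTENT_LENGTH; at most `max` bytes are stored in `body`.
 * Returns the number of bytes actually read. */
size_t cgi_read_body(FILE *in, char *body, size_t max)
{
    long announced = strtol(cgi_var("CONTENT_LENGTH"), NULL, 10);
    size_t want;

    if (announced <= 0)
        return 0;                       /* no body: do not touch `in` */
    want = (size_t)announced < max ? (size_t)announced : max;
    return fread(body, 1, want, in);
}

/* Write a small plain-text reply describing the request to `out`.
 * The header block ends with a blank line, which the server parses
 * unless the script is an "nph-" script. */
void cgi_describe_request(FILE *in, FILE *out)
{
    char body[1024];                    /* arbitrary limit for the sketch */
    size_t n = cgi_read_body(in, body, sizeof body);

    fprintf(out, "Content-Type: text/plain\n\n");
    fprintf(out, "method=%s\n", cgi_var("REQUEST_METHOD"));
    fprintf(out, "query=%s\n", cgi_var("QUERY_STRING"));
    fprintf(out, "body_bytes=%lu\n", (unsigned long)n);
}
```

A gateway program would call cgi_describe_request(stdin, stdout) from its
main function; the server starts one such process for every request.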

   The gateway script responds by sending the result to standard output. 
Normally the output is processed by the server and passed on to the client.
For efficiency, if a script name begins with the magic string "nph-", the
output is not parsed, and may be sent directly to the client.

  This system is extremely flexible; however the design is not suitable for
use in high performance servers. There are several reasons for this. The 
first problem is the processing overhead caused by the creation of an
extra process to handle each request. 

  Secondly, the server is required to process any and all HTTP headers,
and to generate an environment variable for each of them before
passing the request on to the gateway. Most of these headers will not
be needed by the gateway module. 

  Thirdly, unless the "nph-" escape hatch is used, the server must read and 
parse the results of the gatewayed operation before sending them on to the 
client. 


A Binary Gateway Interface 
--------------------------
   An alternative way of extending the functionality of a server is to make
use of the dynamic linking facilities available under most modern operating
systems. If a standard set of function calls for handling requests is 
defined, then extended operations can be handled as cheaply as standard ones.
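
   As a rough sketch of what such dynamic linking could look like on a system
that provides the dlopen interface, a server might locate a module's init
function as follows. The names bgi_init_fn, bgi_symbol_name and bgi_load_init
are hypothetical, and the "<module>_init" naming follows the handler methods
described later in this paper. Some systems need the program linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical type for a module's <module>_init entry point. */
typedef int (*bgi_init_fn)(char *mount_point);

/* Build "<module>_<method>" in `out`.
 * Returns 0 on success, -1 if the name does not fit. */
int bgi_symbol_name(char *out, size_t size,
                    const char *module, const char *method)
{
    int n = snprintf(out, size, "%s_%s", module, method);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Load the shared object at `path` and look up <module>_init.
 * Returns NULL if the object or the symbol cannot be found. */
bgi_init_fn bgi_load_init(const char *path, const char *module)
{
    char symbol[128];
    void *lib;

    if (bgi_symbol_name(symbol, sizeof symbol, module, "init") != 0)
        return NULL;

    lib = dlopen(path, RTLD_NOW);       /* resolve all symbols up front */
    if (lib == NULL) {
        fprintf(stderr, "bgi: %s\n", dlerror());
        return NULL;
    }
    /* POSIX allows the void* from dlsym to be converted to a
     * function pointer. */
    return (bgi_init_fn)dlsym(lib, symbol);
}
```

Once the function pointer has been obtained, calling into the module costs no
more than an ordinary indirect function call, which is where the saving over
starting a process per request comes from.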

Design Goals
------------
   The design presented in the following section is intended to meet several
design goals.

	1) Fast. 	     Extensions should be able to run as fast as 
		   	     built in functions.

	2) Lazy.	     Headers should not be parsed or evaluated unless
			     absolutely necessary.

	3) Portable.         Gateways developed for one operating system should
			     be usable on another system without requiring
			     extensive modifications.
	
	4) Simple.	     The gateway author should not spend more time
			     working on the interface code than she does on
			     the actual gateway.

BGI design
-----------
   The design is somewhat inspired by the Plan 9 file system, and to a lesser
extent, the extension system used for the System V.4 name resolution library.

The BGI model is based on the model of a hierarchical name space. Specialised
handlers can be mounted at any point in the name space; these handlers will 
be responsible for handling any requests that lie beneath their mount points,
unless a more specific handler is mounted below them. 

Servers do not need to use this model internally; however BGI handlers do
need to be told where they are mounted so that they can determine how much
prefix to remove from a URL.

Example: Suppose we have a namespace with the following handlers
mounted at the indicated points.

Mount point			Handler
--------------------------------------------------
/				file_handler
/image-maps			map_handler
/pictures			picture_handler
/pictures/office-scene		videopix_handler
/cgibin				cgi_handler
/search-me			wais_handler

A request for "/pictures/simon.gif" would be handled by picture_handler, as
would a request for "/pictures/simon.jpeg". However, a request for 
"/pictures/office-scene" would invoke the videopix_handler, and a request
for "/picture", which lies beneath no mount point except the root, would
invoke the file_handler.
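
The resolution rule in this example amounts to picking the longest mount
point that covers the request. The following is a minimal sketch of that
rule, not a description of any particular server: the struct and function
names are hypothetical, and requiring matches to end on a "/" boundary is an
assumption that goes slightly beyond what the example shows.

```c
#include <stddef.h>
#include <string.h>

struct mount {
    const char *point;    /* mount point, e.g. "/pictures" */
    const char *handler;  /* handler name, e.g. "picture_handler" */
};

/* The example table from the text. */
static const struct mount mounts[] = {
    { "/",                      "file_handler"     },
    { "/image-maps",            "map_handler"      },
    { "/pictures",              "picture_handler"  },
    { "/pictures/office-scene", "videopix_handler" },
    { "/cgibin",                "cgi_handler"      },
    { "/search-me",             "wais_handler"     },
};

/* True if `path` is `point` itself or lies beneath it. The match must end
 * on a component boundary, so "/picture" is not under "/pictures". */
static int is_under(const char *point, const char *path)
{
    size_t n = strlen(point);

    if (strncmp(point, path, n) != 0)
        return 0;
    if (n > 0 && point[n - 1] == '/')   /* the root mount "/" */
        return 1;
    return path[n] == '\0' || path[n] == '/';
}

/* Return the most specific (longest) mount that covers `path`, or NULL. */
const struct mount *resolve(const char *path)
{
    const struct mount *best = NULL;
    size_t i;

    for (i = 0; i < sizeof mounts / sizeof mounts[0]; i++) {
        if (is_under(mounts[i].point, path) &&
            (best == NULL || strlen(mounts[i].point) > strlen(best->point)))
            best = &mounts[i];
    }
    return best;
}

/* Return the part of `path` below `point`. Only meaningful when `path`
 * lies under `point`; this is the prefix removal a handler performs. */
const char *strip_prefix(const char *point, const char *path)
{
    return path + strlen(point);
}
```

With this table, resolve("/pictures/simon.gif") selects picture_handler, and
strip_prefix("/pictures", "/pictures/simon.gif") leaves "/simon.gif" for the
handler to interpret.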


   BGI handlers are compiled object code modules containing three functions
which are used to mount and unmount handlers, and to handle incoming requests.

Handler Methods
---------------

Init

int <module>_init(char* mount_point)

This function is used to initialise a handler for attachment to a point in
the namespace. The value returned should either be 0, indicating that a problem
occurred, or a cookie which will be passed to the handler function.

Unmount

int <module>_umount(char* mount_point, int cookie)

This function should remove the handler from the indicated mount point. 

Handler


int <module>_handler(int operation, int cookie, int socket, char* url,
		     char* header_buf, int buf_size)

This function handles all requests on this mount point. 

Arguments:

'operation' indicates the HTTP method that is being invoked. The only
currently defined values are OP_GET=1, and OP_POST=2. If other values are 
received, the function should signal an error as indicated below.

'cookie' is the token returned by the initialisation function.

'socket' is the file descriptor for the current connection. 

'url' is the URL that is being processed. This url should have any leading
protocol specifiers removed before the handler is called.

'header_buf' contains a pointer to any data that may already have been read
from the connection before the handler was called. 

'buf_size' indicates the amount of valid data in header_buf.


Result code:

If no errors occur, the handler function should return 0; if an error does 
occur, the handler should either return -1, indicating that the server should
just close the connection, or a valid HTTP result code, indicating that the
server should generate an error message before closing the connection.

Notes:

All handler functions should be re-entrant.
Handler functions should not close the connection themselves.
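
Putting the three methods together, a complete module might look like the
sketch below. The module name "hello" is invented for the example. Three
points are assumptions rather than part of the interface as written: that the
handler writes the whole response, status line included, to the socket
itself; that 501 is a suitable result code for an unsupported operation; and
that umount returns 0 on success, since the text does not define its result.

```c
#include <string.h>
#include <unistd.h>

#define OP_GET  1   /* values given in the text */
#define OP_POST 2

/* Return a nonzero cookie on success, 0 on failure. This module keeps no
 * per-mount state, so any nonzero value will do. */
int hello_init(char *mount_point)
{
    return (mount_point != NULL && mount_point[0] == '/') ? 1 : 0;
}

/* Nothing to release. Returning 0 for success is an assumption. */
int hello_umount(char *mount_point, int cookie)
{
    (void)mount_point;
    (void)cookie;
    return 0;
}

/* Re-entrant: the only static data is read-only. */
int hello_handler(int operation, int cookie, int socket, char *url,
                  char *header_buf, int buf_size)
{
    static const char reply[] =
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "\r\n"
        "Hello from BGI\n";
    size_t len = sizeof reply - 1;

    /* Lazy: the headers already read by the server are never parsed. */
    (void)cookie; (void)url; (void)header_buf; (void)buf_size;

    if (operation != OP_GET)
        return 501;     /* ask the server to generate an error message */
    if (write(socket, reply, len) != (ssize_t)len)
        return -1;      /* the server should just close the connection */
    return 0;           /* success; the socket is left open for the server */
}
```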

Library functions
-----------------

Server implementors should make the following functions available to gateway
implementors.


---
int handle_url(int operation, int socket, char* url, char* buf, int size)

Used to handle redirections, so that a handler can simply compute an alternate
url and then have that resolved.

---
int http_error(int socket, int code)
Generate an error message corresponding to error 'code'
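
As an illustration of how a gateway might use handle_url, here is a sketch of
a handler that redirects everything under "/old" to the same path under
"/new". The handle_url defined here is only a stand-in so that the example
compiles on its own; a real server would supply it. The prefixes, the name
moved_handler, the 404 and 500 result codes, and the assumption that
handle_url returns the same kind of result code as a handler are all
illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the server-provided function: it records the URL it was
 * asked to resolve and reports success. */
static char last_url[256];

int handle_url(int operation, int socket, char *url, char *buf, int size)
{
    (void)operation; (void)socket; (void)buf; (void)size;
    snprintf(last_url, sizeof last_url, "%s", url);
    return 0;
}

#define OLD_PREFIX "/old"
#define NEW_PREFIX "/new"

/* Compute the alternate URL, then let the server resolve it. */
int moved_handler(int operation, int cookie, int socket, char *url,
                  char *header_buf, int buf_size)
{
    char target[256];
    size_t skip = strlen(OLD_PREFIX);
    int n;

    (void)cookie;

    if (strncmp(url, OLD_PREFIX, skip) != 0)
        return 404;                     /* not beneath our mount point */

    n = snprintf(target, sizeof target, "%s%s", NEW_PREFIX, url + skip);
    if (n < 0 || (size_t)n >= sizeof target)
        return 500;                     /* rewritten URL does not fit */

    /* Pass along the data already read from the connection unchanged. */
    return handle_url(operation, socket, target, header_buf, buf_size);
}
```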

---

MORE NEEDED HERE

Comparisons
-----------


BGI offers a much faster alternative to CGI for extending servers; however 
there are several disadvantages. The most obvious problem is that BGI itself
uses compiled modules, whereas CGI programs can be written in interpreted 
languages. Since a CGI emulation module can be implemented under BGI, this
problem can be circumvented.

Also, since BGI doesn't automatically handle all header processing,
if extensive header processing is needed, this must be handled by the 
application. Adding functions to support header manipulation to the support
library would certainly help here.

Open Issues
------------

1) It might be better to have separate handlers for each method, rather than
   having the single handler with its operation argument. This would allow 
   different handlers to manage GET and POST requests. However, this would
   complicate the interface, since most handlers would only support a single
   method.

2) Adding more functions to the support library will make implementing 
   gateways easier. I'm open to suggestions. 


References:

	[McCool 93] Introduction to CGI, http://hoohoo.ncsa.uiuc.edu/cgi/

------- =_aaaaaaaaaa0--