CGI/1.0 --- what's wrong with the status quo?

From: rst@ai.mit.edu (Robert S. Thau)
Date: Tue, 28 Dec 93 12:02:23 EST
Message-id: <9312281702.AA03437@volterra>
To: www-talk@nxoc01.cern.ch
Subject: CGI/1.0 --- what's wrong with the status quo?
Content-Length: 3717

As the de facto webmaster of a site (the MIT AI lab) which recently
upgraded to a somewhat modified (I'll get to that) NCSA httpd 1.0, under
the hopefully correct impression that server features, particularly the
script interface, were finally stable, I'm watching the discussion over
modifications with interest.  Here's a different perspective:

Franks, as far as I can tell, is objecting that with CGI as presently
implemented by (at least) NCSA, you can't tell whether a particular URL
will cause a script to be invoked, nor where the name of the script ends
and the parameters begin.

For instance, consider an Info gateway.  As things stand, you can't tell
whether something like

  http://some-site.edu/info/rel/perl.info/Formats

will search a directory structure whose files contain translated Info nodes
or whether it will run a script to do the translation on the fly --- nor
whether 'rel' is a parameter to the translator or selects which of several
alternative scripts to run.
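
In CGI/1.0 terms, once the server has decided that 'info' is the script,
it reports the split to it through the standard variables:

  SCRIPT_NAME = /info
  PATH_INFO   = /rel/perl.info/Formats

and the translator picks PATH_INFO apart for itself.  A minimal sketch of
such a script (everything beyond the standard CGI variables is invented
for illustration):

  #!/bin/sh
  # Split PATH_INFO (e.g. /rel/perl.info/Formats) into the 'rel'
  # parameter and the Info file/node name.  Field 1 is empty, thanks
  # to the leading slash.
  release=`echo "$PATH_INFO" | cut -d/ -f2`    # -> rel
  node=`echo "$PATH_INFO" | cut -d/ -f3-`      # -> perl.info/Formats
  echo "Content-type: text/html"
  echo
  echo "<TITLE>Info node $node (release $release)</TITLE>"
  # ... translation of the node itself would go here ...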

My question is, what's wrong with this?  It doesn't confuse me --- I know
that 'info' is the script, 'rel' is a parameter, and the rest is info file/
node name --- that's the way I chose to set it up.  And as for clients, I
would tend to view these alternatives as implementation details which are
none of their business.  (Really picky observers may note that these aren't
quite the same as the URLs used by the info gateway I'm actually running
--- in particular, for back compatibility with an older hack, I'm covering
up the 'rel' parameter with a ScriptAlias --- but the scheme above is a
fair reflection of what's actually going on under my hood.)
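
For the curious, the covering alias is of this general shape in srm.conf
(the filesystem path here is invented for illustration):

  ScriptAlias /info /usr/local/etc/httpd/gateways/info-rel

where the aliased-to script supplies the 'rel' choice itself, so clients
never see that parameter in the URL.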

In fact, I've found the status quo to be in some respects insufficiently
flexible.  For instance, it's awkward to have to put Guy Brooker's archie
script in a different directory from its coversheet, at a potentially far
remove.  To deal with this, I've modified my NCSA httpd so that it is
capable of running scripts from (some of) the same directories it would
ordinarily search for files, under control of a RunScripts allow-option.
(The scripts are distinguished from ordinary files by a naming convention
which isn't visible to the clients, and PATH_INFO works --- as indicated
above, I'm using it.  BTW, I'd be willing to give the changes out as a
patch to anyone interested, and willing not to look a gift horse too
closely in the mouth.)
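
To give the flavor of it, the relevant access.conf stanza looks roughly
like this (the directory path is invented; RunScripts is the option my
patch adds):

  <Directory /usr/local/www/docs/archie>
  Options Indexes RunScripts
  </Directory>

With that in place, the archie script can sit right next to its
coversheet in the same directory, picked out by the naming convention.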

With all this in mind, here are my comments on the two changes which seem
to be on the table:

1) Having a magic character which delimits CGI script parameters ---

   I could live with this, although, as I say, I really don't think it's
   much of the client's business.  However, it would require modifying
   every script out there which takes PATH_INFO --- and every invocation of
   one.  (That means every use of imagemap, among many others.)

   BTW, with regard to the specific point that the status quo requires the
   daemon to do 'wasted' stats to discover where the script name ends, it's
   worth remembering that the daemon may be doing a lot of stats anyway for
   other purposes --- the NCSA daemon, for instance, walks the directory
   hierarchy repeatedly during access checks, looking for .htaccess files
   and symlinks.  In any case, compared with the load of running a Bourne
   shell script --- forking and execing a process which is likely to fork
   and exec many more --- these stats are pretty trivial.
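
   To put the costs in perspective, the body of a typical gateway script
   might look like this (a made-up example, but representative):

     #!/bin/sh
     # Each stage of the pipeline below is another fork and exec;
     # next to several process creations per request, a few extra
     # stat() calls in the daemon are lost in the noise.
     echo "Content-type: text/plain"
     echo
     ls -l /usr/spool/ftp/pub | grep -v '^total' | sort | tail -10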

2) As an alternative, requiring a fixed string at the start of any URL
   which might invoke a script ---

   This would set in stone the notion of separate, parallel directory
   hierarchies for scripts and everything else.  As indicated above, I
   don't like that notion much at all.

rst