Re: CGI, semicolons, and so on...

From: rst@ai.mit.edu (Robert S. Thau)
Date: Wed, 29 Dec 93 18:59:14 EST
Message-id: <9312292359.AA03913@volterra>
To: john@math.nwu.edu, rst@ai.mit.edu
Subject: Re: CGI, semicolons, and so on...
Cc: www-talk@nxoc01.cern.ch
Content-Length: 4732

John Franks' note is very helpful in making the disagreements between us
clear --- and, unfortunately, making it clear that a resolution is rather
less than likely, since the disagreements are largely over matters of
style.  For instance, if I were to come up with a list of desiderata for a
CGI-like interface, somewhere on it would probably be:

n) There should not be any indication within the URL, selector string,
   etc., as to whether or not a retrieval will cause a script to be
   invoked.

Unfortunately, as near as I can tell, there is no way to reconcile this
desideratum with John's.

The reason I want it is that having put something up as a file, or
collection of files, I may want to turn it into a script, without having to
track down all the references to it and change them as well --- which I
would necessarily have to do if the script/file distinction were explicit
in the URL (selector, whatever).  The 'eyes-only document hack' provides, I
think, one fairly reasonable example of where one might like to do this.

Conversely, I might also want to take something that I've put up as a
script, and replace it with a collection of files --- as in the case of
replacing my on-the-fly info gateway with the output of a batch translator.
I want to be able to do this without having to find and change every
reference to an info node anywhere on my server.

Now, in order to satisfy the goal above, you need some way of
distinguishing the scripts from the ordinary files, other than selector
syntax.  In other words, you need a mechanism for typing the files.

Like it or not, most of the existing servers already have such a mechanism
--- to tell what type a file is (in order to report the proper MIME type),
they discriminate on the basis of the name.  If you put a GIF file up under
the name 'foo.au', or even just plain 'foo', then (with the stock NCSA
server, and Plexus and CERN as well, I believe), the wrong MIME type will
be reported back to the client, and things fall apart.  If you want to get
rid of arbitrary naming conventions, this may be the best place to start.
Scripts could easily piggy-back on any solution to the file-typing problem
--- just type them as application/x-cgi-run-it-here, and make the server
give special treatment to that type (as the NCSA server already does for
text/html with Charles' server includes).
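To make the piggy-back idea concrete, here is a minimal sketch in Python of
a suffix table that also carries the script type.  It is not taken from any
of the servers named above: the table contents, the text/plain default, and
the function names are assumptions for illustration, and the '.doit' suffix
is borrowed from the foo.doit example later in this note.

```python
import posixpath

# Type name as proposed in the text above.
SCRIPT_TYPE = "application/x-cgi-run-it-here"

# Hypothetical suffix table; the entries are illustrative only.
SUFFIX_TYPES = {
    ".html": "text/html",
    ".gif": "image/gif",
    ".au": "audio/basic",
    ".doit": SCRIPT_TYPE,  # assumed script suffix
}

# Assumed fallback for names with no known suffix.
DEFAULT_TYPE = "text/plain"


def type_for(path):
    """Guess a MIME type from the file name alone."""
    _, suffix = posixpath.splitext(path)
    return SUFFIX_TYPES.get(suffix.lower(), DEFAULT_TYPE)


def plan_response(path):
    """Decide whether to run the file or send it, based only on its type."""
    mime = type_for(path)
    if mime == SCRIPT_TYPE:
        return ("run", mime)
    return ("send", mime)
```

Under these assumptions, plan_response("foo.doit") picks the "run" branch,
while a GIF stored as 'foo.au' is sent as audio/basic, which is the
mistyping problem described above.  Nothing in the URL has to say which
branch will be taken.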

But suppose the script type is different enough from other types that we
want some completely different mechanism for indicating files of that type.
For instance, we could use the 'x' bit.  Now, 'x' bits sometimes do get set
where they aren't meant to be --- typos happen.  Generally, this doesn't
matter much.  A C source file with an 'x' bit is still a perfectly useful C
source file.  However, with this scheme, an 'x' bit on a data file in the
server arena makes it impossible to retrieve the file.
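A rough sketch of what an 'x'-bit test might look like, again in Python.
The function names are hypothetical and no particular server is known to do
it exactly this way; the point is only that the decision rests on the mode
bits and nothing else.

```python
import os
import stat

# Any of the three execute bits counts in this sketch (an assumption).
ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def looks_like_script(path):
    """True if path is a regular file with at least one execute bit set."""
    mode = os.stat(path).st_mode
    return stat.S_ISREG(mode) and bool(mode & ANY_EXEC)


def plan_by_mode(path):
    """Choose 'run' or 'send' from the permission bits alone."""
    return "run" if looks_like_script(path) else "send"
```

With this rule, a plain data file that picks up a stray 'x' bit is handed
to the "run" branch, and the server has no second source of information
that would let it notice the mistake.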

For instance, suppose some novice slips up with chmod, and the 'x' bit gets
set on a form coversheet --- his fingers ran away with him, and instead of
'chmod +x foo', he typed 'chmod +x foo*'.  Then, when the luser attempts to
retrieve the coversheet, the server sees the 'x' bit on it, and tries to
run it as code.  When this fails, the poor guy sees:

  500 Server error.

  The script '...' failed to produce output...

From the server's perspective, there's no more to say --- it was supposed
to run the thing, and it couldn't.  But the novice will likely assume that

  a) It was trying to run the script, and not the coversheet.
     (After all, it makes no sense to try to run the coversheet.)

  b) It tried and failed to run the script.  Therefore, the script is
     somehow broken, even though it seems to work just fine
     when tested from the command line.

This is the setup for a major wild goose chase.

Better error messages might ameliorate the problem --- but it would be
better if it weren't so easy to make this error in the first place.  It's
easy to slip up with chmod without knowing it, and when that makes things
break (as when 'chmod 664 *' snags a directory by mistake), it can be
pretty tricky to figure out what's gone wrong.  ('ls' works, but nothing
else does).  But accidentally renaming foo.html to foo.doit --- that's
tough.
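For anyone puzzled by the directory case: mode 664 leaves the read bits on
a directory but clears its search ('x') bits, so the names can still be
listed but nothing underneath can be opened.  The bit arithmetic can be
sketched as follows; describe_dir is a made-up helper, and it looks only at
the owner's bits, which is a simplification.

```python
import stat


def describe_dir(perm_bits):
    """Summarize what the owner can do with a directory of the given mode."""
    mode = stat.S_IFDIR | perm_bits
    return {
        "listing": stat.filemode(mode),               # as 'ls -l' would show it
        "can_list_names": bool(mode & stat.S_IRUSR),  # read bit: list entries
        "can_enter": bool(mode & stat.S_IXUSR),       # search bit: reach files inside
    }
```

Here describe_dir(0o664) reports a listing of 'drw-rw-r--' with
can_list_names true and can_enter false, whereas a more usual 0o775
directory has both.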

In fact, suffix-based naming conventions are all over Unix, and many other
widely used operating systems, in part because on many of these systems, a
file's name is one of the few unique attributes the thing has got.  I don't
think it's "inelegant" that the C compiler treats files whose names end in
'.c' differently from those whose names end in '.o', and I've never found it
to be bothersome.  Perhaps it's a matter of taste --- but as I said,
arguments over those will never be resolved.

rst