Re: CGI/1.0 status

robm@ncsa.uiuc.edu (Rob McCool)

Mail folder: WWW Talk Oct 93-present

Message-id: <9312020731.AA22706@void.ncsa.uiuc.edu>
From: robm@ncsa.uiuc.edu (Rob McCool)
Date: Thu, 2 Dec 1993 01:31:25 -0600
In-Reply-To: George Phillips <phillips@cs.ubc.ca>
       "Re: CGI/1.0 status" (Dec  1,  1:07pm)
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: George Phillips <phillips@cs.ubc.ca>
Subject: Re: CGI/1.0 status
Cc: <luotonen@ptsun00.cern.ch>, <www-talk@nxoc01.cern.ch>

/*
 * Re: CGI/1.0 status  by George Phillips (phillips@cs.ubc.ca)
 *    written on Dec  1,  1:07pm.
 *
 * I'd wager that the spec is pretty stable.  If "nph:scriptname" is the
 * expected name of the script, then the ":" should be changed to
 * something else as it's problematic for DOS, OS/2, NT and possibly
 * Mac systems.  If, however, the "nph:" is stripped by the server and
 * it execs "scriptname", all is fine.

Good point. Mac systems will not like the ":" one bit. I'll change it to
"nph-". Are there any objections to that? I think DOS, OS/2, NT, and
System 7 don't have any problem with it, but I'm not sure about VMS.
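
A minimal sketch of how a server might recognize the proposed prefix,
assuming (this thread does not spell it out) that "nph-" marks scripts
whose output is passed to the client without the server parsing the
headers. The function name is_nph_script is hypothetical.

```python
import posixpath

# Prefix proposed in this thread as a replacement for "nph:".
NPH_PREFIX = "nph-"


def is_nph_script(script_path):
    """Return True if the script's file name starts with the nph- prefix.

    Only the last path component is checked, so a directory that happens
    to be called "nph-something" does not change the result.
    """
    return posixpath.basename(script_path).startswith(NPH_PREFIX)
```

Unlike ":", the "-" character is legal in file names on the systems listed
above, so the prefix can stay in the name of the file that gets executed.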

 * I'm not sure what you mean by "decoding" in this context, but I assume
 * you're talking about parsing off the search string for an ISINDEX
 * or FORM request.  I also assume that you still want the server
 * parsing the headers like "Accept:".  If these assumptions are
 * correct and you promise to give me SCRIPT_NAME, SERVER_PORT and
 * SERVER_NAME, then it doesn't matter much to me, but let me
 * play devil's advocate.

Yes, you get those env. vars. The wording is unclear in the section
regarding the script's stdin, so I will change it. I would rather have the
server parsing the headers since sending them all intact would get messy and
complicate scripts.
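
As an illustration of why a script wants those three variables, here is a
sketch that rebuilds the script's own URL from them. The helper name
self_url and the rule of omitting port 80 are assumptions made for the
example, not something the spec text in this thread states.

```python
import os


def self_url(environ=os.environ):
    """Build a URL pointing back at the running script.

    Uses only the variables promised above: SERVER_NAME, SERVER_PORT
    and SCRIPT_NAME.
    """
    name = environ["SERVER_NAME"]
    port = environ["SERVER_PORT"]
    script = environ["SCRIPT_NAME"]
    # Assumption: leave the port out when it is the HTTP default.
    host = name if port == "80" else "{}:{}".format(name, port)
    return "http://{}{}".format(host, script)
```

A gateway script can use the result in the links or form actions it
emits without hard-coding the host it is installed on.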

 * >Here are the pro-arguments as I remember them, and my reasons for
 * >disagreeing:
 * >
 * >Argument: If the server does it, scripts don't have to do it, so there are
 * >          simpler scripts.
 * >
 * >Counter: However, a prudent script must have code to decode long arguments
 * >         anyway. If the scripts need that code regardless, why bother
 * >         having the server decode it in the first place?
 * 
 * Rebuttal: Simple scripts don't care if they break on long arguments.  They
 * 	  just want to be quick hacks.  If someone tries to use a finger
 * 	  gateway with too big an argument, they lose and that's OK.

A good point.

 * >Argument: We already know how to decode the URL, there is ISINDEX and 
 * >          FORMs, and we know how to decode both.
 * >
 * >Counter: FORMS are part of HTML+. What if there are other aspects of HTML+,
 * >         or HTML++ which are not compatible with these two methods? I don't
 * >         want to have people upgrading their server every time a new
 * >         convention is invented.
 * 
 * Rebuttal: New conventions will require a change somewhere.  Better it be
 * 	  in the server rather than every script which wishes to stay
 * 	  current.  What if HTML+++ provides a new way to give a search
 * 	  string?  If the server understands that, the scripts will be
 * 	  magically upgraded.  Also, partial decoding leaves some hope
 * 	  of writing gateway scripts which can respond to entirely
 * 	  different protocols like gopher or gopher+.  What with gopher
 * 	  letting you spit back HTML, the gn server, and ASK blocks
 * 	  being so similar to forms, the protocols could converge
 * 	  at this CGI.

I would hope HTML+* will keep the old methods around for backward
compatibility. However, your point regarding gopher is very well taken.
Perhaps I should have John Franks look over the spec and offer his comments.

 * >My arguments for having the scripts do the decoding:
 * >
 * >1. It's painfully simple to do it even from a shell script, one line with a
 * >   C support program. PERL and C code is available to do so. What's the
 * >   advantage of having the server do it, besides avoiding a little confusion
 * >   for novice script writers?
 * 
 * Why have the scripts repeat common boilerplate?  The server's already
 * decoding lots of other stuff so why not throw in a little extra.

For ISINDEX scripts, I would probably have to agree. However, for forms, I
don't know...

 * >2. Any script which needs to decode its own URL still has the server decode
 * >   it, possibly in a way the script doesn't want it to.  Wasted effort for
 * >   the server, CPU time which could be better spent servicing the ~130 other
 * >   waiting users (at least, if you're www.ncsa).
 * 
 * Well, if you're crying for CPU, better integrate those scripts into your
 * server since the exec + sh/perl parsing overhead is much greater.  The
 * small amount of CPU time in an extra decode is negligible.  If you're really
 * concerned, add a configuration option to the server that tells it
 * about any scripts which don't want stuff decoded.

I shy away from adding configuration options since they're server-specific.

While the CPU time is negligible, it is impossible to integrate scripts into
the server unless you're Plexus. Therefore, I'd like to save as much time as
possible, since our server (and probably info.cern.ch) could use all of the
extra CPU time they can get.

 * >3. POST scripts which handle forms need the unescaping code regardless.
 * >   Again, duplication of effort.
 * 
 * Lots of scripts won't handle POST; they just want GET + HEAD.  Why
 * make every script work harder for just a few exceptions?

Most ISINDEX scripts only handle GET and HEAD. Most FORMs scripts will
eventually have to use POST once their data grows too large to fit in a
GET request.
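
A sketch of what a forms script that supports both methods ends up doing.
The variable names REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH are
assumptions here (they are not named in this message), as is the helper
name raw_form_data.

```python
import os
import sys


def raw_form_data(environ=None, stdin=None):
    """Return the still-encoded form data for a GET or POST request."""
    environ = os.environ if environ is None else environ
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # Assumption: the server passes the request body on the script's
        # stdin and reports its size, so read exactly that many bytes.
        stdin = sys.stdin.buffer if stdin is None else stdin
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length).decode("ascii", errors="replace")
    # GET and HEAD: the data travels in the URL after the "?".
    return environ.get("QUERY_STRING", "")
```

Either way the script receives the same escaped string, so it needs the
unescaping code in both cases.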

 * Anyhow, I think this is pretty much a tempest in a teapot.  Based on
 * my experience in writing gateway scripts, I like the extra bit of
 * decoding -- just seemed more convenient and never got in the way.
 * Mind you, that was only for <ISINDEX> decoding.  Maybe FORMs decoding
 * is less useful.  Maybe a split is in order:  decode ISINDEX stuff
 * 'cause it won't overflow and is semantically simple, don't decode
 * FORMs stuff 'cause there's a good chance it'll be too big anyway
 * and it's hard to decode the information into a usable form.
 */

This, I think, is a good idea. I would agree to making ISINDEX queries
decoded, but not FORMs queries (at least not with GET).
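
To make the difference concrete, here is a rough sketch of the two kinds
of decoding, assuming the usual conventions ("+" separates ISINDEX words
and stands for a space in form values, "%XX" escapes a character, "&" and
"=" delimit form fields). The function names are hypothetical.

```python
from urllib.parse import unquote


def decode_isindex(query):
    """Split an ISINDEX search string into a flat list of words."""
    return [unquote(word) for word in query.split("+")]


def decode_form(query):
    """Split a form query into (name, value) pairs, in order."""
    if not query:
        return []
    pairs = []
    for field in query.split("&"):
        name, _, value = field.partition("=")
        # Replace "+" before unescaping so an escaped "%2B" stays a plus.
        pairs.append((unquote(name.replace("+", " ")),
                      unquote(value.replace("+", " "))))
    return pairs
```

The ISINDEX result is a short flat word list that is easy to hand to a
script, while the form result is structured and can be arbitrarily long,
which is the asymmetry behind the proposed split.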

Ari? Tony? Everyone else?

--Rob