Re: CGI/1.0 status

George Phillips <phillips@cs.ubc.ca>
Date:  1 Dec 93 13:07 -0800
From: George Phillips <phillips@cs.ubc.ca>
To: <robm@ncsa.uiuc.edu>
Cc: <luotonen@ptsun00.cern.ch>, <www-talk@nxoc01.cern.ch>
In-reply-to: <9311290905.AA20602@void.ncsa.uiuc.edu>
Message-id: <6969*phillips@cs.ubc.ca>
Subject: Re: CGI/1.0 status

Rob says:
>I haven't received anything that I haven't addressed (to my knowledge, if
>something slipped by someone please say so). The doc at
>http://hoohoo.ncsa.uiuc.edu/beta-docs/external-protocol.txt is the latest.
>Note the addition of the naming convention "nph:scriptname" for scripts
>which require direct output to the client.

I'd wager that the spec is pretty stable.  If "nph:scriptname" is the
expected name of the script, then the ":" should be changed to
something else as it's problematic for DOS, OS/2, NT and possibly
Mac systems.  If, however, the "nph:" is stripped by the server, which
then execs "scriptname", all is fine.
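
To make the second reading concrete, here is a rough sketch of a server
stripping the prefix before it execs anything.  The helper name split_nph
is made up for illustration; only the "nph:" convention itself comes from
the draft quoted above, and a real server would do this wherever it maps
the request onto a script file.

```python
NPH_PREFIX = "nph:"  # naming convention quoted from the draft


def split_nph(requested_name):
    """Return (file_to_exec, direct_output) for a requested script name.

    Hypothetical helper: the file on disk never contains ":", so the
    name stays legal on systems that reserve that character.
    """
    if requested_name.startswith(NPH_PREFIX):
        return requested_name[len(NPH_PREFIX):], True
    return requested_name, False


if __name__ == "__main__":
    print(split_nph("nph:scriptname"))
    print(split_nph("scriptname"))
```

With this arrangement the ":" only ever appears in the URL, never in a
file name.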

>There's only one more touchy subject I want to discuss before we finalize,
>and that's the who-decodes-the-arguments issue.

I'm not sure what you mean by "decoding" in this context, but I assume
you're talking about parsing off the search string for an ISINDEX
or FORM request.  I also assume that you still want the server
parsing the headers like "Accept:".  If these assumptions are
correct and you promise to give me SCRIPT_NAME, SERVER_PORT and
SERVER_NAME, then it doesn't matter much to me, but let me
play devil's advocate.

>Here are the pro-arguments as I remember them, and my reasons for
>disagreeing:
>
>Argument: If the server does it, scripts don't have to do it, so there are
>          simpler scripts.
>
>Counter: However, a prudent script must have code to decode long arguments
>         anyway. Therefore, if the scripts may have to do it themselves
>         anyway, why bother decoding it in the first place, if the scripts 
>         need the code anyway?

Rebuttal: Simple scripts don't care if they break on long arguments.  They
	  just want to be quick hacks.  If someone tries to use a finger
	  gateway with too big an argument, they lose and that's OK.

>Argument: We already know how to decode the URL, there is ISINDEX and FORMs,
>          and we know how to decode both.
>
>Counter: FORMS are part of HTML+. What if there are other aspects of HTML+,
>         or HTML++ which are not compatible with these two methods? I don't
>         want to have people upgrading their server every time a new
>         convention is invented.

Rebuttal: New conventions will require a change somewhere.  Better it be
	  in the server rather than every script which wishes to stay
	  current.  What if HTML+++ provides a new way to give a search
	  string?  If the server understands that, the scripts will be
	  magically upgraded.  Also, partial decoding leaves some hope
	  of writing gateway scripts which can respond to entirely
	  different protocols like gopher or gopher+.  What with gopher
	  letting you spit back HTML, the gn server, and ASK blocks
	  being so similar to forms, the protocols could converge
	  at this CGI.

>My arguments for having the scripts do the decoding:
>
>1. It's painfully simple to do it even from a shell script, one line with a
>   C support program. PERL and C code is available to do so. What's the
>   advantage of having the server do it, besides avoiding a little confusion
>   for novice script writers?

Why have the scripts repeat common boilerplate?  The server's already
decoding lots of other stuff so why not throw in a little extra.
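
For reference, the boilerplate in question looks roughly like the sketch
below.  It is only an illustration of %XX unescaping, not the code from
any particular server or script library, and the function name unescape
is my own choice.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def unescape(s):
    """Replace each %XX escape in s with the character it encodes.

    Malformed escapes (a trailing "%" or non-hex digits) are copied
    through unchanged rather than treated as errors.
    """
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        if (c == "%" and i + 2 < len(s)
                and s[i + 1] in HEX_DIGITS and s[i + 2] in HEX_DIGITS):
            out.append(chr(int(s[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(c)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(unescape("phillips%40cs.ubc.ca"))
```

It is short, but every script that wants the decoded string has to carry
a copy of it or of something equivalent.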

>2. Any script which needs to decode its own URL still has the server decode
>   it, possibly in a way the script doesn't want it to.  Wasted effort for
>   the server, CPU time which could be better spent servicing the ~130 other
>   waiting users (at least, if you're www.ncsa).

Well, if you're crying for CPU, better integrate those scripts into your
server since the exec + sh/perl parsing overhead is much greater.  The
small amount of CPU time in an extra decode is negligible.  If you're really
concerned, add a configuration option to the server that tells it
about any scripts which don't want stuff decoded.

>3. POST scripts which handle forms need the unescaping code regardless.
>   Again, duplication of effort.

Lots of scripts won't handle POST; they just want GET + HEAD.  Why
make every script work harder for just a few exceptions?

Anyhow, I think this is pretty much a tempest in a teapot.  Based on
my experience in writing gateway scripts, I like the extra bit of
decoding -- just seemed more convenient and never got in the way.
Mind you, that was only for <ISINDEX> decoding.  Maybe FORMs decoding
is less useful.  Maybe a split is in order:  decode ISINDEX stuff
'cause it won't overflow and is semantically simple, don't decode
FORMs stuff 'cause there's a good chance it'll be too big anyway
and it's hard to decode the information into a usable form.
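
A rough sketch of that split, under one assumption of mine that is not in
the draft: the server tells the two cases apart by looking for an "=" in
the search string (FORM submissions carry name=value pairs, ISINDEX
searches carry "+"-separated words).  The function name decode_search is
hypothetical.

```python
from urllib.parse import unquote


def decode_search(query):
    """Decode an ISINDEX-style search string into a list of words.

    Returns None for anything that looks like a FORM submission
    (assumed here to mean it contains "="), leaving that string for
    the script to decode itself.
    """
    if "=" in query:
        return None
    # Split on "+" before unescaping so an escaped "+" (%2B) survives.
    return [unquote(word) for word in query.split("+")]


if __name__ == "__main__":
    print(decode_search("george+phillips"))
    print(decode_search("name=george&host=cs.ubc.ca"))
```

The ISINDEX case hands the script a ready-made word list; the FORM case
is passed through untouched.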