Re: CGP/1.0 specification

robm@ncsa.uiuc.edu (Rob McCool)
Message-id: <9311180907.AA11935@void.ncsa.uiuc.edu>
From: robm@ncsa.uiuc.edu (Rob McCool)
Date: Thu, 18 Nov 1993 03:07:04 -0600
In-Reply-To: Tony Sanders <sanders@bsdi.com>
       "Re: CGP/1.0 specification" (Nov 17, 10:32pm)
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: sanders@bsdi.com, www-talk@nxoc01.cern.ch
Subject: Re: CGP/1.0 specification
/*
 * Re: CGP/1.0 specification  by Tony Sanders (sanders@bsdi.com)
 *    written on Nov 17, 10:32pm.
 *
 * 
 * > REMOTE_USER:           The user the client has authenticated as
 * State that if unset or null then it's an unauthenticated request

Done.
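
A minimal sketch of how a script might apply that rule, assuming the variable arrives in the process environment. The function name `is_authenticated` is made up for illustration and is not part of the spec.

```python
import os


def is_authenticated(environ=os.environ):
    """Return True only if REMOTE_USER is present and non-empty.

    An unset variable and a null (empty) value are treated the same
    way: the request is unauthenticated.
    """
    return bool(environ.get("REMOTE_USER"))
```

Taking the environment as a parameter just makes the check easy to try with a plain dict.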

 * > The item which is difficult to pass as an env. variable is the decoded
 * > query string. If we pass it as a single string separated by spaces, a
 * I think argv[2...] is enough; I don't think we need an env variable as well.

I think you're right; it would be a real pain to put the data in an env.
variable as well.

 * I think what we should state is that if the resulting command line would
 * exceed system maximums (length or # arguments) then *NO* argv[2] is
 * passed and you must decode the URL yourself or punt.  This is also
 * the case if there is no query string (then you check $SERVER_QUERY
 * or whatever it's called and if it's not set there is no query at all).

Are the maximums a per-system thing, or should we specify what the maximum
length is?

 * > On the command line, there should be two arguments:
 * I would restate that; even if you don't adopt the suggestion above,
 * it's not true.

You're right, it's something I wrote and later forgot to edit out.

 * > argv[1] is always the path info, untouched. If there is no path info,
 * > this is "".
 * > 
 * > argv[2....] is the decoded query info, split on spaces or ampersands.
 * You can't split on spaces; you mean pluses `+'.  How do you decide
 * if it's pluses or ampersands?  I assume if it has &'s then you split
 * on & else +; that just needs to be made clear.

Yes, I meant pluses. We have to trust the client to encode ampersands if
it's an ISINDEX query, and trust it to encode pluses if it's a form query.
Mosaic is good about it; are there any clients that aren't?
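
As a rough sketch of the splitting rule discussed above (split on `&` if the query contains one, otherwise on `+`, then decode each piece), assuming the client has encoded any literal separators as `%XX` escapes. The helper name `query_to_args` is hypothetical, not something from the spec.

```python
from urllib.parse import unquote


def query_to_args(query):
    """Split an undecoded query string into decoded arguments.

    Assumption: a query containing '&' is a form query and is split
    on '&'; anything else is treated as an ISINDEX query and split
    on '+'. Each piece is %-decoded only after splitting, so an
    encoded separator (%26 or %2B) survives inside an argument.
    """
    if not query:
        return []  # no query at all: nothing to pass as argv[2...]
    separator = "&" if "&" in query else "+"
    return [unquote(part) for part in query.split(separator)]
```

Note that in the form case this sketch leaves any `+` inside a piece untouched; what to do with those is a separate question.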

 * > If the server does not use popen() or system(), and instead uses
 * > fork() and execl(), the command-line length limitation should not
 * > apply. I haven't verified this.
 * It does, it's a low-level problem with exec.

Oh well. I was hoping.
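
For what it's worth, on POSIX systems the exec limit is a per-system value that can be queried at run time, so a server could check before building the command line. A minimal sketch, assuming a rough size estimate is good enough; `fits_on_command_line` is a hypothetical name, and the real accounting varies between systems.

```python
import os


def fits_on_command_line(args, environ, limit=None):
    """Estimate whether args plus environment fit under the exec limit.

    Assumption: the limit covers the argument strings and the
    environment strings together, each with a trailing NUL. Some
    systems also count pointer overhead, so treat this as a rough
    check rather than an exact one.
    """
    if limit is None:
        limit = os.sysconf("SC_ARG_MAX")  # per-system value, Unix only
    size = sum(len(arg.encode()) + 1 for arg in args)
    # Each environment entry is stored as "NAME=value\0".
    size += sum(len(k.encode()) + len(v.encode()) + 2
                for k, v in environ.items())
    return size < limit
```

If this returns False, the server would pass no argv[2] and leave the script to decode the query itself.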

 * > The server should pass the header for the request as given by the
 * > client to the script as stdin. It should also, after the header, pass
 * > the client's data stream.
 * I assume this includes the ``GET /foobar HTTP/1.0'' part.

Should we pass that part? The scripts can get the info from the other data
if I'm not mistaken...

 * > Note that above, the server must know what content-type and
 * > content-length are in order to put them in environment variables.
 * > Should this be necessary, or should the server pass the entire header
 * > without touching it, and make the script pull content-type and
 * > content-length out?
 * I think the server must always touch the data so this isn't a problem,
 * the server will always have to relay the information.  I don't see a clean
 * way around this because there is no way to read "just" the right amount
 * of information so you don't snarf in some of the headers or the data.
 * That would require length information *before* you read the request.
 * A smarter HTTP protocol would have a fixed size data header with this
 * information, but without it you can't do what you imply.

Hmmm. I suppose the server will have to buffer it and spit it back for the
script. The thing which bothers me about this is that the header sometimes
becomes quite lengthy, with all of the Accept: header lines.
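
A rough sketch of the buffering step, assuming the server has already read the request into memory and that lines end in CRLF. The function name `split_request` is hypothetical, and a real server would also need to handle repeated header lines such as Accept:, which this version simply overwrites.

```python
def split_request(raw):
    """Split buffered request bytes into (request_line, headers, body).

    The header block ends at the first blank line. Content-length,
    when present, tells us how many bytes after that belong to the
    client's data stream; without it we assume there is no body.
    """
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", 0))
    return lines[0], headers, rest[:length]
```

With the pieces separated like this, the server can fill in the content-type and content-length environment variables and then spit the buffered text back to the script on stdin.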

 * > *** The script's STDOUT
 * 
 * I need to look at all the plexus gateways and see what "callbacks" they
 * will require (the two you listed are the big ones).
 */

Let me know if there are more.

--Rob