CGI suggestion

john@math.nwu.edu (John Franks)
From: john@math.nwu.edu (John Franks)
Message-id: <9312271649.AA03002@hopf.math.nwu.edu>
Subject: CGI suggestion
To: www-talk@nxoc01.cern.ch
Date: Mon, 27 Dec 1993 10:49:17 -0600 (CST)
X-Mailer: ELM [version 2.4 PL23]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3475      

Now that I am seriously looking at implementing the CGI interface,
I find one part problematic.  This is the way that "state information"
or arguments to a script get encoded in a URL as a sort of pseudo-path
at the end.

Here are my objections:

1. It is not possible to fully parse the URL without knowledge of the
server's file hierarchy.  For example, without knowing something about
the file structure of the server I can't tell whether 

http://host.edu/foo1/foo2/foo3

means script /foo1/foo2 with parameter foo3 or script /foo1 with
parameter /foo2/foo3.  At some point there may be a need to make this
distinction without access to the server.  Maybe not, but in any case
this syntax is cumbersome to implement.

2. Assuming in the example above that the parameter is foo3 (or /foo3?)
then the URL actually refers to two files: root/foo1/foo2 and, say,
root/u/Web/foo3.  Inexperienced users will find this confusing and 
expect to find an actual file root/foo1/foo2/foo3.

3. This syntax overloads the '/' token, giving it very different
meanings depending on context, and it does so where the context
isn't readily visible.  In my experience this is conducive to errors.
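
To make objection 1 concrete, here is a minimal Python sketch.  The
function name candidate_splits is made up for illustration and is not
part of any CGI implementation.  It lists every script/parameter
reading that the URL path alone allows; choosing among them requires
looking at the server's files.

```python
def candidate_splits(url_path):
    """Return every (script, path_info) pair a URL path could denote."""
    segments = url_path.strip("/").split("/")
    splits = []
    for i in range(1, len(segments) + 1):
        script = "/" + "/".join(segments[:i])
        extra = "/".join(segments[i:])
        # An empty remainder means the whole path names the script.
        path_info = "/" + extra if extra else ""
        splits.append((script, path_info))
    return splits


# Three readings of the same path; only the server can say which is right.
for script, path_info in candidate_splits("/foo1/foo2/foo3"):
    print(script, repr(path_info))
```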


SUGGESTION:

I would like to make it a CGI *requirement* that the PATH_INFO data
at the end of a URL contain an '=' and that this '=' come before the
first '/' in this data.

Here is what the example above might be like:

	/foo1/foo2/path=foo3

Other legal and useful URLs might end like

	/foo1/foo2/param1=value1&param2=value2

	/foo1/foo2/path=foo3/foo4&path2=foo5

URLs like this existing one from the Xerox PARC map server would be
perfectly legal.

	http://pubweb.parc.xerox.com/map/color=1/ht=30/lat=38.8/lon=-96

But I would encourage map/color=1&ht=30 etc. instead of using '/' as
the separator.  The main reason is that code to parse the '&' version
should already be common, since it is needed for forms.

If the server knows that an '=' will occur at the beginning of the
PATH_INFO data (and that any ='s in the actual path are URL-encoded),
then this information can be used to parse the URL without knowledge
of the server filesystem.  Also, it is quite clear that expressions like
foo1/foo2/path=foo3 refer to two files, not one.
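
As a rough illustration of this rule, here is a Python sketch.  The
name split_on_equals is hypothetical, and the sketch assumes that any
'=' in the script's own path has been URL-encoded as %3D.  It splits
at the first path segment containing a literal '=', then reuses the
standard library's form-style parser (urllib.parse.parse_qsl) for the
'&' version.

```python
from urllib.parse import parse_qsl


def split_on_equals(url_path):
    """Split a URL path into (script, path_info) using only the '=' rule."""
    segments = url_path.split("/")
    for i, segment in enumerate(segments):
        if "=" in segment:
            # Everything before this segment names the script.
            return "/".join(segments[:i]), "/".join(segments[i:])
    # No '=' anywhere: under the proposal there is no PATH_INFO at all.
    return url_path, ""


script, path_info = split_on_equals("/foo1/foo2/path=foo3/foo4&path2=foo5")
print(script)                # /foo1/foo2
print(parse_qsl(path_info))  # [('path', 'foo3/foo4'), ('path2', 'foo5')]
```

No lookup in the file hierarchy is involved; the same call on
/map/color=1/ht=30/lat=38.8/lon=-96 yields the script /map.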

The only significant change that this would require in the current CGI
implementations is to the PATH_TRANSLATED environment variable.  I
would suggest that this be replaced by a variable containing a
directory name; the script could then construct the translated path
itself.  For example, if the URL ended in

	/foo1/foo2/file1=foo3&file2=foo4/foo5

then the script could read the environment variable to get the directory,
say, "/u/Web" and could reconstruct the file names /u/Web/foo3 and
/u/Web/foo4/foo5.  Notice that this allows more than one file name
to be passed to the script, which is not currently possible.
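
A script-side sketch of that reconstruction, in Python.  The function
name translate_files is hypothetical, and the directory is passed in
as an argument because the proposal does not name the environment
variable that would carry it.

```python
import posixpath
from urllib.parse import parse_qsl


def translate_files(path_info, base_dir):
    """Map each name=value pair in path_info to a file under base_dir."""
    translated = {}
    for name, value in parse_qsl(path_info):
        # Drop leading slashes so the value is always taken relative
        # to base_dir rather than replacing it.
        translated[name] = posixpath.join(base_dir, value.lstrip("/"))
    return translated


# In a real script base_dir would come from the environment.
files = translate_files("file1=foo3&file2=foo4/foo5", "/u/Web")
print(files["file1"])  # /u/Web/foo3
print(files["file2"])  # /u/Web/foo4/foo5
```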

One final minor suggestion.  If the PATH_INFO data actually starts
with '=' as the first character, I would have the server strip this
character before putting the information in the environment variable.
This would be convenient for very simple scripts that shouldn't have
to do any parsing.  Thus a URL ending in

	/foo1/foo2/=foo3/foo4 

would have PATH_INFO set to "foo3/foo4".  You could also keep the 
PATH_TRANSLATED environment variable for this kind of URL and then
almost no changes would be necessary in current scripts.
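
On the server side this last rule is a one-liner.  A Python sketch,
with the made-up name path_info_for_env:

```python
def path_info_for_env(raw_path_info):
    """Strip a single leading '=' before exporting PATH_INFO to the script."""
    if raw_path_info.startswith("="):
        return raw_path_info[1:]
    # Anything else (e.g. "path=foo3") is passed through unchanged.
    return raw_path_info


print(path_info_for_env("=foo3/foo4"))  # foo3/foo4
```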

What do you think?


John Franks 	Dept of Math. Northwestern University
		john@math.nwu.edu