Re: your mail

From: rst@ai.mit.edu (Robert S. Thau)
Date: Wed, 29 Dec 93 13:54:53 EST
Message-id: <9312291854.AA03807@volterra>
To: Charles Henrich <henrich@rs560.cl.msu.edu>
Subject: Re: your mail
Cc: www-talk@www0.cern.ch
Content-Length: 10652
Charles Henrich writes ...

> In fact using the ';' to determine if a script was would disallow most of
> what I'm doing.  I use the inlined include facility of NCSA's server
> extensively.  I want the server to return
> 
> http://host/path/document
> 
> And then the document calls an inlined include which can then decipher
> the ';' attributes, making all sorts of interesting things possible.

Hmmm... You have a point.  Still, scripts run from NCSA's server includes
don't actually get the CGI variables set yet.  Even if they did, it
wouldn't be too hard to make the current PATH_INFO mechanism do the job ---
when presented with

  GET /path/document/parameter/and-another-one

the server would find the prefix which refers to an actual file (i.e.
'/translated-path/document'), and leave the rest as PATH_INFO for potential
includes.  This leaves aside the question of efficiency, of course, to
which I'll return.
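
The prefix search described above can be sketched as follows.  This is a
minimal illustration in Python, not code from the NCSA daemon (which is
written in C); the function name split_path_info is made up for the
example, and it assumes the URL has already been translated to a
filesystem path.

```python
import os
import tempfile


def split_path_info(translated):
    """Split a translated path into (existing_file, path_info).

    Probes the full path first, then successively shorter prefixes,
    and stops at the first prefix that names a regular file.
    Returns (None, translated) if no prefix names a file.
    """
    candidate, path_info = translated, ""
    while candidate not in ("", os.sep):
        if os.path.isfile(candidate):  # one stat() per probe
            return candidate, path_info
        # Drop the last component and push it onto PATH_INFO.
        candidate, tail = os.path.split(candidate)
        path_info = "/" + tail + path_info
    return None, translated


if __name__ == "__main__":
    # Hypothetical layout: a temporary directory holding a file 'document'.
    with tempfile.TemporaryDirectory() as root:
        doc = os.path.join(root, "document")
        open(doc, "w").close()
        print(split_path_info(doc + "/parameter/and-another-one"))
```

When the full path names a file, the first probe succeeds and PATH_INFO
comes back empty, so a request without extra path components costs a
single stat() in this sketch.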

As a side note --- there are other ways to skin the same cat.  For
instance, on my server as currently configured, I can create a script
called 'document' which looks like this:

 #! /usr/local/bin/perl

 print<<EOF;
 <title> blah blah blah </title>
 ...
 EOF

 &play_with ($ENV{'PATH_INFO'});

 print<<EOF;
 <H1>More grunge here</H1>
 <H2>Nirvana in Buddhist thought as related to Christian Eschatology</H2>
 gubbish gubbish gubbish...
 EOF

Retrievals on http://host/path/document/params/here would then do what you
want.  This is arguably a little more awkward than server includes; it may
be more or less efficient (certainly less efficient if the includes simply
include other ordinary files; probably more efficient if there are several
includes which fork off child processes whose work could have been done by
the single perl script, without further process spawning).

> I'd like to say, this syntax could be *in addition* to the current method, it
> doesn't need to replace it.  I'm finding a situation here where the forced
> "stat'ing" of nonexistent files to be distasteful, wasteful, and a very
> potential problem with servers that are heavily utilized.

First off, with regard to aesthetics, de gustibus non disputandum est.  My
personal 'aesthetic' objection to the semicolon syntax is that it keeps me
from changing directories to scripts-with-path_info and back without making
the change in status visible in the URLs and making me change all the
references.  (I don't think this is a totally wild idea --- I've been
chewing over turning the 'people' directory on my server into a script
which redirects to ~.../public_html areas if they exist for the user in
question, and makes up a default home page if they don't).  

Still, so long as I can turn an ordinary *file* into a script and back
without having to find and change everything that cites it (which can be a
real pain in the butt) or doing an Alias or Redirect in srm.conf (which
could get ugly if they started to add up), this isn't a *major* issue.  If
the new syntax is an optional alternative, I have no real objection (though
somebody else might; having two ways of specifying PATH_INFO does add a
little complication to the server).  I'm frankly more hung up on the notion of
incompatible changes to something which has been announced as a standard,
over what I see as quite minor efficiency concerns.

This efficiency argument is apparently the nub of the dispute --- I just
don't find it easy to see how these few extra stat() calls, which needn't
occur unless PATH_INFO is present, can possibly amount to a potentially
serious problem, in the context of all the other things the server does
when processing a request.

To try to put this in context, I've appended a system-call trace of my
(hacked) httpd processing the request 'GET /cgi-bin/fortune'.  The trace
was collected from a server running as 'ServerType inetd', so to keep
things fair I've deleted all the initialization, opening of the logs, and
so forth, and picked up where it actually starts to process the request.
For convenience, I've pointed out the PATH_INFO search in the middle of it.
It amounts to one stat() --- it would have been five with the stock httpd
(Rob goes top down, I go bottom up); there also would have been a little
more monkey business if the script had been run out of a normal directory
instead of cgi-bin, if I had FollowSymLinks disabled (which makes the
daemon walk the pathname looking for symlinks to see if it should deny
access), or if PATH_INFO were actually present.
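
To make the one-versus-five count concrete, here is a small model of the
two search orders.  It is only a sketch: the filesystem is faked with a
set of paths, the five-component path is hypothetical, and the top-down
walk is my simplified reading of "top down", not the stock httpd's
actual code.

```python
def prefixes(path):
    """Yield every leading prefix of an absolute path, shortest first."""
    parts = [p for p in path.split("/") if p]
    for i in range(1, len(parts) + 1):
        yield "/" + "/".join(parts[:i])


def probes_top_down(path, files):
    """Count probes when walking from the root until a file is reached."""
    count = 0
    for prefix in prefixes(path):
        count += 1
        if prefix in files:
            break
    return count


def probes_bottom_up(path, files):
    """Count probes when trying the full path first, then shorter ones."""
    count = 0
    for prefix in reversed(list(prefixes(path))):
        count += 1
        if prefix in files:
            break
    return count


# Hypothetical translated path with five components and no PATH_INFO.
SCRIPT = "/docroot/a/b/cgi-bin/fortune"
FILES = {SCRIPT}

if __name__ == "__main__":
    print(probes_top_down(SCRIPT, FILES))             # 5
    print(probes_bottom_up(SCRIPT, FILES))            # 1
    print(probes_bottom_up(SCRIPT + "/x/y", FILES))   # 3
```

In this model the bottom-up cost grows with the number of PATH_INFO
components, while the top-down cost grows with the depth of the script
in the tree.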

Due to limitations of the tracer, this count excludes housekeeping system
calls done by the daemon child process which exec()s the script.  More to
the point, it excludes the load put on the system by the CGI script itself,
beginning with the exec(), which is hardly cheap when individual stat()s
are in the balance.  The most trivial possible C program, 'main () {}',
generates 43 lines of system-call trace when I run it dynamically linked
under SunOS (mostly due to the shared library mechanism), including 23
opens, reads, and mmaps.  In the particular example here, of course, the
overhead would amount to far more --- the NCSA 'fortune' gateway is a shell
script, and you can barely turn around and sneeze in Bourne shell without
spawning off a child process or two, and doing more stat()s for searches
along $PATH than anyone would care to shake a stick at.

Leaving all that aside, and looking at the system calls executed in the
body of the daemon itself, we find a total of 148 system calls.  Because of
the way the trace was collected (see above), this count does not include
the overhead associated with accepting the connection in the first place,
which (for a standalone server) would amount to at least an accept(), a
fairly hefty fork(), and a bit of housekeeping.  Against this background, I
find it difficult to see how another stat() or two, or even ten, done only
for URLs which happen to invoke a script in the first place, could make
enough of a difference to matter.
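
For anyone who wants to break a trace like the appended one down by call
name, a tally along these lines would do.  This is a minimal sketch,
assuming the line format of the trace in this message (call name, a
space, then parenthesized arguments); the name tally_calls is made up,
and the sample is just a handful of lines copied from the trace.

```python
import re
from collections import Counter

# A call line starts with the call name followed by " (".  The optional
# prefix covers the one line marked by hand as the PATH_INFO stat.
CALL = re.compile(r"^(?:>>>> PATH_INFO )?(\w+) \(")


def tally_calls(lines):
    """Count trace lines per system-call name, skipping non-call lines."""
    counts = Counter()
    for line in lines:
        match = CALL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    'read (0, "G", 1) = 1',
    'read (0, "E", 1) = 1',
    'fstat (3, 0xf7ff4f20) = 0',
    '>>>> PATH_INFO stat ("/com/doc/web-support/cgi-bin/for".., 0xf7ff5968) = 0 <<<<',
    '- SIGCHLD (20)',
    'close (3) = 0',
]

if __name__ == "__main__":
    for name, count in sorted(tally_calls(SAMPLE).items()):
        print(name, count)
```

Signal deliveries such as the SIGCHLD line do not match the pattern, so
they are left out of the tally.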

rst

The system-call trace follows, picking up after the daemon opens the log
files and enters its process_request() routine:

...
getsockname (1, 0xf7fffea0, 0xf7fffe9c) = 0
getpeername (0, 0xf7ff5e30, 0xf7ff5e2c) = 0
getpid () = 3748
open ("/var/yp/binding/ai.mit.edu.2", 0, 036736176136) = 5
flock (5, 06) = -1 EWOULDBLOCK (Operation would block)
mmap (0x361e0, 14, 0x1, 0x80000001, 5, 0) = 0xf76f0000
close (5) = 0
socket (2, 2, 0) = 5
bind (5, "".., 16) = -1 EADDRINUSE (Address already in use)
close (5) = 0
gettimeofday (0xf7ff5ae8, 0) = 0
getpid () = 3748
socket (2, 2, 17) = 5
getpid () = 3748
bind (5, "".., 16) = -1 EACCES (Permission denied)
ioctl (5, 0x8004667e, 0xf7ff5ab4) = 0
fcntl (5, 02, 0x1) = 0
bind (5, "".., 16) = 0
getsockname (5, 0xf7ff5b5c, 0xf7ff5b7c) = 0
sendto (5, "".., 88, 0, 0x362e0, 16) = 88
getdtablesize () = 64
select (64, 0xf7ff5bc0, 0, 0, 0xf7ff5c30) = 1
recvfrom (5, "".., 1600, 0, 0xf7ff5bac, 0xf7ff5bbc) = 52
gettimeofday (0xf7ff5d48, 0xf7ff5d40) = 0
sigblock (0x2000) = 0
sigvec (14, 0xf7ff5dd4, 0xf7ff5dc8) = 0
sigvec (14, 0xf7ff5d5c, 0) = 0
sigsetmask (0) = 0x2000
sigblock (0x2000) = 0
sigvec (14, 0xf7ff5dd4, 0) = 0
sigvec (14, 0xf7ff5d5c, 0) = 0
sigsetmask (0) = 0x2000
setitimer (0, 0xf7ff5dd0, 0xf7ff5dc0) = 0
read (0, "G", 1) = 1
read (0, "E", 1) = 1
read (0, "T", 1) = 1
read (0, " ", 1) = 1
read (0, "/", 1) = 1
read (0, "c", 1) = 1
read (0, "g", 1) = 1
read (0, "i", 1) = 1
read (0, "-", 1) = 1
read (0, "b", 1) = 1
read (0, "i", 1) = 1
read (0, "n", 1) = 1
read (0, "/", 1) = 1
read (0, "f", 1) = 1
read (0, "o", 1) = 1
read (0, "r", 1) = 1
read (0, "t", 1) = 1
read (0, "u", 1) = 1
read (0, "n", 1) = 1
read (0, "e", 1) = 1
read (0, "\r", 1) = 1
read (0, "\n", 1) = 1
setitimer (0, 0xf7ff5dd0, 0xf7ff5dc0) = 0
sigblock (0x2000) = 0
sigvec (14, 0xf7ff5dd4, 0xf7ff5dc8) = 0
sigvec (14, 0xf7ff5d5c, 0) = 0
sigsetmask (0) = 0x2000
gettimeofday (0xf7ff5d68, 0) = 0
open ("/usr/share/lib/zoneinfo/localtim".., 0, 0) = 6
read (6, "".., 4136) = 746
close (6) = 0
ioctl (3, 0x40125401, 0xf7ff4eac) = -1 ENOTTY (Inappropriate ioctl for device)
fstat (3, 0xf7ff4f20) = 0
write (3, "localhost [Wed Dec 29 10:18:28 1".., 58) = 58
close (3) = 0
>>>> PATH_INFO stat ("/com/doc/web-support/cgi-bin/for".., 0xf7ff5968) = 0 <<<<
open ("/.htaccess", 0, 0666) = -1 ENOENT (No such file or directory)
open ("/com/.htaccess", 0, 0666) = -1 ENOENT (No such file or directory)
open ("/com/doc/.htaccess", 0, 0666) = -1 ENOENT (No such file or directory)
open ("/com/doc/web-support/.htaccess", 0, 0666) = -1 ENOENT (No such file or directory)
open ("/com/doc/web-support/cgi-bin/.ht".., 0, 0666) = -1 ENOENT (No such file or directory)
pipe (0xf7ff5dc0) = 3
fork () = 3749
close (6) = 0
getdtablesize () = 64
sigblock (0x2000) = 0
sigvec (14, 0xf7ff5424, 0xf7ff5418) = 0
sigvec (14, 0xf7ff53ac, 0) = 0
sigsetmask (0) = 0x2000
setitimer (0, 0xf7ff5420, 0xf7ff5410) = 0
read (3, "C", 1) = 1
read (3, "o", 1) = 1
read (3, "n", 1) = 1
read (3, "t", 1) = 1
read (3, "e", 1) = 1
read (3, "n", 1) = 1
read (3, "t", 1) = 1
read (3, "-", 1) = 1
read (3, "t", 1) = 1
read (3, "y", 1) = 1
read (3, "p", 1) = 1
read (3, "e", 1) = 1
read (3, ":", 1) = 1
read (3, " ", 1) = 1
read (3, "t", 1) = 1
read (3, "e", 1) = 1
read (3, "x", 1) = 1
read (3, "t", 1) = 1
read (3, "/", 1) = 1
read (3, "p", 1) = 1
read (3, "l", 1) = 1
read (3, "a", 1) = 1
read (3, "i", 1) = 1
read (3, "n", 1) = 1
read (3, "\n", 1) = 1
setitimer (0, 0xf7ff5420, 0xf7ff5410) = 0
sigblock (0x2000) = 0
sigvec (14, 0xf7ff5424, 0xf7ff5418) = 0
sigvec (14, 0xf7ff53ac, 0) = 0
sigsetmask (0) = 0x2000
sigblock (0x2000) = 0
sigvec (14, 0xf7ff5424, 0xf7ff5418) = 0
sigvec (14, 0xf7ff53ac, 0) = 0
sigsetmask (0) = 0x2000
setitimer (0, 0xf7ff5420, 0xf7ff5410) = 0
read (3, "\n", 1) = 1
setitimer (0, 0xf7ff5420, 0xf7ff5410) = 0
sigblock (0x2000) = 0
sigvec (14, 0xf7ff5424, 0xf7ff5418) = 0
sigvec (14, 0xf7ff53ac, 0) = 0
sigsetmask (0) = 0x2000
sigblock (0x2000) = 0
sigvec (14, 0xf7ff5134, 0xf7ff5128) = 0
sigvec (14, 0xf7ff50bc, 0) = 0
sigsetmask (0) = 0x2000
sigblock (0x1000) = 0
sigvec (13, 0xf7ff5134, 0xf7ff5128) = 0
sigvec (13, 0xf7ff50bc, 0) = 0
sigsetmask (0) = 0x1000
setitimer (0, 0xf7ff5130, 0xf7ff5120) = 0
ioctl (3, 0x40125401, 0xf7ff4fcc) = -1 EOPNOTSUPP (Operation not supported on socket)
fstat (3, 0xf7ff5040) = 0
read (3, "Kiss me twice.  I'm schizophreni".., 4096) = 35
read (3, "", 4096) = 0
ioctl (1, 0x40125401, 0xf7ff4fcc) = -1 EOPNOTSUPP (Operation not supported on socket)
- SIGCHLD (20)
fstat (1, 0xf7ff5040) = 0
write (1, "Kiss me twice.  I'm schizophreni".., 35) = 35
close (3) = 0
wait4 (3749, 0, 0, 0) = 3749
close (0) = 0
close (1) = 0
close (2) = 0
close (4) = 0
exit (0) = ?