Re: Semicolon's for all

From: rst@ai.mit.edu (Robert S. Thau)
Date: Fri, 31 Dec 93 13:55:43 EST
Message-id: <9312311855.AA04894@volterra>
To: henrich@crh.cl.msu.edu, www-talk@nxoc01.cern.ch
Subject: Re:  Semicolon's for all
Cc: robm@ncsa.uiuc.edu
Content-Length: 5521

I really hate to keep beating this horse, but perhaps one last time...

Charles Henrich writes ...

  I've come up with another reason (which was my original reason way back,
  but I had forgotten about it until I just tried to do something).  With
  NCSA's inlined includes, I can serve any document and have the inlined
  include do something special based on the URL.  For example, the inlined
  include can peel off the '?' args and return various bits of information
  (such as in the interactive weather browser here).  However, I can only
  give *one* piece of information, and I cannot remember information in the
  URL by using this method (as I have been doing).  Multiple question marks
  in a URL are a no-no, and even with one question mark, the state gets
  obliterated on a mouse click or isindex search.  I can't use the path hack
  for the URL; after all, I'm not trying to execute anything at all,

But you are trying to execute something --- the code that looks at the
information you're passing along. 

  I want
  to serve up a standard text document, and have a program in it parse the
  URL.

A document with server includes is not a standard text document, to my way
of thinking --- standard text documents don't *have* programs in them.
Rather, it's a program in a special-purpose language which is designed to
*produce* standard text documents (as is, say, a TeX input file).  But
that's a rather different thing.

It's worth noting that the use of that language has a certain cost.  The
simplest use of server includes --- say, something like <inc srv "|date">
--- spawns off a Bourne shell, which does at least a couple of file system
operations before spawning off yet another process which finally prints the
date.  A document with several includes incurs this overhead several times
over.  And, at least for those of us who have been spared AFS, spawning a
process costs rather more than a couple of stats.

Much of this overhead could, in principle, be avoided.  If the
server-include code spawned off the "date" process directly, rather than
invoking popen(), which calls up the shell, then the number of process
spawns would be cut in half.  Some of the file system ops would also go
away (all of them in the 'date' example, because 'date' doesn't go near the
filesystem, but that depends on the program being run).  Of course, you
would lose certain convenience features --- path search, argument parsing,
and so forth --- but none of them comes for free, and as I understand your
setup, some of them (path search in particular) aren't necessarily cheap.
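
To make the difference concrete, here is a rough sketch in Perl of the two
ways of running a program and capturing its output.  It is an illustration
only, not the server's actual code: the names slurp_and_close,
run_via_shell and run_directly are made up for the example.

```perl
use strict;
use warnings;

# Read everything from a pipe handle, then close it (which reaps the child).
sub slurp_and_close {
    my ($fh) = @_;
    local $/;                          # slurp mode: read to end of file
    my $text = <$fh>;
    close($fh);
    return defined $text ? $text : '';
}

# Roughly what popen() does: start /bin/sh and let it parse and run the
# command line.  The shell is one process; it typically starts the command
# as a second one (some shells exec a lone simple command in place).
sub run_via_shell {
    my ($command_line) = @_;
    open(my $out, '-|', '/bin/sh', '-c', $command_line)
        or die "cannot start shell: $!";
    return slurp_and_close($out);
}

# The direct route: fork once and exec the program itself, with the
# arguments already split into a list.  No shell is involved.
sub run_directly {
    my ($program, @args) = @_;
    my $pid = open(my $out, '-|');     # implicit fork; child writes to the pipe
    die "cannot fork: $!" unless defined $pid;
    if ($pid == 0) {
        exec { $program } $program, @args;
        die "cannot exec $program: $!";
    }
    return slurp_and_close($out);
}
```

Something like run_directly('date') then costs one process instead of two.
Note that exec as used here still searches PATH when given a bare name; a
server that wanted to skip that as well would pass an absolute path.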

I don't begrudge you any of this, if you feel it makes your life easier ---
if there's one thing that ought to be clear by now, it's that I think some
things are worth a price.  I just wish you'd measure the alternatives to
what you're proposing by the same yardstick.

  I *need* the semicolon syntax to do this.

I think I've already pointed out a couple of alternatives.  From the top:

The current PATH_INFO machinery could be extended to retrievals of 'ordinary'
files.  That is, when the server gets a URL which translates to a path like

  /path/to/the/document/here-it-is.html/whatever-you-like

it could retrieve here-it-is.html, saving the "extra" portion of the
pathname for a PATH_INFO argument to any server includes.  As we've been
through several times, this needn't cost anything at all if PATH_INFO is
absent, and the cost even if it's present is at most a few extra stats
above and beyond what the server is doing anyway.
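
As a sketch of what that lookup might look like (in Perl for brevity; the
function name split_path_info and the optional probe argument are
assumptions of mine, not anything in an existing server):

```perl
use strict;
use warnings;

# Split a translated path into (file, path_info).
# $is_file is a code ref so the file test can be swapped out; by default
# it is Perl's -f test, which costs one stat() per call.
sub split_path_info {
    my ($path, $is_file) = @_;
    $is_file ||= sub { -f $_[0] };

    # Common case: the whole path names a file, so there is no PATH_INFO
    # and no work beyond the single test.
    return ($path, '') if $is_file->($path);

    # Otherwise trim one trailing component at a time and test again.
    my $prefix = $path;
    while ($prefix =~ s{/[^/]*\z}{}) {
        last if $prefix eq '';
        return ($prefix, substr($path, length $prefix))
            if $is_file->($prefix);
    }
    return;                            # nothing along the path is a plain file
}
```

Called on the path above, this would hand back here-it-is.html's full path
and "/whatever-you-like" after two file tests; a path with no extra
components is settled by the first test alone.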

Alternatively, if you want something NOW and you're really hard up, you
could switch to a more powerful scripting language --- something like this:

  #! /usr/local/bin/perl
  do "set-the-weather-variables.pl"; # This code sees PATH_INFO & all CGI vars
  print <<EOF;
  Content-type: text/html

  <title> Weather for $weather_where </title>
  <h1> Weather for $weather_where </h1>

  Here is the available information about weather at $weather_where as of
  $weather_when. <p>

  Maps are available from $weather_sats; see $weather_map_anchors for
  more information.  Also, the local forecast is <a href="$localf">here</a>,
  and the long-range forecast for the region is <a href="$longrangef">here</a>.
  <p>

  In order to use this information ...
  ...
  EOF

Note that the file is basically HTML with $variable inclusions, except for
four lines at the top to invoke Perl code (in a separate file for the
example) which actually sets the $variables.  So, for the most part, it
*looks* like an ordinary text file.  It really isn't, but like I said, I
don't think HTML files with server includes are either.

This is, I'll grant you, not the prettiest thing in the world.  However,
it's not *so* bad, it does get the job done, and it even has a few
advantages.  Being able to throw an 'if' statement into the middle of these
things can grow on you.  Also, it's likely to be at least a little more
efficient --- CGI scripts are spawned directly by the daemon (not through a
shell) and Perl is a powerful enough language that it can do at least
simple jobs with no further process spawning at all.
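
For instance, an 'if' in the middle of such a page might look roughly like
this.  It is a minimal sketch: the function name weather_page and the hash
keys are invented for the example, loosely following the variables in the
script above.

```perl
use strict;
use warnings;

# Build the page as a string.  The values passed in are whatever the
# caller worked out from PATH_INFO and friends.
sub weather_page {
    my (%v) = @_;
    my $page = <<EOF;
Content-type: text/html

<title> Weather for $v{where} </title>
<h1> Weather for $v{where} </h1>
EOF
    # The 'if' in the middle: only mention the long-range forecast
    # when there is one to point at.
    if (defined $v{longrangef}) {
        $page .= <<EOF;
The long-range forecast for the region is <a href="$v{longrangef}">here</a>.
EOF
    }
    return $page;
}
```

A script would then just print weather_page(where => ..., longrangef => ...)
with whatever values it has on hand.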

(If you don't like Perl, you can play the exact same game with any of the
shells, but it may cost you a bit.)

In short, I don't see that the semicolon syntax is actually necessary for
what you want to accomplish.  Plausible alternatives are at least as
convenient, and so far as I can tell (when the whole cost of running the
retrieval is factored in) only marginally more expensive.

rst