Re: Performance analysis questions

Errors-To: listmaster@www0.cern.ch
Date: Thu, 12 May 1994 01:44:36 +0200
Message-id: <9405112340.AA00168@void.ncsa.uiuc.edu>
Reply-To: robm@ncsa.uiuc.edu
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: robm@ncsa.uiuc.edu (Rob McCool)
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Performance analysis questions
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
/*
 * Re: Performance analysis questions  by George Phillips
 *    written on May 12,  1:29am.
 *
 * Andrew Payne said:
 * >I started hacking some instrumentation into NCSA's httpd to see where it 
 * >was spending time (I couldn't find a profiling tool that would work through 
 * >the fork(), though in retrospect it probably would have been easier to run 
 * >the server in single connection mode and throw out all of the startup 
 * >stuff).  NOT counting the fork() time, I found the server spending about 
 * >20-30% of its time (wall and CPU) in the code that reads the request 
 * >headers.  Code like this doesn't help (getline() in util.c):
 * >
 * >           if((ret = read(f,&s[i],1)) <= 0) {
 * >
 * >Your mileage may vary.
 * 
 * Whilst looking through the httpd code, I noticed this too.  I meant to
 * send off a "bug" report to Rob, but never got around to it.  This is a
 * pretty expensive way to go about things.  Sure, a big Sparc II can
 * crank through read calls at 100,000 per second, but at around 1000
 * characters per HTTP/1.0 header it adds up.

An unloaded sparc 2 can do that... the problem arises when that sparc 2 is
handling 100 connections.
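
(Back-of-the-envelope, using George's figures: a 1000-byte header read one
byte at a time is about 1000 read() system calls, or roughly 10 ms of kernel
time at 100,000 calls per second, so 100 simultaneous requests burn on the
order of a second of CPU on header parsing alone.)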

 * This is done because it wants to hand off the file descriptor to
 * CGI scripts that handle POSTs.  I'd suggest the right way to fix
 * things is to read a bufferful and cat the extra to the scripts that
 * need it.  However, a quick hack could double the speed by doing
 * read(f, &s[i], 2) because you know that at least CR LF will terminate
 * the line.  If it's the header boundary that's a problem, you could
 * quadruple the speed with read(f, &s[i], 4) since you have at least
 * "GET " for HTTP/0.9 requests and HTTP/1.0 headers will terminate
 * with CR LF CR LF (well, they better!).
 */
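
For what it's worth, here's a rough sketch of the buffered approach George is
describing -- this is not the httpd code, and the names (buffered_fd,
buf_getline) are made up -- where you read() in big chunks, pull header lines
out of the buffer, and keep whatever trails the blank line around for later:

#include <unistd.h>

#define INBUFSIZE 4096

typedef struct {
    int  fd;                 /* the client socket */
    char buf[INBUFSIZE];     /* read-ahead buffer */
    int  pos;                /* next unread byte in buf */
    int  len;                /* number of valid bytes in buf */
} buffered_fd;               /* initialize with pos = len = 0 */

/*
 * Pull one LF- or CRLF-terminated line out of the buffer, refilling it
 * with a large read() only when it runs dry.  Returns the number of
 * characters stored in s, or -1 on EOF/error with nothing read.
 */
static int buf_getline(buffered_fd *b, char *s, int n)
{
    int i = 0;
    char c;

    while (i < n - 1) {
        if (b->pos >= b->len) {
            b->len = read(b->fd, b->buf, INBUFSIZE);
            b->pos = 0;
            if (b->len <= 0)
                return i ? i : -1;
        }
        c = b->buf[b->pos++];
        if (c == '\n')
            break;
        if (c != '\r')
            s[i++] = c;
    }
    s[i] = '\0';
    return i;
}

The idea is that whatever is still sitting in buf after the blank line that
ends the headers is the start of the request body, so the server can write
those leftover bytes to a POSTing CGI script first and then go back to
copying straight from the socket.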

I feel compelled to explain one of the worst implementation decisions I've
ever made. Yes, that's why it was done. Why? At the time I was implementing
it, Marc was testing my code by sending me forms literally megabytes long.
At roughly the same time, Mosaic/X started sending full Accept: headers that
often totalled over 1000 bytes. I wasn't aware that the headers were so long
(I thought they were MUCH shorter), and I assumed that processing forms
megabytes long would be very common; had that been the case, this approach
would have been a big win.

As it turns out, the headers for Mosaic/X are over 1000 bytes long, forms
are almost always well under 1000 bytes themselves, and my implementation
loses big time and is not scalable. This is something I was meaning to fix
in 2.0, but for the NCSA version someone else will be taking up the reins.

--Rob