Re: Performance analysis questions

Rick Troth <troth@rice.edu>

Mail folder: WWW Talk Apr 94-present
Maybe in reply to: Daniel W. Connolly: "Re: Performance analysis questions "

Errors-To: listmaster@www0.cern.ch
Date: Sun, 29 May 1994 05:36:55 +0200
Message-id: <Pine.3.89.9405271042.A4118-0100000@brazos.is.rice.edu>
Reply-To: troth@rice.edu
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Rick Troth <troth@rice.edu>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Performance analysis questions 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Mime-Version: 1.0

	I'm surprised and crushed by Dan's response. 
 
> >	Argh!   This is bad.   Not picking on you,  George, 
> >but you've presumed on something that a *lot* of people seem to 
> >presume on.   It should be 
> > 
> >		CR LF [any amount of white space] CR LF 
> > 
> >	There are contemporary systems that CAN NOT generate a 
> >completely empty line in places.   This is a problem for certain 
> >mail user agents which don't see the header termination because 
> >the blank line isn't an "empty line" (cr/lf/cr/lf).   ... 
 
> > 
> >	Thoughts? 
> 
> Yes... let's nip this sort of thing in the bud, shall we?
> 
> HTTP is not Internet Mail. 
 
	Right.   And Internet Mail is broken.   Let's not see HTTP 
break because someone misinterpreted the spec.   We need to clarify this. 
I say that we should clarify it in the looser direction w/r/t plain 
text and trailing whitespace in particular.   I see no reason to 
penalize clients and servers that have platform limitations ... 
unless it's just out of spite.   What's the deal, Dan? 
 
> HTTP is not for the human eye: it's for a piece of software that groks
> TCP (or perhaps some other reliable transport eventually...).
 
	If by this statement you're pointing out a misimplication 
in my note,  I accept the correction.   I didn't mean to suggest 
that HTTP is for human consumption.   What I *did* (still do) 
mean to suggest is that,  to the greatest extent possible, 
HTTP be clearly defined as a  PLAIN TEXT  protocol. 
 
	I think we all agree that  "plain text"  protocols are a 
Good Thing.   But we've agreed to that without bothering to define 
what on earth  "plain text"  is.   I don't think  HTTP  should be the 
protocol to bear the  plain text  torch,  but I think it'd be foolish 
to plow blindly forward without thinking carefully about it.  There's 
so much in HTTP that's wonderful,  things like the URL being a single 
blank-delimited token;  see it all the way through!   (you'd better be 
discarding any trailing white space from your GETs;  are you???) 
 
	That's why I said ... 
 
> >     Try this:
> >         
> >         o   a line of text is NUL terminated
> >             (assuming you're coding C on UNIX) [YMMV]
 
	Better:  "a line of text is end-of-record terminated", 
where  end-of-record  is defined by local O/S considerations. 
 
> >         o   when sending "on the wire" append CR & LF
> >         o   when receiving, accept either NL (LF)
> >             or CR LF for end-of-line
> >         o   when processing, ignore trailing white space
 
	And add one more:  TAB and SPACE process the same. 
 
	This isn't arbitrary,  it fits the  "be conservative about 
what you generate and liberal about what you accept"  rule. 
(or do we disagree about that too?) 
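
	As a rough illustration of the receiving side of that list, 
here is a minimal sketch in C.   The function name  normalize_line 
is made up for this note (it is not from any existing HTTPD or 
library),  and it assumes the line has already been read into a 
NUL-terminated,  writable buffer. 

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any HTTP library): normalize one
 * received line in place.  Drops a trailing LF or CR LF (a lone
 * trailing CR is dropped too), turns every TAB into a SPACE, then
 * discards trailing spaces.  Returns the new length. */
size_t normalize_line(char *line)
{
    size_t len;
    size_t i;

    assert(line != NULL);
    len = strlen(line);

    /* when receiving, accept either LF or CR LF for end-of-line */
    if (len > 0 && line[len - 1] == '\n')
        len--;
    if (len > 0 && line[len - 1] == '\r')
        len--;

    /* TAB and SPACE process the same */
    for (i = 0; i < len; i++)
        if (line[i] == '\t')
            line[i] = ' ';

    /* when processing, ignore trailing white space */
    while (len > 0 && line[len - 1] == ' ')
        len--;

    line[len] = '\0';
    return len;
}
```

	Fed a request line with stray blanks and a TAB before the 
CR LF,  this leaves just the request text;  a line holding only 
white space comes back with length zero. 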
 
> Let us keep the HTTP protocol clear and free of such kludgery.
 
	This is not kludgery!   This is robust design. 
 
> In the HTTP headers, a line is terminated by CRLF. That's octet 13,
> octet 10. Anything else is broken. One should not expect to use
> idioms such as:
> 	printf("HeaderName: stuff\n")
> or
> 	echo "HeaderName: stuff"
> successfully. Care must be taken to terminate lines with CRLF.
 
	Certainly.   Any HTTPD will have to map local conventions to 
on-the-wire streams.   Any HTTPD will have to map end-of-line to 0x0D 0x0A. 
We can make certain demands of the various HTTP server implementations. 
I say we *not* demand that  "plain text"  (including HTTP headers) 
be anything more than outlined above. 
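
	The sending half of that mapping might look something like 
the sketch below.   line_to_wire  is a hypothetical name,  and the 
sketch assumes a C/UNIX host where the local line may or may not 
still carry its  "\n". 

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy one local line into out[] with any local
 * LF (or CR LF) ending replaced by CR LF (octets 13, 10).
 * Returns the number of octets to send, or 0 if out[] is too small. */
size_t line_to_wire(const char *line, char *out, size_t outsz)
{
    size_t len;

    assert(line != NULL && out != NULL);
    len = strlen(line);

    /* strip the local end-of-line, if the caller left one on */
    if (len > 0 && line[len - 1] == '\n')
        len--;
    if (len > 0 && line[len - 1] == '\r')
        len--;

    if (len + 3 > outsz)        /* text + CR + LF + NUL */
        return 0;

    memcpy(out, line, len);
    out[len] = '\r';            /* 0x0D */
    out[len + 1] = '\n';        /* 0x0A */
    out[len + 2] = '\0';
    return len + 2;
}
```

	An empty local line goes out as a bare CR LF,  which is 
what ends the headers. 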
 
> Similarly for the blank line that ends the headers: I'm not sure if
> RFC822 specifies that the line shall be empty or not, but I'd support
> a clarification in HTTP that says it shall.
 
	That's the problem.   It doesn't specify! 
 
	I'd support a clarification that it  NEED NOT  be empty. 
If you specify that it  MUST BE EMPTY  (eg: CR/LF/CR/LF)  then 
at least you've specified,  but you'll have tightened the spec 
in the direction of least ease of implementation. 
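
	Under the looser reading,  the receiver's test for the end 
of the headers could be as small as this.   It is only a sketch; 
is_header_end  is a name invented here,  and the line is assumed to 
be NUL-terminated,  with or without its end-of-line still attached. 

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: does this received line end the headers?
 * Returns 1 when the line holds nothing but SPACE, TAB, CR or LF
 * (so "blank" need not mean "empty"), otherwise 0. */
int is_header_end(const char *line)
{
    assert(line != NULL);
    for (; *line != '\0'; line++)
        if (*line != ' ' && *line != '\t' &&
            *line != '\r' && *line != '\n')
            return 0;
    return 1;
}
```

	A strict  CR/LF/CR/LF  sender still passes this test,  so 
nothing that works today would stop working. 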
 
> The data stream is something different altogether. 
 
	And that's beyond the scope of this argument.   But ... 
 
> This means, for example, that you shouldn't expect html lines to
> be terminated in any particular way. Of course it doesn't matter how
> they're terminated except inside PRE elements. There, I'd say that
> a newline is (CR|LF|CRLF).
 
	Which can safely become  LF, LF, LF  on a UNIX client host. 
You wouldn't want the  "save to disk"  option to leave those CRs in 
there,  would you?   Still,  here too,  any trailing white space should 
be considered fair game. 
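
	For the  "save to disk"  case on a UNIX client,  the 
(CR|LF|CRLF)  mapping could be sketched like so.   newlines_to_lf 
is a hypothetical name,  the buffer is assumed to be NUL-terminated 
text,  and this sketch deals only with the newlines,  not with 
trailing white space. 

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrite buf in place so that each CR LF,
 * lone CR, or lone LF becomes a single LF.  Returns the new length.
 * The output never grows, so writing in place is safe. */
size_t newlines_to_lf(char *buf)
{
    size_t in = 0;
    size_t out = 0;

    assert(buf != NULL);
    while (buf[in] != '\0') {
        if (buf[in] == '\r') {
            buf[out++] = '\n';
            in++;
            if (buf[in] == '\n')    /* CR LF counts as one newline */
                in++;
        } else {
            buf[out++] = buf[in++]; /* LF and ordinary text pass through */
        }
    }
    buf[out] = '\0';
    return out;
}
```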
 
> Daniel W. Connolly        "We believe in the interconnectedness of all things"
 
	Exactly! 
 
> Software Engineer, Hal Software Systems, OLIAS project   (512) 834-9962 x5010
> <connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html
 
-- 
Rick Troth <troth@rice.edu>, Rice University, Information Systems