Re: Gopher+ Considered Harmful

Guido.van.Rossum@cwi.nl

Mail folder: WWW Talk 1992 Archives
In-reply-to: Tim Berners-Lee: "Re: Gopher+ Considered Harmful "

Message-id: <9212111519.AA00320.guido@voorn.cwi.nl>
To: www-talk@nxoc01.cern.ch
Subject: Re: Gopher+ Considered Harmful 
In-reply-to: Your message of "Fri, 11 Dec 1992 15:18:01 MET."
             <9212111418.AA02698@www3.cern.ch> 
From: Guido.van.Rossum@cwi.nl
X-Organization: CWI, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
X-Phone: +31 20 5924127 (work), +31 20 6225521 (home), +31 20 5924199 (fax)
Date: Fri, 11 Dec 1992 16:19:45 +0100
Sender: Guido.van.Rossum@cwi.nl

Tim writes:

>It is not the space to buffer the stuff in the average case which is a problem
>
>There are extreme cases: Long documents which spew out of format converters
>piped into other format converters.  These things would blow the memory of a
>server which we never like to do.
>
>There is the cumulative effect of response times. Currently, almost all the W3
>code is pipelines, so the response (click mouse to first character on screen)
>is a function of the round trip delays and any real retrieval time. The moment
>you put a buffer in to count bytes, you have to wait for the first until the
>last is available. In the (frequent) case of many stages being involved in a
>pipeline the response time does not in fact increase much, you just get a lot
>of CPU from processors on the pipe line.  Once you buffer it up, you are using
>CPU from one processor at a time.   You can't start displaying it until you've
>parsed it and you can't parse it until you've read it and you can't read it
>until the server has counted it and he can't even start to count it until all
>the real work has been finished.
>
>You will notice the difference immediately.

Yes, I see.  This means there is a problem whether you put a byte
count in the header or in an "envelope" sent before the header, so
Dan's solution is just as wrong as mine. :-(

>Piping things until EOF is so much faster.  Can TCP really not tell the
>difference between a remote connection close, and a broken connection? :-((
>(APIs apart)

I tried to find a definitive answer, but it is hard to figure out.
At the kernel level there is definitely a difference between a
shutdown by the remote side and a network failure, but at least on
UNIX both situations are passed to the client as an EOF.
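
As a rough illustration of what the client sees, here is a minimal
Python sketch of a read-until-EOF loop.  The function name
read_until_eof and the use of a local socket pair in place of a real
TCP connection are assumptions made for the example only.

```python
import socket

def read_until_eof(sock, bufsize=4096):
    """Collect everything the peer sends until recv() reports EOF."""
    parts = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:          # b"" means EOF; the reason is not reported
            break
        parts.append(chunk)
    return b"".join(parts)

# Local demonstration: a connected socket pair stands in for TCP.
server, client = socket.socketpair()
server.sendall(b"partial or complete?")
server.close()                 # orderly close by the sender
data = read_until_eof(client)
client.close()
```

The loop ends on an empty read, and nothing in that return value says
whether the sender had actually finished.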

One solution would be to add a standard string after the document,
e.g. CR LF "*%*%*%END-OF-DOCUMENT%*%*%*" CR LF, so that the client can
assume it has received the whole file if it sees this at the end of
the file.  The client cannot assume that the data *ends* when it sees
this (it is not another form of dot-on-a-line-by-itself), since it may
occur within the data as well, but it will know that it didn't get the
whole data if it doesn't end with this.
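
A minimal sketch of the client-side check, in Python.  The trailer
string is the one suggested above; the names TRAILER and strip_trailer
are made up for the example.

```python
# Trailer as suggested above: CR LF, the marker text, CR LF.
TRAILER = b"\r\n*%*%*%END-OF-DOCUMENT%*%*%*\r\n"

def strip_trailer(received: bytes):
    """Return (document, complete) for data that was read until EOF.

    Only the very end of the data is examined, so a copy of the
    marker inside the document itself is left alone.
    """
    if received.endswith(TRAILER):
        return received[:-len(TRAILER)], True
    return received, False
```

The server can keep piping the document as it is produced and simply
write the trailer last, so no buffering is needed to support this.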

A better version would be if the server could calculate a simple
checksum or "signature" (e.g. MD5) of the data and append this to the
end of the data.
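
A minimal sketch of the checksum variant, in Python, assuming the
server appends the raw 16-byte MD5 digest directly after the last byte
of the document.  That wire layout and the names send_with_checksum
and verify are assumptions for illustration, not a proposed format.

```python
import hashlib

DIGEST_SIZE = hashlib.md5().digest_size  # 16 bytes for MD5

def send_with_checksum(chunks):
    """Yield the chunks unchanged, then the MD5 digest of all of them.

    The digest is updated as each chunk passes through, so the
    document is never buffered as a whole.
    """
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
        yield chunk
    yield md5.digest()

def verify(received: bytes):
    """Return (document, ok) for data that was read until EOF."""
    if len(received) < DIGEST_SIZE:
        return received, False
    body, digest = received[:-DIGEST_SIZE], received[-DIGEST_SIZE:]
    return body, hashlib.md5(body).digest() == digest
```

Because the digest is computed incrementally, the server side still
behaves like a pipeline; only the final verdict on completeness has to
wait for EOF.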

--Guido van Rossum, CWI, Amsterdam <guido@cwi.nl>
"I'm sorry but I'm not allowed to argue unless you pay."