Re: Gopher+ Considered Harmful

Dan Connolly <connolly@pixel.convex.com>
Message-id: <9212101805.AA05022@pixel.convex.com>
To: Guido.van.Rossum@cwi.nl
Cc: www-talk@nxoc01.cern.ch
Subject: Re: Gopher+ Considered Harmful 
In-reply-to: Your message of "Thu, 10 Dec 92 10:55:24 +0100."
             <9212100955.AA26659.guido@voorn.cwi.nl> 
Date: Thu, 10 Dec 92 12:05:02 CST
From: Dan Connolly <connolly@pixel.convex.com>

>I once explained the current HTTP protocol to a local network guru and
>he expressed concern that basing a protocol like this on TCP/IP is a
>great waste of network resources, since you are using a
>session-oriented protocol for what is essentially one remote procedure
>call.

Do the WAIS folks know about this? I wonder what they'd say...

>  My question "then what would you recommend instead" provoked no
>useful answer (what is needed is *reliable* datagrams, but these are
>not implemented as an IP protocol; UDP requires too much coding for
>time-out, retransmission and fragmentation).  Yet, he convinced me
>that a light-weight protocol like this should minimize the number of
>round-trips.

I agree.

>I have the feeling that the current trend of basing the new protocol
>on NNTP violates that requirement.  I like the idea of borrowing
>response and data formats from the FTP/SMTP/NNTP family of protocols,
>with suitable extensions for 8-bit data paths.  However I don't like
>it if compatibility with NNTP forces us to have an extra round trip
>just so that the server can give its welcome message.

The idea was to use the existing usenet distributed database. But
I guess we should just use plain old NNTP for that.

>Also, I don't like the fact that you must parse the RFC822/MIME header
>to find out how many bytes are to be expected.  This seems to be
>mixing two levels of protocols: MIME assumes that the end of the
>message is already known, and the MIME headers then tell you what to
>do with it.

I certainly didn't think it out very carefully, did I?

>As I see it, there are two possible ways of using MIME in HTTP+.  We
>can either support MIME as the *only* data format (implementing any
>extensions we need as new MIME content types or subtypes or additional
>headers), or we support MIME as one of the possible data formats.

A terminology note here: there is no one "MIME data format." There's
the ubiquitous message/rfc822 format that you can stick anything
inside using MIME techniques. But the basic unit of information
in the MIME spec is an _entity_ -- just an arbitrary stream of bytes.

The question is, when you're sending an entity from one
place to another, how do you know where the end is?

From the MIME point of view, an NNTP client and server have
an implicit agreement that the entity going across the
wire has a content-transfer-encoding of 7bit.

This allows them to use the dot-on-a-line-by-itself technique to
terminate the entity.
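For illustration, a rough Python sketch of that convention: the sender doubles any leading dot ("dot-stuffing", as RFC 977 describes) and the receiver reads CR LF lines until it sees a lone ".". The function names are made up for this example. It also hints at why the trick goes with 7bit text: binary data isn't made of CR LF lines, so line-oriented processing like this would mangle it.

```python
import io

def dot_stuff(lines):
    """Encode text lines (given without line endings) for the wire.

    Each line is sent with a CR LF ending. A line that already starts
    with "." gets an extra "." prepended, so only the final lone "."
    can be taken for the terminator.
    """
    out = []
    for line in lines:
        if line.startswith(b"."):
            line = b"." + line
        out.append(line + b"\r\n")
    out.append(b".\r\n")  # the terminating dot-on-a-line-by-itself
    return b"".join(out)

def read_dot_terminated(stream):
    """Read one dot-terminated block from a binary stream, undoing the stuffing."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("connection closed before the terminating dot")
        if line.endswith(b"\r\n"):
            line = line[:-2]
        if line == b".":
            break  # end of the entity
        if line.startswith(b"."):
            line = line[1:]  # remove the dot the sender added
        lines.append(line + b"\r\n")
    return b"".join(lines)

if __name__ == "__main__":
    wire = dot_stuff([b"Hello", b".hidden", b"."])
    print(read_dot_terminated(io.BytesIO(wire)))
```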

They also share assumptions about the content-type as
a separate issue. The client assumes the response to an
ARTICLE command is a message/rfc822 entity, while the
response to a BODY command is text/plain.

So we'll start by throwing out the 7bit restriction and
assuming binary content transfer encoding. Then we need
a new way to terminate entities. It would be nice to
just stick the bytecount in the status message, and then
blast the entity across.

But HTTP includes a format negotiation such that the client
doesn't know the content-type of the returned data
until it comes back. The easy way around that is to _enclose_
all entities in message/rfc822 entities, using the
Content-Type header.

So the server would have to:
1) compute the headers to enclose the entity in a message
2) compute the length of the new entity (headers + body)
3) send a status message with that bytecount, and finally
4) send the headers and the body entity.

Note that in step 2, everybody has to be consistent about
the fact that newlines count as _2_ characters: CR and LF.
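A minimal Python sketch of the server side, assuming a made-up status line of the form "200 <bytecount>" (nothing here fixes one) and a single Content-Type header. The CR LF point shows up in the count: "Content-Type: text/plain" plus its CR LF and the blank line comes to 28 octets, so a 2-byte body gives 30. Counting bare newlines would give 28, off by two.

```python
def enclose(content_type, body, extra_headers=()):
    """Steps 1-2: wrap an entity in message/rfc822-style headers."""
    header_lines = [f"Content-Type: {content_type}", *extra_headers]
    # Each header line ends in CR LF (2 octets), and one more CR LF
    # ends the header block, so len() below matches what goes on the wire.
    head = "".join(line + "\r\n" for line in header_lines) + "\r\n"
    return head.encode("ascii") + body

def framed_response(content_type, body):
    """Steps 3-4: status line with the bytecount, then headers and body."""
    entity = enclose(content_type, body)
    # "200 <count>" is a hypothetical status format for this sketch.
    status = f"200 {len(entity)}\r\n".encode("ascii")
    return status + entity

if __name__ == "__main__":
    print(framed_response("text/plain", b"hi"))
```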

Then the client has to:

1) read the status message and extract the bytecount
2) slurp up that many bytes
3) find the blank line that separates the header from the body
4) parse the content-type out of the headers
5) process the body according to the content-type
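The matching client side, sketched the same way under the same hypothetical "200 <count>" status format. Note that the headers only get parsed after the whole entity has been slurped up, which is the awkward part.

```python
import io

def read_exactly(stream, n):
    """Read exactly n octets; a raw stream may hand them over in pieces."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            raise EOFError("connection closed mid-entity")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def read_counted_response(stream):
    # 1) status line; "200 <count>" is this sketch's made-up format
    code, count = stream.readline().split()[:2]
    if not code.startswith(b"2"):
        raise ValueError(f"unexpected status code: {code!r}")
    # 2) slurp up exactly that many bytes
    entity = read_exactly(stream, int(count))
    # 3) the first blank line separates the header from the body
    head, sep, body = entity.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line after the headers")
    # 4) parse the content-type out of the headers
    content_type = None
    for line in head.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-type":
            content_type = value.strip().decode("ascii")
    # 5) the caller processes body according to content_type
    return content_type, body

if __name__ == "__main__":
    wire = io.BytesIO(b"200 30\r\nContent-Type: text/plain\r\n\r\nhi")
    print(read_counted_response(wire))
```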

That's why I ended up mixing the two levels of protocol
(message/rfc822 headers and HTTP+ status messages). It would
be easier for the server to:

1) send a status message indicating binary transport (but no bytecount)
2) print the enclosing headers as they're computed
3) print one more header that has the bytecount of the body
4) print a blank line separating the header from the body
5) blast bytecount octets of data over the wire
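A Python sketch of this second arrangement on the server side. The status wording and the header name Content-Length are assumptions; the steps above only call for "one more header that has the bytecount of the body". The server now needs the length of the body alone, not of the headers, so there is no CR LF arithmetic to get wrong.

```python
import io

def send_response(out, content_type, body):
    """Write status, headers, blank line, then the raw body to out."""
    def put_line(text):
        out.write(text.encode("ascii") + b"\r\n")
    # 1) status: binary transport, no bytecount (made-up wording)
    put_line("200 binary data follows")
    # 2) enclosing headers, printed as they are computed
    put_line(f"Content-Type: {content_type}")
    # 3) one more header carrying the body's bytecount (name assumed)
    put_line(f"Content-Length: {len(body)}")
    # 4) blank line separating the header from the body
    put_line("")
    # 5) blast exactly len(body) octets over the wire
    out.write(body)

if __name__ == "__main__":
    buf = io.BytesIO()
    send_response(buf, "image/gif", b"GIF89a")
    print(buf.getvalue())
```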

Then the client just does:

1) read and parse nice 7bit headers, one at a time.
2) when you get to a blank line, you know the bytecount and
   the content-type of the message.
3) slurp up bytecount bytes of data
4) process it according to content-type.
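And the client for that arrangement, again a sketch assuming the same made-up status wording and a Content-Length header. Headers are read one 7bit line at a time, so by the blank line the client already knows both the bytecount and the content-type, and it stops reading exactly at the end of the body.

```python
import io

def read_exactly(stream, n):
    """Read exactly n octets; a raw stream may hand them over in pieces."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            raise EOFError("connection closed mid-body")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def read_response(stream):
    status = stream.readline()
    if not status.startswith(b"2"):
        raise ValueError(f"unexpected status: {status!r}")
    # 1) read and parse 7bit headers, one line at a time
    headers = {}
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("connection closed inside the headers")
        line = line.rstrip(b"\r\n")
        # 2) a blank line ends the headers
        if not line:
            break
        name, _, value = line.partition(b":")
        headers[name.strip().lower().decode("ascii")] = value.strip().decode("ascii")
    # 3) slurp up exactly bytecount bytes
    body = read_exactly(stream, int(headers["content-length"]))
    # 4) hand back the content-type and body for processing
    return headers.get("content-type"), body

if __name__ == "__main__":
    wire = io.BytesIO(b"200 binary data follows\r\n"
                      b"Content-Type: image/gif\r\nContent-Length: 6\r\n\r\nGIF89a")
    print(read_response(wire))
```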

It is somewhat intertwingled, but I still kinda like it.

Dan