Faster HTTP Was: Re: The Superhighway Steamroller

HALLAM-BAKER Phillip (hallam@dxal18.cern.ch)
Tue, 5 Jul 1994 05:57:46 +0200

In article <78DF@cernvm.cern.ch> you write:

|> gtn@ebt.com (Gavin Nicol) writes:
|>
|>>(1) and (2) will be fixed with faster hardware, while (3) will get
|>>worse and worse. I have fond memories of 2k and 4k machines, and the
|>>machine language programming I did on them, but I wouldn't wish that
|>>on anyone nowadays; it's just not cost effective.
|>>
|>>I often wonder just how much header/message parsing costs HTTP...
|>
|>
|> Extensive headers cause problems over dialup connections; a 1K set of
|>headers can add almost a second to the transaction time for every single
|>request. This order of magnitude difference makes the cost of parsing
|>insignificant.

For dialup connections we really should look into something better. A proxy
server at the other end of the dialup could have compression and decompression
built in to save bandwidth. The accept fields could be lodged with the
proxy and never need to go over the dialup line, because for a given client
they don't really change from request to request.
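As a rough sketch of how that lodging might look (purely illustrative; the
class, the method names and the header values here are all made up, not a
wire-format proposal): the client registers its accept set with the proxy once
per session, after which the slow link carries only the request line while the
proxy replays the lodged headers upstream and compresses what comes back.

import zlib

class DialupProxy:
    def __init__(self):
        self.lodged = {}                 # client id -> list of (name, value)

    def lodge(self, client_id, headers):
        # Sent once, at the start of the dialup session.
        self.lodged[client_id] = headers

    def forward(self, client_id, request_line):
        # Rebuild the full request for the fast side of the link;
        # the slow link only carried the request line itself.
        lines = [request_line]
        for name, value in self.lodged.get(client_id, []):
            lines.append("%s: %s" % (name, value))
        return "\r\n".join(lines) + "\r\n\r\n"

    def squeeze(self, body):
        # Compress response bodies before they cross the slow link.
        return zlib.compress(body)

proxy = DialupProxy()
proxy.lodge("client-1", [("Accept", "text/html, image/gif;q=0.7, */*;q=0.01"),
                         ("User-Agent", "Linemode/2.16")])
print(proxy.forward("client-1", "GET /hypertext/WWW/TheProject.html HTTP/1.0"))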

|> Header parsing adds about 25% to the total transaction time for the
|>NCSA server (HTTP/0.9 vs a set of 1K HTTP/1.0 headers generated by xmosaic).
|>Measurements taken on a lightly loaded ethernet, using a sparc 10SX to generate
|>queries, and a SparcStation 20/512 as server.

The header parsing bit is a drag on everything, agreed. However, the reason for
using it is that it is difficult to get acceptance for a binary protocol.
I think we should at a minimum make both systems interoperable. A dummy method
could be used as an id :-

NULL .. HTTP/2.0

I think we will have to wait for the URN scheme to make it work though. This
will make rerouting of different protocols much easier.
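To make the idea concrete, here is a toy dispatch routine, assuming a request
line along the lines of `NULL / HTTP/2.0`; the handler names are placeholders
and the binary branch is pure hand-waving on my part.

def handle_binary(connection):
    return "switching to hypothetical binary protocol"

def handle_text_http(first_line, connection):
    return "serving plain HTTP request: " + first_line

def dispatch(first_line, connection=None):
    # The dummy method acts as a protocol id; anything else falls back
    # to the ordinary HTTP/0.9 or HTTP/1.0 text path.
    parts = first_line.split()
    if parts and parts[0] == "NULL" and parts[-1].startswith("HTTP/2"):
        return handle_binary(connection)
    return handle_text_http(first_line, connection)

print(dispatch("NULL / HTTP/2.0"))
print(dispatch("GET /index.html HTTP/1.0"))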

|> The most interesting thing about headers is that for about 99.99% of all
|>web transactions, they are absolutely useless. The best way to handle headers
|>is to ignore them unless you know that they might actually make some sort
|>of difference; i.e. if multiple types are available for a given URL, or
|>if a transaction needs authentication; otherwise, they're just a waste of
|>space.

Potentially they could be very useful, particularly when the data object referred
to is synthesized on the fly (e.g. a database gateway). The Mosaic method of using
them is completely braindead however, since it always sends */* at the end! If
it gave q factors then this would not be an issue. Mosaic could also save a lot
of bandwidth by using one Accept header with multiple arguments.
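For example, a single Accept line with q factors tells the server everything in
one go. A minimal sketch of how a server might use it, assuming the usual
type;q=0.x syntax (the helper names are mine):

def parse_accept(value):
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for p in parts[1:]:
            if p.startswith("q="):
                q = float(p[2:])
        prefs.append((parts[0], q))
    return prefs

def choose(available, accept_value):
    prefs = dict(parse_accept(accept_value))
    def quality(mtype):
        # Exact type, then type/*, then */*, else zero.
        return prefs.get(mtype,
                         prefs.get(mtype.split("/")[0] + "/*",
                                   prefs.get("*/*", 0.0)))
    return max(available, key=quality)

# One combined Accept line instead of a dozen separate ones:
accept = "text/html, image/gif;q=0.7, image/x-xbm;q=0.3, */*;q=0.01"
print(choose(["image/gif", "application/postscript"], accept))   # -> image/gif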

|> The worst culprit is Accept; the negotiation format would be vaguely
|>defensible for a session involving several transactions, but is much too
|>expensive to be useful for one-shots.
|>
|>I'll try and explain this more when I get the FHTTP spec out. I haven't
|>had time to finish this as I'm still documenting and commenting my multi-
|>threaded server <liemode> I love this part</liemode>.

One idea we had at dinner last night is to have `accept groups'. To first
order one can infer most of the image (etc.) formats a client understands from
its user agent id field. After all, all Mosaics are going to do GIF and HTML,
the CERN linemode browser is going to do HTML, etc... Now the problem here is
maintenance, since the server must know what the groups mean... even if the
group was declared long after the server came up... URL time!!!!

Accept-URI: http://www.cern.ch/Accept/Linemode

OK, so this >looks< like we get an extra connection per transaction. Quelle
horreur! In fact we cache the page - cleverly, in parsed form. So we only do
one extra GET and one parse for the accept group each time the server comes up.
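Something along these lines, as a sketch only (the fetch call and the stored
format are assumptions; the group file is imagined as one media type per line):

from urllib.request import urlopen

_accept_groups = {}     # Accept-URI -> parsed accept group

def accept_group(uri):
    if uri not in _accept_groups:
        with urlopen(uri) as resp:                   # the one extra GET
            text = resp.read().decode("latin-1")
        _accept_groups[uri] = [line.strip()          # cached in parsed form
                               for line in text.splitlines() if line.strip()]
    return _accept_groups[uri]

# e.g. accept_group("http://www.cern.ch/Accept/Linemode") gets fetched and
# parsed once, then served from the cache for every later request.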

I would like to wait until the URN scheme is a little firmer though. We don't
want CERN or NCSA being slammed by every server each time it starts up.

Accept-URI: urn://org.w3/Accept/Linemode/2.16.pre69

With the mappings

URN org.w3/Accept/*
MAP http://www.cern.ch/Accept/*
MAP http://www.mit..../Accept/*
MAP http://www.ncsa.../Accept/*
MAP http://.../Accept/*
MAP nntp:

etc...
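A toy resolver for that kind of mapping table might look like this (the mirror
hosts beyond CERN are hypothetical fill-ins, since the list above elides them,
and the try-in-order policy is just a guess):

MIRRORS = [
    "http://www.cern.ch/Accept/",
    "http://www.mit.edu/Accept/",         # hypothetical mirror, elided above
    "http://www.ncsa.uiuc.edu/Accept/",   # hypothetical mirror, elided above
]

def resolve(urn):
    prefix = "urn://org.w3/Accept/"
    if not urn.startswith(prefix):
        raise ValueError("unknown URN: " + urn)
    suffix = urn[len(prefix):]
    return [mirror + suffix for mirror in MIRRORS]   # candidates, tried in turn

print(resolve("urn://org.w3/Accept/Linemode/2.16.pre69"))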

I think we should try to optimise HTTP within the boundaries of IETFdom first,
before going for the binary system. We should also move to a binary system
in a coherent fashion, such that *any* IETF protocol can be used.

Also note that under this scheme a server that makes no use of the headers
doesn't even bother with the URN at all.

<STOP PRESS>

Aggh! I made a mistake here! Why restrict this JUST to accepts? We can
generalize! Have a deferred header! Allow ANY header field inside it.
So make that :-

Header-URI:

OK, so I can't think of another use besides Accept, but one might appear.
We could also put much more info into the headers, knowing it would only
be sent on a one-off basis.
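As a hand-waving sketch of the deferred header idea (the merge policy and the
fetch callable are my own assumptions): the request carries only a Header-URI,
and a server that cares merges whatever is lodged there underneath the fields
actually sent on the wire.

def effective_headers(inline_headers, fetch_deferred):
    # inline_headers: dict of fields sent over the wire.
    # fetch_deferred: callable returning the dict lodged at the Header-URI
    # (in practice this would be the cached GET from the accept-group sketch).
    merged = dict(fetch_deferred(inline_headers.get("Header-URI", "")))
    merged.update(inline_headers)      # fields sent inline win over deferred ones
    return merged

# Toy usage with a canned deferred block instead of a real fetch:
deferred = {"Accept": "text/html, image/gif;q=0.7, */*;q=0.01",
            "Accept-Language": "en"}
print(effective_headers({"Header-URI": "urn://org.w3/Accept/Linemode",
                         "User-Agent": "Linemode/2.16"},
                        lambda uri: deferred))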

--
Phillip M. Hallam-Baker

Not Speaking for anyone else.