Re: Proxy Servers

Tim Berners-Lee <timbl@ptpc00.cern.ch>

Maybe in reply to: Kevin Altis: "Re: Proxy Servers"
Reply: Lou Montulli: "Re: Proxy Servers"

Errors-To: listmaster@www0.cern.ch
Date: Wed, 16 Feb 1994 11:37:28 --100
Message-id: <9402161035.AA03315@ptpc00.cern.ch>
Reply-To: timbl@ptpc00.cern.ch
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Tim Berners-Lee <timbl@ptpc00.cern.ch>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Proxy Servers
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 7520

> Date: Wed, 16 Feb 1994 03:11:37 --100
> From: altis@ibeam.jf.intel.com (Kevin Altis)

> At  2:11 AM 2/16/94 +0000, Markus Stumpf wrote:
> >looks like lots of people are hacking on proxy clients/servers these days.
> >Could we please agree on some standards ... on the server and on the
> >client side?

I am sorry that I have been late in replying to this stuff to
clear up some misunderstandings.  The fact is that the "proxy",
or gateway operation, has been in the WWW code for a very long time.
It was commented out when the client developed a rule file
for one release, because I stupidly imagined that the rule file
would do the job, when it won't for an HTTP-HTTP gateway.
But it was put back in again, and then removed recently when
the code was reorganised, and I forgot to put it back in.
In fact, to get wais: urls into more common usage, the
code had the address of the cern gateway for wais built in as
a default at one stage.

> Well, that's what Lou, Ari, and I are doing, setting the application level
> proxy standard for the Web. We've defined how a client can speak HTTP to a
> proxy server in order to interact (GET, POST, PUT...) with the Web without
> losing any functionality on the client side. This is necessary for clients
> behind firewalls, but is also useful when you want the proxy server to act
> as a caching server for a site, minimizing Internet traffic to and from
> that site. The client always speaks HTTP to the proxy server and results
> are always returned via HTTP (actually in the same connection, usual
> stuff). The proxy server in turn speaks HTTP, FTP, Gopher, WAIS, whatever,
> in order to retrieve the actual data, but always returns the data as an
> HTTP MIME message, doing MIME typing on the fly; The GET examples I posted
> a few days ago were intended to show the client/server conversation.
> 

> >The proxy environment variables cause the URLs to be sent with
> >the protocol, the others without (I never checked what the real
> >difference is in the use of WWW_xxx_gateway and the proxy one,
> >and I never understood why the protocol info was omitted around
> >libwww-2.09).

There is no difference.
Sorry for the code drop-out, it will go back in.  (We didn't have Ari at
the time!)

> Only four environment variables: http_proxy, gopher_proxy, ftp_proxy,
> wais_proxy. All expect to be set to a full URL, for example in bourne
> shell:
> ftp_proxy=http://host.domain:911/
> export ftp_proxy
> 

> Actually, you can proxy news as well via the cern_httpd, but that's not
> such a great idea. The environment variable names are different than the
> old WWW_protocol_GATEWAY environment variables so that they don't get
> confused with the old mechanism and allow sites using the older mechanism
> to migrate smoothly to the new standard method. 

Aaagh!  The old and new methods are protocol-wise the same.  I like
the name "proxy" but I don't think that changing the environment variables
helps.  The effect is exactly the same.
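
To illustrate that the two naming schemes give the same result on the wire,
here is a minimal Python sketch of a client choosing a gateway and forming
its request line. The function names and the lookup order are assumptions
made for illustration, not the libwww code. The sketch relies only on what
is said in this thread: either variable holds a full URL, and a request
sent through a gateway carries the full URL.

```python
from urllib.parse import urlsplit


def find_gateway(scheme, environ):
    """Return the gateway URL configured for a URL scheme, or None.

    Looks at the older WWW_<scheme>_GATEWAY name and the newer
    <scheme>_proxy name. The lookup order is an arbitrary choice
    made for this sketch.
    """
    for name in ("WWW_%s_GATEWAY" % scheme, "%s_proxy" % scheme):
        value = environ.get(name)
        if value:
            return value
    return None


def request_target(url, environ):
    """Return (host, port, request_line) for fetching url."""
    parts = urlsplit(url)
    gateway = find_gateway(parts.scheme, environ)
    if gateway:
        # Gateway mode: connect to the gateway, send the URL unmodified.
        gw = urlsplit(gateway)
        return gw.hostname, gw.port or 80, "GET " + url
    if parts.scheme != "http":
        # Speaking other protocols directly is outside this sketch.
        raise ValueError("no gateway configured for %s:" % parts.scheme)
    # Direct HTTP access: send a partial URL (path and query only).
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, parts.port or 80, "GET " + path
```

With either WWW_wais_GATEWAY or wais_proxy set to the same gateway URL,
request_target() returns the same host, port and request line.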

> I think people mainly used
> the old method as a WAIS gateway with the oddball double URLs.

What are oddball double URLs?  There were two methods of using the
wais gateway. You could either use it in gateway mode, by setting

	WWW_wais_GATEWAY=http://info.cern.ch:8001/
	export WWW_wais_GATEWAY

in which case the client sent the wais URL unmodified

	(connect to info.cern.ch on 8001)
	GET wais://wais.dom.ain/database?query

OR, someone referring to a wais database who didn't want to rely on users
having that gateway set up would put a pointer to a mapping onto
http space http://info.cern.ch:8001/wais.dom.ain/database?query
In this case the server would get

	GET /wais.dom.ain/database?query

but being a smart server it would handle that too.  So you could have your
cake or eat it.  In fact because a lot of things made explicit reference
to http-mapped gateways, there were a lot of mapped URLs about,
and that will continue to be the case for anything which doesn't have
its own access protocol (like hytelnet) and when there is a wais gateway
running very efficiently on the same machine as a given wais database, in
which case to force access to go through that gateway saves everyone time.
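
The two request forms above can be reduced to one on the server side. The
Python sketch below shows one way a gateway might do that. The function
name, and the assumption that every mapped path arriving on this port
refers to wais space, are mine and not a description of the CERN server.

```python
def wais_target(request_line):
    """Return the wais: URL that a gateway request line refers to.

    Two forms are accepted:
      - gateway mode, where the client sends the wais URL unmodified;
      - mapped mode, where wais space is mapped onto http space and the
        request path is /host/database?query.
    """
    fields = request_line.split()
    if len(fields) < 2 or fields[0] != "GET":
        raise ValueError("unsupported request: %r" % request_line)
    target = fields[1]
    if target.startswith("wais:"):
        # Gateway mode: already a wais URL, pass it through untouched.
        return target
    if target.startswith("/"):
        # Mapped mode: the path itself names the wais host and database.
        return "wais:/" + target
    raise ValueError("unsupported request: %r" % request_line)
```

For example, wais_target("GET /wais.dom.ain/database?query") gives
"wais://wais.dom.ain/database?query".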

> The big difference between the old WWW_protocol_GATEWAY proxy and the new
> standard, is that in the new method the client always sends a full URL (a
> real URL, like what the user would see in the client, not a double URL)

There is no such difference.  If by "double URL" you mean the mapping of
wais space onto http space, then that was just a convenient extra.
Check out the HTTP spec,

"Unless the server is being used as a gateway, a partial URL shall be given
with the assumptions of the protocol (HTTP:) and server (the server) being
obvious." (from
<http://info.cern.ch/hypertext/WWW/Protocols/HTTP/Request.html>)

> Since the proxy server, gets a full
> URL from the client, the same proxy server can proxy requests for all
> destination protocols (http://, ftp://, gopher://, wais://).

In fact the info.cern.ch:80 server was working in this mode when it
was running on a NeXT.  When we moved to solaris   :-(  the wais code broke,
and we ain't fixed it yet, which is why info.cern.ch:8001 is down.

There are two reasons that all this has been less evident and hasn't worked.
One is that the code was taken out of the libwww. The other
is that the wais gateway has not been there for all to use since Christmas.

> Also, the
> proxy simply sends along all of the metainformation fields, Accepts, etc.
> from the client when the URL is for an HTTP server (http://). This way, as
> the HTTP protocol support expands in our clients and servers to include
> more metainformation and so on, your site proxy server doesn't have to be
> upgraded. The proxy server is just that, a proxy between clients and
> servers on the Internet.

This is something which the old server didn't do, but Ari's new release will.

> >And while we're on it I'd really like to have some mechanisms to only
> >use a gateway at all, if the clients cannot connect directly. (IMHO
> >it doesn't make too much sense to connect to servers on the same
> >subnet/domain, that are e.g. on the same side of a firewall through
> >a gateway server.)

I agree.

> Each client application will have to decide when to proxy and when not to.
> A few messages went across this list about standardizing how clients should
> make that decision, but we need more discussion. For now, all clients use
> all or nothing proxying on a protocol by protocol basis.

I agree.

> >But before we have 20 environment variables to control the client
> >and 2 different URLs that are sent out on gateway requests, could we
> >please agree on some "standards" ?

Yes.  I think that a simple and common case will be that
anything within a certain single domain will be local access.
Generally the firewall or the weak link is at a domain boundary,
to all intents and purposes. One possibility is to force ALL traffic
outside a domain to use a server, which would need two env variables

	WWW_FIREWALL_GATEWAY	http://gateway.acme.com/
	WWW_FIREWALL_DOMAIN	acme.com

Of course a good default would be to guess that the domain was the
domain of the gateway server, which would just mean one env variable.
Another would be to do it separately by URL scheme.

	WWW_http_GATEWAY	http://gateway.acme.com/
	WWW_http_DIRECT_DOMAIN	acme.com
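
As one reading of the first option above (a single gateway plus one domain
treated as local), the client's decision could look like the Python sketch
below. The function names are hypothetical. Guessing the domain by dropping
the first label of the gateway's host name is also an assumption, since
"the domain of the gateway server" could be interpreted in other ways.

```python
from urllib.parse import urlsplit


def in_domain(host, domain):
    """True if host is the domain itself or a name inside it."""
    host = host.lower().rstrip(".")
    domain = domain.lower().strip(".")
    return host == domain or host.endswith("." + domain)


def default_domain(gateway_url):
    """Guess the local domain from the gateway's host name.

    Assumption for this sketch: drop the first label, so that
    gateway.acme.com yields acme.com.
    """
    host = urlsplit(gateway_url).hostname or ""
    _, _, rest = host.partition(".")
    return rest or host


def choose_gateway(url, environ):
    """Return the gateway URL to use for url, or None for direct access."""
    gateway = environ.get("WWW_FIREWALL_GATEWAY")
    if not gateway:
        return None
    domain = environ.get("WWW_FIREWALL_DOMAIN") or default_domain(gateway)
    host = urlsplit(url).hostname or ""
    # Anything inside the local domain is fetched directly.
    return None if in_domain(host, domain) else gateway
```

The per-scheme variant would differ only in which pair of variable names is
looked up for the scheme of the URL.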

Any thoughts on this?  Kev, would you like to propose something, and then
code it as a function of any other comments?  I agree we want to keep it
simple.

Tim Berners-Lee