Customer pull on HTTP2

Dave_Raggett <dsr@hplb.hpl.hp.com>
From: Dave_Raggett <dsr@hplb.hpl.hp.com>
Message-id: <9301081348.AA09490@manuel.hpl.hp.com>
Subject: Customer pull on HTTP2
To: www-talk@nxoc01.cern.ch
Date: Fri, 8 Jan 93 13:48:18 GMT
Cc: dsr@hplb.hpl.hp.com
Mailer: Elm [revision: 66.25]
I am getting really excited about the possibilities for the Web for a variety
of roles. Much of this, however, depends on getting the right features into
HTTP2. I would very much like comments on the following suggestions.

Authentication
--------------

I think that every HTTP2 request (not just GET) should include the "From"
field, and that it is strongly desirable to include the user's full name,
e.g.

    From: dsr@hplb.hpl.hp.com (David Raggett)

The user's name must be easy to extract regardless of the particular variant
of email addressing scheme.
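
As a rough illustration of pulling the name out of either common address
style, here is a minimal Python sketch. It relies on the standard library's
email.utils.parseaddr; the helper name user_from_header is made up for this
example and is not part of any HTTP2 proposal.

```python
from email.utils import parseaddr


def user_from_header(value):
    """Split a From value into (full_name, address).

    Handles both the 'addr (Name)' and the 'Name <addr>' styles.
    """
    name, address = parseaddr(value)
    return name, address


print(user_from_header("dsr@hplb.hpl.hp.com (David Raggett)"))
print(user_from_header("David Raggett <dsr@hplb.hpl.hp.com>"))
```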

For some services the server may need to check if the user is authorised for
this service. In many cases the Internet (numeric) address and the information
in the "From:" field will suffice.

Additional security will require a password. I think this should be a header
in its own right. The "POST" command names a document that you wish to post a
response to. That document may not be owned by you so it doesn't seem right to
muddle up the authorisation for the POST command with that document's Udi.

So let's have a new header:  "Password:  xyzzy"

A further trick would be to encrypt your password for this server/service
together with the time of day using a scheme agreed by you and the server. The
server then decrypts the value sent with the Password header to extract the
password and time of day, and checks that the time is correct to within a
margin for network delays.

This only affects the HTTP2 protocol in requiring different error codes to
distinguish the cases:

    a)  your Internet address is not permissible for this service
    b)  your user name is not permissible for this service
    c)  your password is incorrect
    d)  your password is ok, but the time check failed
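
To make the four cases concrete, here is a minimal Python sketch of the order
in which a server might apply the checks. Everything named here is an
assumption for illustration: the AuthResult names, the two-minute margin, and
above all placeholder_decode, which merely base64-decodes "password time" and
stands in for whatever encryption scheme the user and server actually agree
on. Base64 is not encryption.

```python
import base64
from enum import Enum


class AuthResult(Enum):
    OK = "ok"
    BAD_ADDRESS = "a"   # case (a): Internet address not permitted
    BAD_USER = "b"      # case (b): user name not permitted
    BAD_PASSWORD = "c"  # case (c): password incorrect
    BAD_TIME = "d"      # case (d): password ok, time check failed


MAX_SKEW_SECONDS = 120  # hypothetical margin for network delays


def placeholder_decode(header_value):
    """Stand-in for the agreed decryption step (NOT real encryption)."""
    text = base64.b64decode(header_value).decode("ascii")
    password, _, stamp = text.rpartition(" ")
    return password, int(stamp)


def check_request(address, user, header_value, now, allowed_addresses, passwords):
    """Classify a request into OK or one of the four failure cases."""
    if address not in allowed_addresses:
        return AuthResult.BAD_ADDRESS
    if user not in passwords:
        return AuthResult.BAD_USER
    password, sent_at = placeholder_decode(header_value)
    if password != passwords[user]:
        return AuthResult.BAD_PASSWORD
    if abs(now - sent_at) > MAX_SKEW_SECONDS:
        return AuthResult.BAD_TIME
    return AuthResult.OK


# Example values below are invented for the demonstration.
header = base64.b64encode(b"xyzzy 1000").decode("ascii")
allowed = {"15.0.0.1"}
known = {"dsr@hplb.hpl.hp.com": "xyzzy"}
print(check_request("15.0.0.1", "dsr@hplb.hpl.hp.com", header, 1010, allowed, known))
```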


Basic need for administrators to identify and even to mail users.
Frequent need for authentication - not just on GET.
Encrypt password + time of day to foil copying of the password.

Needless to say the value for the Password header should be composed of
printable 7-bit ASCII characters, excluding white space and control chars.


Caching
-------

It will be desirable to avoid overloading servers with popular documents by
supporting a caching scheme at local servers (or even at browsers?). This
implies that document headers should provide sufficient information to make
this practical.

Servers need to be able to work out what documents to trash from their caches.
A simple approach is to compare the date the document was received with the
date it was originally created or last modified. Say it works out that when
you got the document it was already one week old. Then one rough rule of thumb
is to trash it after another week. You can be rather smarter if there is a
valid expiry date included with the document:

    o   the document header should *always* include a "Date:" field giving
        the date it was last written to

    o   the "Expires:" field is optional

    o   the date values should be in a prescribed format to simplify
        machine interpretation (Is this adequately defined by existing RFCs?)
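
The rule of thumb above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function name discard_time
is invented, the dates are taken as already parsed, and any "Expires:" value
that is present is treated as valid.

```python
from datetime import datetime


def discard_time(received, last_written, expires=None):
    """Return the time at which a cached copy should be trashed."""
    if expires is not None:
        # An explicit expiry date beats the rule of thumb.
        return expires
    # Otherwise keep the copy for as long again as it was old on arrival.
    age_on_arrival = received - last_written
    return received + age_on_arrival


# A document that was one week old when received is kept one more week.
print(discard_time(datetime(1993, 1, 8), datetime(1993, 1, 1)))
```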

I think that we need to provide an operation in which the server returns a
document only if it is later than a date/time supplied with the request. If
it is the same (or earlier) the server should return a suitable status code
and an optional "Cost:" header, see below.

This is already provided for:

    GET udi
    SINCE datetime

The meaning of SINCE needs to change as in Tim's original description of HTTP
it means "greater than or equals", whereas I want strictly "greater than".
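
The difference between the two readings of SINCE is just the comparison
operator. A minimal Python sketch, in which the function name answer_get and
the status strings "OK" and "NOT-CHANGED" are placeholders rather than
proposed protocol values:

```python
from datetime import datetime


def answer_get(document, last_written, since=None):
    """Return (status, body) for a GET with an optional SINCE date."""
    # Strictly "greater than": a copy with the same date is not resent.
    if since is None or last_written > since:
        return "OK", document
    return "NOT-CHANGED", None


written = datetime(1993, 1, 8, 13, 48)
print(answer_get("<html>...</html>", written, since=written))
```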

Note that servers shouldn't cache documents with restricted readership since
each server doesn't know the restrictions to apply. This requires a further
header to identify such documents as being unsuitable for general caching:

    Distribution: restricted | unrestricted

This header is only needed for documents with restricted readership.
A dirty alternative would be to set the expiry date to the same value as
supplied with the "Date:" header.
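
A caching server's decision could then look roughly like the Python sketch
below. The name may_cache is invented, headers are assumed to arrive as a
plain dictionary, and the two dates are compared as text, which only works if
both use the same prescribed format.

```python
def may_cache(headers):
    """Decide whether a document is suitable for general caching."""
    distribution = headers.get("Distribution", "unrestricted")
    if distribution.strip().lower() == "restricted":
        return False
    # The "dirty alternative": expiry date equal to the Date header.
    expires = headers.get("Expires")
    if expires is not None and expires == headers.get("Date"):
        return False
    return True


print(may_cache({"Date": "Fri, 8 Jan 93 13:48:18 GMT"}))
print(may_cache({"Date": "Fri, 8 Jan 93 13:48:18 GMT", "Distribution": "restricted"}))
```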

Copyright & Payments
--------------------

Although the Internet backbone restricts profit making services, many subnets,
such as University campuses, and company subnets such as HP's have no such
problem. Indeed users strongly want access to copyrighted information for
which a payment is due.

My suggestion is that servers are responsible for tracking who accesses what
information, and hence how much they owe. For use within Hewlett Packard for
library services, we anticipate including some extra headers in the request:

    EmployeeNumber: 148689
    LocationCode:   8126        (an account number for cross charging)

This would be stripped off when sending requests to servers outside the HP
subnet. These headers are ignored by servers which conform to strict HTTP2.

I would like the document header to include an optional cost header, e.g.

    Cost: 4.05 US DOLLARS
    Copyright: Reuters Inc.

This would let the users know how much a given document has cost them, as well
as who owns the copyright. The latter heading is needed since you can't always
put it in the document, e.g. think of photographic images.
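
To show how a browser or server might keep a running total from such headers,
here is a small Python sketch. It assumes the value is always an amount
followed by a currency name separated by a space, as in the example above;
parse_cost and total_owed are invented names.

```python
from decimal import Decimal


def parse_cost(value):
    """Split a Cost value such as '4.05 US DOLLARS' into (amount, currency)."""
    amount, _, currency = value.strip().partition(" ")
    return Decimal(amount), currency.strip()


def total_owed(cost_values):
    """Add up costs per currency; Decimal avoids float rounding on money."""
    totals = {}
    for value in cost_values:
        amount, currency = parse_cost(value)
        totals[currency] = totals.get(currency, Decimal("0")) + amount
    return totals


print(total_owed(["4.05 US DOLLARS", "1.95 US DOLLARS"]))
```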


The "Cost: 4.03 US DOLLARS" field


Copyright and Caching
---------------------

What happens if a copyright protected document is saved in the cache of a
local server? We have got to ensure that the rightful owners get paid for
access even when the document is obtained from a local server's cache.

My idea is that for each access, this server should inform the server on which
the original document resides. This notification can however be deferred to
a time when the network is quiet ...

The notification proceeds using the "GOT" command

    GOT udi
    From: dsr@hplb.hpl.hp.com (David Raggett)
    Server: hplose.hpl.hp.com
    EmployeeNumber: 148689          /* HP specific */
    LocationCode:   8126            /* HP specific */

The From header gives the name of the user who requested the document. The
Server header names the machine with the cache, and other company specific
fields are used for accounting purposes.

Note that this scheme can't be used for documents with restricted readership,
since the server looking after the cache doesn't know who is and who isn't
allowed to read this document.

The protocol ought to allow for multiple GOT statements (and associated
headers) in the same message. For this it seems simple enough to require a
terminating blank line.
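
A deferred batch of notifications with blank-line terminators might be built
and read back as in this Python sketch. The function names and the sample
Udi strings are invented for illustration, and plain "\n" line endings are
used for simplicity.

```python
def format_got_batch(records):
    """Build one message from a list of (udi, headers) pairs."""
    lines = []
    for udi, headers in records:
        lines.append(f"GOT {udi}")
        for name, value in headers.items():
            lines.append(f"{name}: {value}")
        lines.append("")  # terminating blank line after each GOT
    return "\n".join(lines) + "\n"


def parse_got_batch(text):
    """Recover the (udi, headers) pairs from a batched message."""
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith("GOT "):
            current = (line[4:].strip(), {})
        elif line.strip() == "":
            if current is not None:
                records.append(current)
                current = None
        elif current is not None:
            name, _, value = line.partition(":")
            current[1][name.strip()] = value.strip()
    return records


batch = [
    ("sample-udi-1", {"From": "dsr@hplb.hpl.hp.com (David Raggett)",
                      "Server": "hplose.hpl.hp.com"}),
    ("sample-udi-2", {"From": "dsr@hplb.hpl.hp.com (David Raggett)",
                      "Server": "hplose.hpl.hp.com"}),
]
print(format_got_batch(batch))
```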


Naming Parts of a Multipart body
--------------------------------

It would be nice to use the MIME format's capability to send multiple
documents as part of the same message, e.g. an HTML doc with several
pictures. To make this work each separate part needs to include the
Document Udi in its header, so that the browser can check if it has the
document in its local cache (history stack) or whether it needs to make a
network request for the picture etc.

    DocumentName: Udi


Effective support for discussion groups
---------------------------------------

My model is that discussion groups each have unique Udi's. Each discussion
group has a sequence of base notes, and each base note is associated with a
sequence of responses. I am unsure of how to deal with cross postings!

Over a period of time the number of base notes can grow arbitrarily, and we
need a way of listing all those within a given time period. This can be
supported using Tim's BEFORE and SINCE modifiers in association with the GET
command. The server is responsible for creating an HTML document corresponding
to the list of base notes (and how many responses there are for each etc.).

In retrieving a base note, you should get an indication of how many responses
there are using the header:

    Responses:  16

You also need a way of retrieving a given response. One way is to ask for the
list of Udi's for all the responses, another is a command to get a particular
response given the Udi for the base note and a sequence number, e.g.

    RESPONSES Udi           /* request the list of responses as an html doc */
    GETRESPONSE 12 Udi      /* get a given response (12th in the list) */
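
On the server side, the model of base notes and numbered responses could be
held in something like the following Python sketch. The class and method
names are invented, and sequence numbers are assumed to start at 1, matching
"12th in the list" above.

```python
from dataclasses import dataclass, field


@dataclass
class BaseNote:
    udi: str
    text: str
    responses: list = field(default_factory=list)

    def headers(self):
        """Headers returned with the base note, including the count."""
        return {"Responses": str(len(self.responses))}

    def get_response(self, number):
        """Return the response with the given 1-based sequence number."""
        if not 1 <= number <= len(self.responses):
            raise IndexError(f"no response number {number}")
        return self.responses[number - 1]


note = BaseNote("sample-udi", "What do people think?")
note.responses.extend(["First reply", "Second reply"])
print(note.headers())
print(note.get_response(2))
```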


What do people think?


I assume the POST command can be accompanied by an html doc as a body.

Looking forward to your comments,

David Raggett