Re: Multipart/mixed for many inline images (was Re: Toward Closure on HTML)

Marc VanHeyningen <mvanheyn@cs.indiana.edu>

Mail folder: WWW Talk Apr 94-present
Maybe in reply to: Alan Emtage: "Re: Multipart/mixed for many inline images (was Re: Toward Closure on HTML)"

Errors-To: listmaster@www0.cern.ch
Date: Sun, 10 Apr 1994 01:31:16 --100
Message-id: <23112.765933990@hound.cs.indiana.edu>
Reply-To: mvanheyn@cs.indiana.edu
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Marc VanHeyningen <mvanheyn@cs.indiana.edu>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Multipart/mixed for many inline images (was Re: Toward Closure on HTML)
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Organization: Computer Science Dept, Indiana University
Content-Length: 6795

>>> The browser can tell the difference by parsing the link.
>
>>Did you get a reply to this?
>
>Not really. The current stuff about the DTD seems to have somewhat swamped my
>suggestion.
>
>>The browsers can't tell what type a URL is by parsing the link.
>
>Could you elaborate? It seems to me that they can. If a link is of the form
>
>  http://site/path/blah.html     its an html file
>  http://site/path/blah.tif      its an image, etc.
>
>Sure if someone defined *.gif to be text/html in their servers list of mime
>types that might be a problem, but that would be perverse in the extreme. How
>many html pages are there on the web, and what percentage of them do not have a
>file type of .html or .htm (for broken DOS systems ;-0 )

Filename extensions provide hints at file contents in environments
where nothing else is available (e.g. FTP, HTTP/0.9) but should NOT be
used except in places where nothing else is available.

Some servers use filename extensions as the mechanism by which
authors tell the server what a file's type is, but that is totally
separate from how servers and clients tell one another about it.  It's
just a temporary accident.  Filenames on, say, a Macintosh are
unlikely to end with .html or .gif suffixes, since the file
type information is stored elsewhere.

In the long run (IMHO) suffixes will be dropped from URLs as servers
get smart enough to know that a fetch for "foo" should return foo.html
(or foo.txt or foo.gif or foo.ps.gz or whatever form of foo is agreed
upon by standard content-type negotiation.)
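As a rough illustration of a server resolving a suffix-free name, here
is a minimal Python sketch.  The VARIANTS table, the function names and
the matching rule are all assumptions made up for this example; it
ignores quality values and is not how any particular server does it.

```python
# Hypothetical variants a server might hold for the suffix-free name "foo".
VARIANTS = {
    "foo": [
        ("foo.html", "text/html"),
        ("foo.txt", "text/plain"),
        ("foo.gif", "image/gif"),
    ],
}


def parse_accept(header):
    """Return the media types named in an Accept header, in listed order."""
    types = []
    for item in header.split(","):
        media_type = item.split(";")[0].strip()  # drop any parameters
        if media_type:
            types.append(media_type)
    return types


def choose_variant(name, accept_header):
    """Pick the first stored variant matching a type the client accepts."""
    for wanted in parse_accept(accept_header):
        for filename, content_type in VARIANTS.get(name, []):
            main_type = content_type.split("/")[0]
            if wanted in (content_type, "*/*", main_type + "/*"):
                return filename, content_type
    return None  # nothing acceptable; the server would report an error
```

A client that sends "Accept: image/gif, text/plain" for "foo" would get
foo.gif here, and the URL never needs to mention a suffix.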

In brief, don't do this.

>Marc suggested that my suggestion wasted already cached images, which is true;
>but then if they are all retrieved in a block the inefficiency is not that
>great.

This is still based on the assumption that the embedded objects are
small, the overhead for setting up the connection is large, and the
cost of transferring bytes is low.  These are not necessarily globally
applicable assumptions; they surely don't apply to SLIP/PPP
connections with slow modems, for instance.
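To make the trade-off concrete, here is a back-of-the-envelope model in
Python.  The model itself (a fixed setup cost per connection plus bytes
divided by bandwidth) and every number used with it are assumptions for
illustration, not measurements.

```python
def separate_time(n_objects, object_bytes, setup_seconds, bytes_per_second):
    """Seconds to fetch n objects using one connection per object."""
    return n_objects * (setup_seconds + object_bytes / bytes_per_second)


def block_time(n_objects, object_bytes, setup_seconds, bytes_per_second):
    """Seconds to fetch n objects bundled over a single connection."""
    return setup_seconds + n_objects * object_bytes / bytes_per_second


# Assumed scenario: 6 images in the page, 2 of them already cached.
# Fetching "in a block" resends all 6; separate fetches need only 4.
fast_block = block_time(6, 2000, 0.5, 100000)      # small images, fast link
fast_separate = separate_time(4, 2000, 0.5, 100000)
slow_block = block_time(6, 20000, 0.5, 1800)       # larger images, slow modem
slow_separate = separate_time(4, 20000, 0.5, 1800)
```

With these made-up figures the block wins on the fast link and loses on
the modem, because there the bytes wasted on cached images cost more
than the connection setups saved.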

As some others have pointed out lately, HTTP isn't even necessarily
strictly bound to TCP/IP; its simple request-response model could work
in other paradigms.

>An alternative would be to have the browser construct a list of images in the
>current document that are not in the browsers cache; however extending the
>syntax of a GET to allow lists seems a bigger task. Could be the way to go,
>though.

I believe something like this is the way to go.  Bear in mind that a
multipart/mixed whose parts are simply image/gif bodies will not be
sufficient for this.  We need a solution that allows for a
situation like:

- Client fetches document.  It contains 6 embedded objects; let's
  assume they are all images (although there's no general reason to
  assume they couldn't be something else.)
- Client already has two of them in its cache, and just uses the
  cached versions.
- For simplicity, let's assume the other 4 are all from the same HTTP
  server although there's no reason to assume that would always be the
  case.
- Client requests those images.
  o One of them transfers normally.
  o One of them is cached, but the cached version has expired so the
    client uses the "If-modified-since" (or whatever it's being
    called) header to indicate that the document should only be
    transferred if it is newer than that date.  It turns out the
    cached version is still current, so the server just returns the
    appropriate code.
  o One of them has been moved, and the server returns a 301 redirect which
    the client must resolve.
  o One of them requires authorization, so the server returns the
    appropriate error code and the client may attempt an authorized
    transaction.
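The four outcomes above could be told apart by a client along these
lines.  This is only a sketch: the function and action names are made
up, and the status codes other than 301 (304 for "not modified", 401
for "unauthorized") are the ones later standardized in HTTP/1.0, which
I am assuming here.

```python
def next_action(status, headers):
    """Decide what a client does with one embedded object's response.

    `status` is the numeric status code and `headers` a dict of response
    headers; returns a (hypothetical) action name plus any detail needed.
    """
    if status == 200:
        return ("use-body", None)            # transferred normally
    if status == 304:
        return ("use-cache", None)           # cached copy is still current
    if status in (301, 302):
        return ("refetch", headers.get("Location"))  # follow the redirect
    if status == 401:
        # Authorization required; the client may retry with credentials.
        return ("retry-with-auth", headers.get("WWW-Authenticate"))
    return ("fail", None)
```

The point is that the client needs the status line and headers for each
object, not just the object's bytes.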

In other words, each of the items returned bundled up into a MIME
multipart structure must be not just a media type but the full server
response including headers.  I suggested this bundled object be called
message/http-response, along with message/http-request, a long time
ago but nobody seemed much interested.  I can dig up the reasons why
"message" is the appropriate primary content-type if people want to
argue about it.

If the server is capable of sending several objects at once, the way
it would work would be to send a multipart/mixed message consisting of
many parts, each of them a message/http-response (or, if you want to
get carried away, you could create a multipart/http-digest which
implicitly typed each of its parts this way.)
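For concreteness, a server-side bundle and the matching client-side
split might be sketched as below.  The boundary string and function
names are invented for this example, and it assumes text payloads that
never contain the boundary; real image data would need binary-safe
handling.

```python
CRLF = "\r\n"


def bundle_responses(responses, boundary="http-bundle"):
    """Wrap complete HTTP responses as parts of one multipart/mixed body."""
    lines = []
    for response in responses:
        lines.append("--" + boundary)
        lines.append("Content-Type: message/http-response")
        lines.append("")          # blank line ends the part headers
        lines.append(response)    # status line, headers and body, verbatim
    lines.append("--" + boundary + "--")
    lines.append("")
    return CRLF.join(lines)


def split_bundle(body, boundary="http-bundle"):
    """Recover the embedded responses from a body built by bundle_responses."""
    parts = []
    for chunk in body.split("--" + boundary)[1:]:
        if chunk.startswith("--"):
            break                 # reached the closing delimiter
        _part_headers, _, payload = chunk.lstrip(CRLF).partition(CRLF + CRLF)
        if payload.endswith(CRLF):
            payload = payload[:-len(CRLF)]  # CRLF belonging to the delimiter
        parts.append(payload)
    return parts
```

Each recovered part is a full response, so the status code and headers
of every embedded object survive the bundling.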

The question then becomes one of how the client should express to the
server the idea of doing multiple things at once.  Here are some
random thoughts...

- Extend GET with something (say, MGET?) which allows arbitrarily many
  URLs on the first line.

  Presumably this would not work for existing servers (a casual test
  suggests it would work for Plexus and NCSA but not CERN) so a client
  would have to use a transaction to test whether it would work,
  caching the results so it doesn't have to test all the time.  It
  provides no way for handling the situation in which the request
  headers (e.g. the "If-modified-since" or others) are not the same
  for all the various items requested.

  For that matter, it's not clear to me how this could resolve issues
  like binding which of the responses goes with which of the requests.
  Simple ordering could work, but seems a bit brittle.

- Add some headers in the request indicating additional documents to
  fetch (i.e. you tell the server "Give me this, and if you can also
  maybe give me this and this and this")

  This would work with existing servers.  Otherwise it has nothing to
  recommend it.  It has the same limitations as the above.

- Have the client transmit to the server a document of type, say,
  multipart/mixed with many parts, each of them a
  message/http-request.  It's not clear to me what the method for this
  should be; POST, or maybe even some new one, seems appropriate.
  Each of the embedded requests would presumably be a GET, although
  there's no fundamental reason it couldn't be something else.

  This would not work for existing servers.  However, it is the only
  solution that properly permits different headers to be transmitted
  for each bundled request.
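  A sketch of what the client might build for this last option follows.
  The helper names, boundary, paths and date are all hypothetical, and
  the outer method and target are deliberately left out since they are
  still an open question.

```python
CRLF = "\r\n"


def http_request(method, path, headers=None):
    """Format one complete request: request line, headers, blank line."""
    lines = [method + " " + path + " HTTP/1.0"]
    for name, value in (headers or {}).items():
        lines.append(name + ": " + value)
    return CRLF.join(lines) + CRLF + CRLF


def bundle_requests(requests, boundary="http-bundle"):
    """Return (Content-Type value, body) for a multipart/mixed of requests."""
    parts = []
    for request in requests:
        parts.append("--" + boundary + CRLF
                     + "Content-Type: message/http-request" + CRLF + CRLF
                     + request + CRLF)
    body = "".join(parts) + "--" + boundary + "--" + CRLF
    content_type = 'multipart/mixed; boundary="' + boundary + '"'
    return content_type, body


# Two embedded GETs; only the second carries a conditional header.
content_type, body = bundle_requests([
    http_request("GET", "/path/a.gif"),
    http_request("GET", "/path/b.gif",
                 {"If-Modified-Since": "Sun, 10 Apr 1994 00:00:00 GMT"}),
])
```

  Because every part is a whole request, one image can be fetched
  conditionally and another unconditionally in the same bundle.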

I believe the last of these options is the only viable way to keep
HTTP request/response but permit multiple requests to be transmitted
in a single connection.  Any solution that does not allow for all
the different cases of the images discussed in the example above is,
IMHO, a non-starter.

And changing HTTP to be something other than request/response is a
non-starter to me, but that's a deeper issue.

Ready for the flames,
- Marc
--
Marc VanHeyningen  mvanheyn@cs.indiana.edu  MIME, RIPEM & HTTP spoken here