Re: File upload from a Web browser

Larry Masinter (masinter@parc.xerox.com)
Thu, 13 Jul 1995 08:51:07 PDT

> http://www.ics.uci.edu/pub/ietf/html/draft-ietf-html-fileupload-02.txt

> It has been implemented on both the client side in Mosaic and server-side
> in the NCSA httpd as patches available somewhere at Xerox (Larry?). We
> have applications for this TODAY. Are any browser developers listening? :)

> If this is a chicken-or-egg situation, we could put this on the short
> list for apache....

> Brian

The patches for Mosaic are available at ftp://ftp.eit.com/pub/fileup/

You don't need to 'patch' your httpd; the server side is all done with
CGI. The only problem with the files at EIT is that they implement the
-01 proposal, and the format of multipart/form-data changed between
-01 and -02 (because of concerns about fitting in with the general
multipart/... rules in media types).

================================================================
README.txt
11/22/94

This is an implementation of File Upload as described in the latest
ietf draft proposal by Larry Masinter and Ernesto Nebel. It
implements the INPUT TYPE=file and ENCTYPE=multipart/form-data.

The changes to the four NCSA Mosaic for X 2.4 source files are
described below. All changes are marked with an XW3FT comment.

libhtmlw/HTMLwidgets.c
libhtmlw/HTMLformat.c
These two files implement the widgets in Mosaic. Code was added to
HTMLwidgets.c to create the file input area, create a file browser
button to launch a file selection dialog box, and get the file values
out of the file widget. The file input area is simply a text field for
now. A few lines were added to libhtmlw/HTMLformat.c to make the
additional file browser button look like just another form widget for
management by Mosaic. This file widget should probably be just one
widget created in HTMLwidgets.c so no changes are necessary in
HTMLformat.c.

libwww2/HTTP.c
src/gui.c
These two files implement the new ENCTYPE. The new ENCTYPE is
recognized and used in src/gui.c. In standard Mosaic, the ENCTYPE
attribute is ignored--except when Mosaic has been compiled for PEM
security, in which case it is incorrectly used as the encryption type
(the HTML 2.0 spec describes ENCTYPE as the encoding type). For
multipart/form-data,
instead of building up the large in-memory buffer with the form data,
the form callback structure (cbdata) pointer is passed around. A
change in HTTP.c checks for a new encoding function being set. If it
has been, the encoding function is called with the form data buffer
along with a function pointer to the write function for streaming data
to the socket. For the new multipart/form-data, the form data is read
right out of the cbdata structure. When a file needs to be sent on the
stream, it is read out a buffer-full at a time and streamed to the
socket.
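
The streaming approach described above can be sketched roughly as
follows. This is not the Mosaic code (which is C); it is a minimal
Python illustration, and the names stream_form and CHUNK_SIZE, the
tuple layouts, and the buffer size are all assumptions made for the
example. The part headers follow the Content-Disposition: form-data
layout of the later RFC 1867 text, which may differ from the -01 draft
that the patches implement.

```python
import os

CHUNK_SIZE = 8192  # hypothetical buffer size; the README does not state one


def stream_form(fields, files, boundary, write):
    """Write a multipart/form-data body piece by piece through write().

    fields: list of (name, value) text pairs
    files:  list of (name, path, content_type) tuples
    write:  callable taking bytes, standing in for the socket write function
    """
    delim = b"--" + boundary.encode("ascii")
    for name, value in fields:
        write(delim + b"\r\n")
        write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        write(value.encode() + b"\r\n")
    for name, path, content_type in files:
        filename = os.path.basename(path)
        write(delim + b"\r\n")
        write(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'.encode()
        )
        write(f"Content-Type: {content_type}\r\n\r\n".encode())
        # Read the file one buffer-full at a time instead of loading it whole.
        with open(path, "rb") as fh:
            while True:
                chunk = fh.read(CHUNK_SIZE)
                if not chunk:
                    break
                write(chunk)
        write(b"\r\n")
    # Closing delimiter marks the end of the multipart body.
    write(delim + b"--\r\n")
```

Passing the write function in, rather than returning one large buffer,
means a file of any size costs at most one buffer of memory at a time.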

Two other supporting files are included with this implementation:

vquery.c
vquery.c is a simple CGI routine in C which has been modified to
understand multipart/form-data along with the traditional GET and
urlencoded POST. It simply finds boundaries in the MIME data and
streams file contents into temporary files. It echoes the rest of the
MIME body back to check the encoding. It should really use a
generalized MIME parser and build up the name/value pairs of form data.
It can be built just like the query.c example that comes with the NCSA
httpd (uses helper functions in util.c for urlencoded data).
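
A rough idea of what the boundary handling and the missing name/value
step might look like is sketched below. It is not vquery.c: it is a
Python illustration that works on a body already held in memory
(vquery.c streams), the function name parse_form_data is made up for
the example, and the part headers are assumed to use the
Content-Disposition: form-data layout of the later RFC 1867 text. It is
not a general MIME parser either; it ignores nested multiparts and
transfer encodings.

```python
import re
import tempfile


def parse_form_data(body, boundary):
    """Return a list of (name, value) pairs from a multipart/form-data body.

    Plain fields yield their decoded text. File parts are written to a
    temporary file and yield that file's path instead.
    """
    # A delimiter is CRLF + "--" + boundary; prepending CRLF to the body
    # lets the very first delimiter match the same pattern.
    delim = b"\r\n--" + boundary.encode("ascii")
    pieces = (b"\r\n" + body).split(delim)
    pairs = []
    for piece in pieces[1:]:  # pieces[0] is the preamble, if any
        if piece.startswith(b"--"):
            break  # closing delimiter reached
        if piece.startswith(b"\r\n"):
            piece = piece[2:]
        head, _, content = piece.partition(b"\r\n\r\n")
        header_text = head.decode("latin-1")
        name = re.search(r'\bname="([^"]*)"', header_text)
        filename = re.search(r'\bfilename="([^"]*)"', header_text)
        if name is None:
            continue  # not a form-data part we understand
        if filename is not None:
            # File part: store the contents in a temporary file.
            with tempfile.NamedTemporaryFile(delete=False) as tmp:
                tmp.write(content)
            pairs.append((name.group(1), tmp.name))
        else:
            pairs.append((name.group(1), content.decode()))
    return pairs
```

The caller is responsible for removing the temporary files once it has
finished with them.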

filer.html
filer.html is a simple form with file fields. It can be used to test
the file upload capabilities. This form looks like:
<FORM ACTION="/cgi-bin/vquery" METHOD=POST ENCTYPE="multipart/form-data">
Image File: <INPUT NAME="image" TYPE=file ACCEPT="image/gif"> <BR>
<INPUT TYPE=submit VALUE="Transfer Files">
</FORM>

To build an enhanced Mosaic for X with file upload:
- Get a copy of the Mosaic 2.4 source tree
- Update the 4 source files with the diffs
- Recompile Mosaic
This has been compiled and tested on SunOS 4.X.

This implementation addresses:
- support for the INPUT TYPE=file
- ENCTYPE=multipart/form-data
- ACCEPT types listed with INPUT TYPE=file
- multiple files listed in a single file input area (comma separated)

Problems:
- The multiple file support (comma-separated files in the file input
area) just shows what the encoding should look like. To fully support
it, an additional button should be added to the file selection dialog
box to add a file to the file input area rather than always
overwriting it. Perhaps the file input area should be a multiline
text area rather than a single line text field for multiple files. Or
maybe the file widget should look entirely different than it does in
this prototype. More elaborate wildcard expansion could also be
added.
- Only the first ACCEPT type is used for file filtering. The
comma-separated list is read in and expanded, but since the stock file
selection dialog box can only take one filter pattern, only the first
ACCEPT type is used. Multiple filters could easily be added by creating
a custom file search procedure for the file selection dialog box.
- Right now, the globe twirls while sending the multipart/form-data to
the server, but the connection cannot be interrupted by pressing the
globe. This is mainly because I can't decide what should happen. By
that point, the tcp connection has been made, the HTTP headers have
been sent and parsed, the server has launched the CGI program to
interpret the data, etc. What should an abort look like?
- The input cursor does not display in the file selection dialog box.
You can type in the fields, there is just no indication of where the
cursor is. This is probably due to the strange input focus management
of Mosaic. It would probably go away if the file selection dialog box
was created and managed like the open_local_window dialog box for local
file browsing.
- As part of the implementation, a generalized cgi function should be
provided to parse the multipart/form-data and create the list of
name/value pairs. Right now, vquery.c only checks for boundaries and
streams out files. A more general MIME parsing engine should be used
to break down the multipart/form-data.
- All changes have been made to the MOTIF fork only. Athena support
should be similar.
- The original implementation set -1 as the HTTP content_length. This
does not work through our proxy server. The proxy server seems to
use content_length to know when it has read all of the data from the
client. For POST transactions, this data could be variable size so
the proxy should key off some other factor. To support existing
proxy servers, code has been added (if XW3FT_PROXY_CALC_LEN is defined)
to compute the real length of the POST data before it is sent. If this
value is too low, then the proxy will not send all of the data from
the client to the server, causing the CGI gateway to wait when trying
to find the MIME end of the POST data. If the content_length is too
high, then the proxy server waits for the client to send it the rest
of the data. When set correctly, then the POST data properly makes it
through the proxy to the server. Calculating the data size works fine
for static files, but if the files change in length between the time
the size is calculated and the time the data is sent, then the proxy
will hang. This is probably just a general problem of proxy servers
and POST data.
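
The length calculation mentioned in the last item can be done without
reading any file, by adding the size of each part's headers to the file
size reported by the filesystem. The following is a Python sketch of
that idea, not the XW3FT_PROXY_CALC_LEN code itself; the function name
multipart_length and the tuple layouts are assumptions for the example,
and the part headers are assumed to follow the Content-Disposition:
form-data layout of the later RFC 1867 text.

```python
import os


def multipart_length(fields, files, boundary):
    """Return the byte length of a multipart/form-data body.

    fields: list of (name, value) text pairs
    files:  list of (name, path, content_type) tuples
    File sizes come from os.stat, so no file contents are read.
    """
    delim_len = len(b"--" + boundary.encode("ascii"))
    total = 0
    for name, value in fields:
        header = f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        # delimiter + CRLF, headers, value, trailing CRLF
        total += delim_len + 2 + len(header) + len(value.encode()) + 2
    for name, path, content_type in files:
        filename = os.path.basename(path)
        header = (
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode()
        # delimiter + CRLF, headers, file bytes, trailing CRLF
        total += delim_len + 2 + len(header) + os.stat(path).st_size + 2
    # closing delimiter: "--" + boundary + "--" + CRLF
    return total + delim_len + 4
```

The result is only valid while the files stay unchanged. If a file
grows or shrinks between this call and the actual send, the length is
stale, which is the hang described above.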

Send any questions or comments to rpotter@xsoft.xerox.com about this
implementation. We hope this prototype will speed the acceptance of
file upload into many more browsers on the Web.

Rob Potter
XSoft, Xerox Corporation
rpotter@xsoft.xerox.com