File upload in HTML forms

Ernesto Nebel (nebel@xsoft.sd.xerox.com)
Fri, 23 Sep 94 16:41:22 EDT

The proposal below is for support of file upload in the World Wide Web. A few initiatives and custom examples have appeared on various WWW lists, but they have led to nothing. With this proposal, we hope to give this feature enough momentum to make some form of it a reality among the widely available WWW servers and browsers. Please keep the discussion in www-talk (to avoid three separate discussion threads). If you would like to see file upload become part of the WWW, please voice your support for that capability, even if you disagree with the specifics of this proposal.

Thank you.

Ernesto Nebel

nebel.sd@xerox.com
XSoft (a division of Xerox)

FILE TRANSMISSION FROM WORLD WIDE WEB BROWSERS TO SERVERS
---------------------------------------------------------

I. Introduction
---------------
Currently, a World Wide Web server can get information from users with
HTML forms. These forms have proven useful in a wide variety of
applications in which input from the user is necessary. But this
capability is still greatly limited because HTML forms don't provide a
way for the user to submit files to the server. Service providers who
need to get files from the user have had to implement custom browsers.
(Examples of these custom browsers have appeared on the www-talk
mailing list.) To avoid the necessity for custom browsers and to make
WWW servers complete in their ability to get information from the
user, the WWW needs to provide a way for users to send files to
servers. Since user information is sent back to the server using HTML
forms, it is most logical to extend HTML forms to support file
submission.

II. HTML forms with file submission
-----------------------------------
The HTML specification, version 2.0, dated July 1994 (by T.
Berners-Lee, D. Connolly, and K. Muldrow, located at
http://www.hal.com/products/sw/olias/Build-html/html-spec.ps), defines
eight possible values for the attribute TYPE of an INPUT element: text,
password, checkbox, radio, submit, reset, image, and hidden. The
proposed change is to add a TYPE "file". The author of an HTML form
who wants to request one or more files from a user would simply write
(for example):

File to process: <INPUT NAME="userfile1" TYPE="file">

The change to the HTML DTD is trivial: just one item added to the
entity "InputType", as follows.

... (other elements) ...

<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                       RADIO | SUBMIT | RESET |
                       IMAGE | HIDDEN | FILE )">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
        TYPE      %InputType          TEXT
        NAME      CDATA               #IMPLIED -- required for all but submit and reset --
        VALUE     CDATA               #IMPLIED
        SRC       %URI                #IMPLIED -- for image inputs --
        CHECKED   (CHECKED)           #IMPLIED
        SIZE      CDATA               #IMPLIED -- like NUMBERS,
                                                  but delimited with comma, not space --
        MAXLENGTH NUMBER              #IMPLIED
        ALIGN     (top|middle|bottom) #IMPLIED
        >

... (other elements) ...
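The DTD change only helps if a browser can tell the new TYPE apart from the old ones. The following is a minimal sketch using Python's standard html.parser. It assumes that a browser which does not recognize a TYPE value renders the field as TEXT, the default in the DTD above. Section III relies on this same behavior for backward compatibility. The names KNOWN_TYPES, InputCollector and collect_inputs are hypothetical.

```python
from html.parser import HTMLParser

# TYPE values in the July 1994 HTML 2.0 draft, plus the proposed "file".
KNOWN_TYPES = {"text", "password", "checkbox", "radio", "submit",
               "reset", "image", "hidden", "file"}


class InputCollector(HTMLParser):
    """Record (NAME, effective TYPE) for every INPUT element."""

    def __init__(self, supported_types):
        super().__init__()
        self.supported = supported_types
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "input":
            return
        attributes = dict(attrs)
        kind = (attributes.get("type") or "text").lower()
        # Assumption: an unknown TYPE is treated as TEXT, the DTD default.
        if kind not in self.supported:
            kind = "text"
        self.inputs.append((attributes.get("name"), kind))


def collect_inputs(html, supported_types=KNOWN_TYPES):
    parser = InputCollector(supported_types)
    parser.feed(html)
    parser.close()
    return parser.inputs


form = 'File to process: <INPUT NAME="userfile1" TYPE="file">'
print(collect_inputs(form))                          # [('userfile1', 'file')]
print(collect_inputs(form, KNOWN_TYPES - {"file"}))  # [('userfile1', 'text')]
```

The second call imitates an older browser. The same INPUT element degrades to a plain text field.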

III. Proposed implementation
----------------------------
The proposed implementation in WWW browsers is for a browser to show a
text box with a "Browse" button next to it whenever it encounters an
INPUT tag of TYPE "file". The initial size of the text box can be
controlled by the SIZE attribute (SIZE=width,height). Pressing the
"Browse" button pops up a file selection window. The file or
directory currently selected in the file selection window gets added
to a new line of the text box any time the user presses an "Add to
list" button on the file selection window. For convenience, the file
selection window remains on the screen until the user presses a
"Close" button. The user can manually add and remove files and
directories from the text box.
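The text box behavior described above can be modeled as a list of paths, one per line, together with a parser for the two-number SIZE attribute. This is only a sketch. The fallback size of 40 columns by 4 lines is an assumed default, and the decision to ignore a path that is already in the box is an illustrative choice the proposal does not make. The names parse_size and FileListBox are hypothetical.

```python
def parse_size(size, default=(40, 4)):
    """Parse SIZE="width,height" on a file INPUT; fall back to a default."""
    if not size:
        return default
    parts = [p.strip() for p in size.split(",")]
    try:
        width = int(parts[0])
        height = int(parts[1]) if len(parts) > 1 else default[1]
    except ValueError:
        return default
    return (width, height)


class FileListBox:
    """Contents of the file text box: one file or directory path per line."""

    def __init__(self, text=""):
        self.paths = [ln.strip() for ln in text.splitlines() if ln.strip()]

    def add(self, path):
        # "Add to list" in the file selection window appends a new line;
        # a path already in the box is not added twice (assumed choice).
        if path not in self.paths:
            self.paths.append(path)

    def remove(self, path):
        # The user may also delete lines by hand.
        if path in self.paths:
            self.paths.remove(path)

    def text(self):
        return "\n".join(self.paths)


box = FileListBox()
box.add("/home/user/report.txt")
box.add("/home/user/data")
box.add("/home/user/report.txt")
box.remove("/home/user/data")
print(parse_size("60,5"), repr(box.text()))
```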

A WWW server may not be prepared to receive more than a certain number
of bytes, or it may want to check the number of files that the user
wants to submit. This proposal suggests sending only the file paths
and file sizes, and possibly some other information, to the server
when the user presses the SUBMIT button. This information can be
URL-encoded and sent to the server along with the rest of the form
data. (The browser should use one or more name/value pairs for each
file to avoid complications when multiple files are submitted. The
initial name/value pair corresponding to an INPUT TYPE="file" tag can
contain the number of file names being sent, so that the server knows
how many name/value pairs to examine.) For a directory, the browser
sends the path to the directory and the sum of the sizes of its
components.
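Here is a minimal sketch of the name/value encoding described above. The first pair carries the number of entries, each entry gets its own path and size pairs, and a directory reports the summed sizes of the files beneath it. The proposal does not fix the pair names, so the `<field>.<i>.path` / `<field>.<i>.size` scheme used here is an assumption. The functions path_size and file_field_pairs are hypothetical.

```python
import os
import tempfile
from urllib.parse import urlencode


def path_size(path):
    """Size of a file, or the summed sizes of the files under a directory."""
    if os.path.isdir(path):
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        return total
    return os.path.getsize(path)


def file_field_pairs(field, paths):
    """Name/value pairs describing the files listed in one file INPUT.

    The first pair carries the number of entries; each entry then gets
    a path pair and a size pair (a hypothetical naming scheme).
    """
    pairs = [(field, str(len(paths)))]
    for i, path in enumerate(paths, start=1):
        pairs.append((f"{field}.{i}.path", path))
        pairs.append((f"{field}.{i}.size", str(path_size(path))))
    return pairs


# Example: one 10-byte file and a directory holding 3 + 4 bytes.
with tempfile.TemporaryDirectory() as tmp:
    report = os.path.join(tmp, "report.txt")
    with open(report, "wb") as fh:
        fh.write(b"x" * 10)
    data = os.path.join(tmp, "data")
    os.mkdir(data)
    for name, size in (("a", 3), ("b", 4)):
        with open(os.path.join(data, name), "wb") as fh:
            fh.write(b"y" * size)
    pairs = file_field_pairs("userfile1", [report, data])
    # Sent along with the rest of the form data when SUBMIT is pressed.
    print(urlencode(pairs))
```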

The CGI script receiving the form data at the server can check how
many files it will receive, it can check that it has enough storage
space for the files, and it can prepare a location for the storage of
the files. If the server decides not to accept the files, the CGI
script returns a text/html page containing an appropriate message. If
the server is ready to accept the files, it
responds with data of a new MIME type (for example:
application/x-sendfiles) that contains the paths to the files to send,
the full URL of the server, and other information relevant to the file
transfer (for example, file sizes or other data that can be used to
check if the files changed since the form was submitted). On
receiving data of this new MIME type, the browser pops open a window
displaying the file names of the files which will be sent to the
server. The user checks this list and confirms or cancels the file
transfer. Upon confirmation by the user, the browser compresses,
encodes, encrypts (in a secure version), and transmits the files to
the server using HTTP's POST. At the server, the receiving CGI script
decrypts, decodes, expands, and stores the files.
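The server's accept-or-refuse decision can be sketched as a small CGI helper. It reads the name/value pairs back, checks them against limits and free disk space, and answers either with a text/html refusal or with a body of the new MIME type. Several parts of this sketch are assumptions rather than part of the proposal: the limits MAX_FILES and MAX_TOTAL_BYTES, the pair naming scheme, the line format of the application/x-sendfiles body, and the URL http://example.com/cgi-bin/store. The function names are also hypothetical.

```python
import shutil
from urllib.parse import parse_qsl

# Hypothetical site limits.
MAX_FILES = 10
MAX_TOTAL_BYTES = 5_000_000


def read_file_entries(query, field):
    """Recover [(path, size), ...] for one file field from URL-encoded data."""
    fields = dict(parse_qsl(query))
    count = int(fields.get(field, "0"))
    return [(fields[f"{field}.{i}.path"], int(fields[f"{field}.{i}.size"]))
            for i in range(1, count + 1)]


def respond(query, field, server_url, spool_dir="."):
    """Return (content_type, body) for the CGI reply to the first form."""
    entries = read_file_entries(query, field)
    total = sum(size for _path, size in entries)
    free = shutil.disk_usage(spool_dir).free
    if len(entries) > MAX_FILES or total > MAX_TOTAL_BYTES or total > free:
        return ("text/html",
                "<TITLE>Files not accepted</TITLE>\n"
                "<P>Too many files, or too much data, for this server.\n")
    # Hypothetical application/x-sendfiles body: the target URL, then one
    # "size path" line per file. Putting the size first keeps paths with
    # spaces intact, and the size lets the browser detect changed files.
    lines = [f"URL {server_url}"]
    lines += [f"{size} {path}" for path, size in entries]
    return ("application/x-sendfiles", "\n".join(lines) + "\n")


query = ("userfile1=2&userfile1.1.path=%2Ftmp%2Freport.txt&userfile1.1.size=10"
         "&userfile1.2.path=%2Ftmp%2Fdata&userfile1.2.size=7")
content_type, body = respond(query, "userfile1", "http://example.com/cgi-bin/store")
print(content_type)
print(body, end="")
```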

One advantage of this proposal is that, with a little effort from the
server, this implementation makes the new "file" TYPE backward
compatible. A browser which does not support this new "file" type
will just display a text entry field. The user can type file paths
into the text field, or cut and paste them from the file manager on
his machine. On submission of the form, the server analyzes
the name/value pairs and recognizes that it got a list of file names
in one text field instead of a set of name/value pairs with more
complete information about the files. The server can now send a
text/html response with an appropriate message, like "please ftp your
files to ... ", or "please install browser ... with file submission
support, version x.x, available at <ftp link to browser>".
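A server can tell the two cases apart from the first name/value pair. A browser that supports "file" sends a count followed by per-file pairs, while an older browser sends whatever text the user typed. This is a sketch that assumes the same hypothetical pair names as above, and it also assumes that file names in a plain text field are separated by whitespace. The functions classify_file_field and legacy_reply, and the help_url parameter, are hypothetical.

```python
import html
from urllib.parse import parse_qsl


def classify_file_field(query, field):
    """Return ("structured", []) for data from a browser that supports
    TYPE="file", or ("legacy", paths) when an older browser sent the
    field as ordinary text."""
    fields = dict(parse_qsl(query))
    value = fields.get(field, "").strip()
    if value.isdigit():
        count = int(value)
        if all(f"{field}.{i}.path" in fields for i in range(1, count + 1)):
            return ("structured", [])
    # Older browser: treat whatever was typed or pasted as
    # whitespace-separated file names (an assumption).
    return ("legacy", value.split())


def legacy_reply(paths, help_url):
    """text/html body asking the user to send the files another way."""
    items = "".join(f"<LI>{html.escape(p)}\n" for p in paths)
    return ("<TITLE>File submission not supported</TITLE>\n"
            "<P>Your browser sent these file names as plain text:\n"
            f"<UL>\n{items}</UL>\n"
            f'<P>See <A HREF="{html.escape(help_url)}">these instructions</A> '
            "for other ways to send the files.\n")


kind, paths = classify_file_field("userfile1=%2Fhome%2Fa.txt+%2Fhome%2Fb.txt",
                                  "userfile1")
print(kind, paths)  # legacy ['/home/a.txt', '/home/b.txt']
```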
Furthermore, the server can provide its own external viewer
application to send files from the client's machine to the server.
Before asking the user for files, such a server asks the user whether
he is using a browser which supports the "file" TYPE. If he is not, the
server provides the user with a link to a trusted "sendfiles"
application and instructs the user to retrieve it, install it on his
machine, and configure his browser to use it as an external viewer.
For the user, this would be as easy as installing any other external
viewer of images, sound files, or whatever. Once this application is
installed, the browser responds to data of the new MIME type by
launching this external "sendfiles" application, which asks the user
to confirm transmission of the displayed files and then sends the
files to the server (after encoding, compression, etc.).
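The work done by the browser (or by an external "sendfiles" helper) after the user confirms the list can be sketched like this. The code parses the new MIME type, checks that each file's size still matches what was submitted, compresses and encodes each file, and builds a POST request without sending it. Several details are assumptions made only for illustration: the x-sendfiles line format, the zlib-plus-base64 packing, the Content-Type application/x-packed-files, and the URL. Encryption is not shown. This sketch also handles plain files only, because directories would need an archive format. The function names are hypothetical.

```python
import base64
import os
import tempfile
import urllib.request
import zlib


def parse_sendfiles(body):
    """Parse a hypothetical application/x-sendfiles body: a "URL <target>"
    line, then one "<size> <path>" line per file."""
    lines = body.splitlines()
    if not lines or not lines[0].startswith("URL "):
        raise ValueError("missing URL line")
    url = lines[0][4:].strip()
    files = []
    for line in lines[1:]:
        if line.strip():
            size, path = line.split(" ", 1)
            files.append((path, int(size)))
    return url, files


def pack_files(files):
    """Compress and base64-encode each plain file, refusing any file whose
    size no longer matches the size reported with the form."""
    parts = []
    for path, expected in files:
        if os.path.getsize(path) != expected:
            raise ValueError(f"{path} changed since the form was submitted")
        with open(path, "rb") as fh:
            packed = base64.b64encode(zlib.compress(fh.read())).decode("ascii")
        parts.append(f"{path}\n{packed}\n")
    return "".join(parts).encode("utf-8")


def build_upload(body):
    """Build (but do not send) the POST carrying the files."""
    url, files = parse_sendfiles(body)
    # A real helper would show `files` and wait for the user's OK here.
    return urllib.request.Request(
        url, data=pack_files(files), method="POST",
        headers={"Content-Type": "application/x-packed-files"})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")
    with open(path, "wb") as fh:
        fh.write(b"hello " * 100)
    request = build_upload(f"URL http://example.com/cgi-bin/store\n600 {path}\n")
    print(request.get_method(), request.full_url, len(request.data))
```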

Another advantage of the proposed approach is that a server, if it
chooses, can delay the request for the files until a later form is
submitted. The server can store file information submitted with
several forms and respond with new forms until the user submits a
specific form. Then, the server responds with the new MIME type to
initiate the actual file transfer.
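Delaying the transfer only requires the server to remember file information between forms. The following sketch keys that information by a session identifier and carries the identifier from form to form in a HIDDEN input, which HTML 2.0 already provides. The session mechanism, the form contents, the x-sendfiles line format and the URLs are all assumptions. The class PendingUploads and its methods are hypothetical.

```python
import html
from collections import defaultdict


class PendingUploads:
    """File information gathered from several forms, keyed by a
    hypothetical session identifier, until the final form arrives."""

    def __init__(self):
        self._pending = defaultdict(list)

    def add(self, session, entries):
        """Remember [(path, size), ...] submitted with one form."""
        self._pending[session].extend(entries)

    def next_form(self, session, action):
        """Another form; the session travels in a HIDDEN input."""
        return (f'<FORM METHOD="POST" ACTION="{html.escape(action)}">\n'
                f'<INPUT TYPE="hidden" NAME="session" '
                f'VALUE="{html.escape(session)}">\n'
                'More files: <INPUT NAME="userfile1" TYPE="file">\n'
                '<INPUT TYPE="submit" VALUE="Continue">\n'
                '</FORM>\n')

    def finish(self, session, server_url):
        """Start the transfer of everything gathered for this session."""
        entries = self._pending.pop(session, [])
        lines = [f"URL {server_url}"]
        lines += [f"{size} {path}" for path, size in entries]
        return ("application/x-sendfiles", "\n".join(lines) + "\n")


store = PendingUploads()
store.add("s42", [("/home/a.txt", 3)])
store.add("s42", [("/home/b.txt", 4)])
content_type, body = store.finish("s42", "http://example.com/cgi-bin/store")
print(body, end="")
```

Until finish() is called, the server can keep answering with ordinary text/html forms.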

IV. Conclusion
--------------
The suggested implementation gives the client a lot of flexibility in
the number and types of files it can send to the server, it gives the
server control of the decision to accept the files, and it gives
servers a chance to interact with browsers which do not support INPUT
TYPE "file".

The change to the HTML DTD is very simple, but very powerful. It
enables a much greater variety of services to be implemented via the
World Wide Web than is currently possible due to the lack of a file
submission facility. This would be an extremely valuable addition to
the capabilities of the World Wide Web.