Re: Internet draft for 'file upload' feature proposal

Daniel W. Connolly (connolly@hal.com)
Thu, 10 Nov 94 19:17:05 EST

In message <94Nov10.135022pst.2760@golden.parc.xerox.com>, Larry Masinter writes:
>This internet draft is for consideration by the HTML working group for
>inclusion in HTML.

In HTML 2.0? In a future revision?

I don't think it's practical to include this in HTML 2.0.

>INTERNET-DRAFT E. Nebel
>File Transmission from WWW Browsers to Servers L. Masinter

Does it bother anybody else that the term "WWW Browser" would be used
in specifications when it has no well-defined meaning?

The term "HTTP Client," on the other hand, is reasonably well-defined.

>1. Abstract
>
> Currently, a World-Wide Web server can get information from users
> with HTML forms. These forms have proven useful in a wide variety
> of applications in which input from the user is necessary. But this
> capability is still greatly limited because HTML forms don't provide
^^^^^^^^^^

I'd feel better if you said "WWW browsers don't provide a way... ," as
there's nothing in HTML itself that prevents folks from doing file
upload.

> a way for the user to submit files to the server. Service providers
> who need to get files from the user have had to implement custom
> browsers.

Here you say yourself that the problem can be solved with
application-specific browsers.

> (Examples of these custom browsers have appeared on the
> www-talk mailing list.) To avoid the necessity for custom browsers
> and to make WWW servers complete in their ability to get information
> from the user, the WWW needs to provide a way for users to send files
> to servers. Since user information is sent back to the server using
> HTML forms, it is most logical to extend HTML forms to support file
> submission.

I'd suggest:

Since file-upload is a feature that many applications will
benefit from, we propose an extension to HTML to allow all
information providers to express file upload requests uniformly,
and a MIME compatible (and hence HTTP and SMTP compatible)
representation for file upload responses.

> The current draft HTML specification <URL:http://www.hal.com/
> %7Econnolly/html-spec/spyglass-19941014/html-19941014.txt.Z>

Eric: could you send the 19941014 HTML draft to the internet draft
editor?

> In addition, it defines the default ENCTYPE attribute of the FORM
> element using the POST METHOD to have the default type
^^^^
> "application/x-www-form-urlencoded".

the SGML-happy term is "default value."

> an INPUT element might usefully have an
> attribute which identifies a set of acceptable media-types, e.g.,
> <INPUT TYPE=file ACCEPT="image/gif, image/tiff" NAME="image1">.

I think the ACCEPT attribute is a good idea, unless you think that it
will not be widely implemented and hence confidence in the spec will
suffer.

>3. Proposed implementation
>
> The proposed implementation in WWW browsers is, when a INPUT tag of
^^^^^^^^^^^^

"WWW browsers" again. Hmmm... that term needs a definition somewhere.
Or else use "HTTP client." Hmmm... "user agent" is the term used in
the mail specs. Hmmm...

> In such a file selection dialog,
> the user would have the option of replacing a current selection,
> adding a new file selection, etc. Browser implementors might choose to
> let the list of file names be manually edited.

The file browser might filter out files disallowed by the ACCEPT attribute.
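A rough Python sketch of that filtering. It assumes the browser guesses each file's media type from its extension (here with the standard `mimetypes` module) and treats an entry like `image/*` as a wildcard; neither the guessing method nor the wildcard comes from the draft, and the function names are hypothetical.

```python
import mimetypes


def parse_accept(accept):
    """Split an ACCEPT value such as "image/gif, image/tiff" into media types."""
    return [item.strip().lower() for item in accept.split(",") if item.strip()]


def type_is_acceptable(media_type, accepted):
    """Exact match, or a "type/*" entry (the wildcard is an assumption here)."""
    major = media_type.split("/", 1)[0]
    return any(entry in (media_type, major + "/*") for entry in accepted)


def filter_by_accept(filenames, accept):
    """Keep only the files a file-selection dialog should offer."""
    accepted = parse_accept(accept)
    if not accepted:
        return list(filenames)  # empty or missing ACCEPT: offer everything
    kept = []
    for name in filenames:
        guessed, _encoding = mimetypes.guess_type(name)
        # Unknown extensions fall back to the generic binary type.
        media_type = (guessed or "application/octet-stream").lower()
        if type_is_acceptable(media_type, accepted):
            kept.append(name)
    return kept
```

With ACCEPT="image/gif, image/tiff", a directory listing of a.gif and notes.txt would be cut down to a.gif.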

> When the user completes the form, and selects the SUBMIT element, the
> browser should send the form data and the content of the selected
> files. The encoding type "application/x-www-form-urlencoded" is
> inefficient for efficiently sending large quantities of binary data.
^^^^^^^^^^^^^^^^^^^^^^^^^^^

easy for you to say!

> Thus, a (new) media type, "multipart/www-form-data" is proposed as a
> way of efficiently sending the values associated with a filled-out
> form from client to server.

Hmmm... www-form-data. What's a "www form"? Should it be html-form-data?
Perhaps http-form-data? I guess www-form-data is OK. But hmmm...

> The media-type (MIME-type) "multipart/www-form-data" follows the
> rules of all multipart MIME data streams as outlined in RFC 1521--a
> boundary is selected that does not occur (with more than
^^^^^^^^^^^^^^
> infinitessimal probability) in any of the data. Each field of the
^^^^^^^^^^^^^^^^^^^^^^^^^^

This is not part of the spec. Random generation is one way to implement
boundary selection, and it has only probabilistic correctness. But the
MIME multipart boundary spec has nothing probabilistic in it.
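One way to keep the convenience of random generation while making boundary selection exactly correct is to check each candidate against the data and retry on a hit. A Python sketch; the use of `secrets.token_hex` and the retry limit are choices of this sketch, not anything in the draft or in RFC 1521.

```python
import secrets


def choose_boundary(parts, max_tries=100):
    """Return a boundary string that occurs in none of the given byte strings.

    The random candidate only makes a collision unlikely; the scan is what
    guarantees the MIME rule that the boundary not appear in the data.
    """
    for _ in range(max_tries):
        candidate = secrets.token_hex(16)  # 32 hex digits, within the 70-char limit
        encoded = candidate.encode("ascii")
        if not any(encoded in part for part in parts):
            return candidate
    raise RuntimeError("no boundary found that is absent from the data")
```

Because the check is explicit, correctness no longer depends on how the candidates are produced; a poor generator would only cost extra retries.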

> the order in which it occurs in the form, as a part of the multipart
> stream. Each part identifies the INPUT name within the original
> HTML form using a "Name: " attribute. Each part has an optional
^^^^^^^^^^^^^^^^^^

you mean a Name: header? Wouldn't it be better to use the
Content-Disposition: header? I'd have to dig up the most recent
internet draft for Content-Disposition, but I'm pretty sure that's the
thing to use here.

> Content-Type (which defaults to text/plain). File inputs should be
> identified as either application/binary or the appropriate media
^^^^^^^^^^^^^^^^^^

I thought it was application/octet-stream. What's application/binary?

> The
> "content-transfer-encoding" for each part should be "binary".

Implicitly or explicitly? (Explicitly, I take it.) It might be clearer
to say "each part should be given a content-transfer-encoding of binary."

> File inputs may optionally identify the file name using the
> "Content-Description" header.

Bad idea. This is what Content-Disposition is for. Hmmm... now
we've got Content-Disposition overloaded: does it give the filename
or the form field name? I think Content-Disposition is more appropriate
for the filename. I guess it's not reasonable to use Content-ID: for
the form field name, since C-ID is supposed to be world-unique.
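A sketch of what a file part's headers could look like with those suggestions applied: Content-Disposition carrying both the field name and the file name, application/octet-stream as the fallback type, and an explicit binary transfer encoding. The `form-data` disposition type and the `name`/`filename` parameters are assumptions (the convention later used for multipart/form-data), not something the draft specifies; `file_part_headers` is a hypothetical name.

```python
import mimetypes


def file_part_headers(field_name, filename):
    """Headers for one file part, following the suggestions above (sketch)."""
    for value in (field_name, filename):
        # This sketch does not implement quoting rules, so refuse risky values.
        if any(ch in value for ch in '"\r\n'):
            raise ValueError("unsupported character in %r" % value)
    guessed, _encoding = mimetypes.guess_type(filename)
    return [
        # Field name and file name both ride on Content-Disposition (assumed syntax).
        ("Content-Disposition",
         'form-data; name="%s"; filename="%s"' % (field_name, filename)),
        # application/octet-stream, not application/binary, for unknown data.
        ("Content-Type", guessed or "application/octet-stream"),
        # Stated explicitly rather than left as an implicit default.
        ("Content-Transfer-Encoding", "binary"),
    ]
```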

> the content-length is not intended as a replacement for
> the multipart boundary as a way of detecting the end of an
> individual component; rather, it is just a way of forewarning the
> server of the amount of data coming.

You might want to say that a little louder. Put it in a NOTE: or
something.

> On the server end, the ACTION might point to a HTTP URL that
> implements the forms action via CGI. In such a case, the CGI program
> would note that the content-type is multipart/www-form-data, parse the
> various fields (checking for validity, writing the file data to local
> files for subsequent processing, etc.).
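For the CGI case, a minimal Python sketch of splitting such a body into parts. It assumes the boundary has already been taken from the Content-Type header, that lines end in CRLF, and that every part carries at least one header; `parse_form_data` is a hypothetical name. It finds the end of each part from the boundary alone, never from Content-Length.

```python
def parse_form_data(body, boundary):
    """Split a multipart body into (headers, content) pairs (sketch).

    Header names are lower-cased. Any Content-Length is ignored here;
    only the boundary delimits parts.
    """
    delimiter = b"\r\n--" + boundary.encode("ascii")
    # Prefix CRLF so the first delimiter matches the same pattern as the rest.
    chunks = (b"\r\n" + body).split(delimiter)
    parts = []
    for chunk in chunks[1:]:          # chunks[0] is the preamble
        if chunk.startswith(b"--"):   # close delimiter: the rest is epilogue
            break
        # Drop the remainder of the delimiter line.
        _, _, chunk = chunk.partition(b"\r\n")
        head, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            if b":" in line:
                name, _, value = line.partition(b":")
                headers[name.strip().decode("ascii").lower()] = value.strip().decode("latin-1")
        parts.append((headers, content))
    return parts
```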

Do you want to suggest that the form ACTION might be "mailto:..."? As
long as an implementor is going to the trouble to implement multipart
syntax and all, it would be a shame not to support mailto: forms.
After all: how many http: forms are just CGI gateways to sendmail?
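To illustrate that the same multipart structure can travel by mail, here is a sketch using Python's standard `email` package. The `form-data` disposition type, the subject line, and the recipient address are hypothetical; file parts are base64-encoded here because mail transports generally cannot carry raw binary.

```python
import mimetypes
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mail_form(fields, files, to_addr):
    """Package form fields and files as one multipart mail message (sketch)."""
    msg = MIMEMultipart("www-form-data")
    msg["To"] = to_addr
    msg["Subject"] = "Form submission"
    for name, value in fields:
        part = MIMEText(value, "plain")
        part.add_header("Content-Disposition", "form-data", name=name)
        msg.attach(part)
    for name, filename, data in files:
        guessed, _encoding = mimetypes.guess_type(filename)
        maintype, subtype = (guessed or "application/octet-stream").split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)  # mail transports are not 8-bit clean
        part.add_header("Content-Disposition", "form-data",
                        name=name, filename=filename)
        msg.attach(part)
    return msg
```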

>4. Backward compatibility issues
>
> In this case, the browser needs to be configured to process
> application/x-please-send-files to launch a helper application.

Is this part just an elaborate suggestion, or a proposed specification?
In other words, are we all supposed to implement x-please-send-files
the same way, or are folks supposed to work out their own solution along
these lines?

As an elaborate suggestion, it's sufficient. As a proposed specification,
the x- is no good, and the representation of the data in the please-send-files
body should be described more precisely.

>5. Other considerations
>
>Compression:
>
> It might be possible for browsers to
> optionally produce a content-transfer-encoding of x-compress for
> file data, and for servers to decompress the data before processing,
> if desired; this was left out of the proposal, however.

This business of compressing content-transfer-encodings was frowned
upon in the MIME spec too. I would be grateful if somebody would do
a little research and find out just why this is.

While you're at it, you should probably discuss encryption. Not all
applications will want to use encryption, but I can imagine many that
will.

I suppose you could punt and say that privacy gets addressed by some
HTTP level mechanism, like CommerceNet's SHTTP.

>Deferred file transmission:
>
> In some situations, it might be advisable to have the server validate
> various elements of the client's data (user name, account, etc.)
> before actually preparing to receive the data. However, after some
> consideration, it seemed best to require that servers that wish to do
> this should implement this as a series of forms, where some of the
> data elements that were previously validated might be sent back to
> the client as 'hidden' fields.

Note that this "series of forms" mechanism can't be used, for example,
to check the size of a file before transmission (except by having the
user find out the size of the file manually.)
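To make the "series of forms" idea concrete: the server answers the first form with a second one that repeats the validated values as hidden fields and only then asks for the file. A Python sketch; the function names, the `upload` input, and the field names in the usage are hypothetical.

```python
from html import escape


def hidden_fields(validated):
    """Render already-validated values as hidden INPUT elements (sketch)."""
    return "\n".join(
        '<INPUT TYPE=hidden NAME="%s" VALUE="%s">'
        % (escape(name, quote=True), escape(value, quote=True))
        for name, value in validated.items()
    )


def second_form(action, validated):
    """Second form in the series: carry earlier answers, then ask for the file."""
    return "\n".join([
        '<FORM METHOD=POST ENCTYPE="multipart/www-form-data" ACTION="%s">'
        % escape(action, quote=True),
        hidden_fields(validated),
        "File to send: <INPUT TYPE=file NAME=upload>",
        "<INPUT TYPE=submit>",
        "</FORM>",
    ])
```

Nothing in this exchange tells the server how big the file is before it arrives, which is the limitation noted above.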

You might also mention Content-Length again, and encourage HTTP
clients to give this info to the server. A busy server could look at
the Content-Length, and if it's way too high, it could just spit
out an error code and close the connection without waiting around
to process all that data.
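A sketch of that early check, assuming the request headers are available as a dict with lower-cased names and that the server has some configured limit (`max_bytes` and `check_upload_size` are hypothetical). How the error is actually reported to the client is left open.

```python
def check_upload_size(headers, max_bytes):
    """Decide whether to read a request body at all (sketch).

    Returns (accept, reason). A missing Content-Length is accepted,
    since the multipart boundary still marks where the data ends.
    """
    raw = headers.get("content-length")
    if raw is None:
        return True, "no Content-Length; size unknown until the data arrives"
    try:
        length = int(raw)
    except ValueError:
        return False, "malformed Content-Length"
    if length < 0:
        return False, "malformed Content-Length"
    if length > max_bytes:
        return False, "declared length %d exceeds limit %d" % (length, max_bytes)
    return True, "ok"
```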

>Other choices for return transmission of binary data:
>
> On the other hand, the
> 'multipart' mechanisms are well established, trivial to implement on
^^^^^^^^^^^^^^^^^^^^

I don't think it serves any purpose to have those three words in there.

The other benefit of using multipart is that it applies to mailto: forms
as well as HTTP forms.

>6. Examples
>
> Suppose the server supplies the following HTML:
>
> <FORM ACTION="http://server.dom/cgi/handle">
> What is your name? <INPUT TYPE=TEXT NAME=submitter>
> What files are you sending? <INPUT TYPE=FILE NAME=pics>
> </FORM>

Don't you need to give ENCTYPE="multipart/www-form-data" explicitly?

> The client would send back the following data:
>
[...]
> --BbC04y
> Content-Description: file2.gif
> Content-type: image/gif
>
> ...contents of file2.gif...
> --BbC04y--

Hmmm... no explicit content-transfer-encoding of binary. I think that
conflicts with the multipart rules. And I don't think it's a good
idea to say "the default content-transfer encoding inside www-form-data
is binary," as it would require gateways to scan these bodies and insert
content-transfer-encodings. These gateways would have to have special
knowledge of www-form-data. Bad idea.
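A sketch of a generator that sidesteps the problem by always writing the encoding out, so a gateway never needs special knowledge of www-form-data. The caller supplies each part's headers and content plus a boundary; the function name and argument shape are hypothetical.

```python
def build_form_body(parts, boundary):
    """Assemble a multipart/www-form-data body (sketch).

    parts: list of (headers, content) pairs, headers as (name, value) tuples.
    Any part lacking Content-Transfer-Encoding gets an explicit "binary".
    """
    marker = b"--" + boundary.encode("ascii")
    out = []
    for headers, content in parts:
        out.append(marker + b"\r\n")
        names = {name.lower() for name, _ in headers}
        if "content-transfer-encoding" not in names:
            headers = list(headers) + [("Content-Transfer-Encoding", "binary")]
        for name, value in headers:
            out.append(("%s: %s\r\n" % (name, value)).encode("ascii"))
        out.append(b"\r\n")
        # The CRLF after the content belongs to the next delimiter, per MIME.
        out.append(content + b"\r\n")
    out.append(marker + b"--\r\n")
    return b"".join(out)
```

Given a part for file2.gif and the boundary BbC04y, the output ends with `--BbC04y--` as in the draft's example, but every part now states its transfer encoding.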

>7. Conclusion
>
> The suggested implementation gives the client a lot of flexibility in
> the number and types of files it can send to the server, it gives the
> server control of the decision to accept the files, and it gives
> servers a chance to interact with browsers which do not support INPUT
> TYPE "file".
>
> The change to the HTML DTD is very simple, but very powerful. It
> enables a much greater variety of services to be implemented via the
> World-Wide Web than is currently possible due to the lack of a file
> submission facility. This would be an extremely valuable addition to
> the capabilities of the World-Wide Web.

Agreed.

Just one more thing:

I suggest you post this to comp.mail.mime for further review, as your
use of "application/binary," Content-Description, and
Content-Transfer-Encoding seems a little out of whack with the
conventional wisdom.

Dan