Re: Form-based File Upload in HTML

Larry Masinter (masinter@parc.xerox.com)
Thu, 26 Jan 95 18:01:44 EST

(Reply to new comments on the File Upload proposal)

Proposal:

> 3. Suggested implementation

> If an ACCEPT attribute is present, the browser might constrain the
> file patterns prompted for to match those with the corresponding
> appropriate file extensions for the platform.

Bruce:

> Too restrictive and platform specific. Macs don't use extensions or the like
> for 'typing' but rather resource attributes. Relying on extension mapping
> or partial name matching (e.g., "*config*sys*") assumes TOO much about what the
> client's platform and naming habits will be for a user or platform.

response:

Yes, this suggested implementation is really for Unix and Windows.
The suggested implementation for Macs is to look up the file type &
creator patterns that are consistent with that MIME type and use a
file dialog box that only prompts for those types.

This part of the proposal should be updated to make sure that's clear.
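A minimal sketch of the Unix/Windows case, in C: map the single MIME type named in ACCEPT to filename patterns for the file dialog. The table, its patterns, and the function name are hypothetical. A real browser would take the mapping from the platform's own MIME configuration, and a Mac version would match file type/creator codes instead of patterns, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical mapping table: MIME type -> space-separated glob patterns. */
struct accept_map {
    const char *mime_type;
    const char *patterns;
};

static const struct accept_map ACCEPT_TABLE[] = {
    { "text/plain", "*.txt *.text" },
    { "image/gif",  "*.gif" },
    { "image/jpeg", "*.jpg *.jpeg" },
};

/* Return the dialog patterns for one MIME type (compared case-insensitively,
   as MIME types are), or "*" (no filter) when the type is unknown. */
const char *patterns_for_accept(const char *mime_type)
{
    assert(mime_type != NULL);
    size_t n = sizeof ACCEPT_TABLE / sizeof ACCEPT_TABLE[0];
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(ACCEPT_TABLE[i].mime_type, mime_type) == 0)
            return ACCEPT_TABLE[i].patterns;
    }
    return "*";
}
```

Falling back to "*" rather than an empty filter means an unmapped type never leaves the user unable to pick a file at all.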

================================================================
Proposal:

> * The (fully qualified) URL to which the actual form data should
> be posted (terminated with CRLF)
> * The list of field names that were supposed to be file contents
> (space separated, terminated with CRLF)
> * The entire original application/x-www-form-urlencoded form data
> as originally sent from client to server.
> ...
> The helper would read the form data, note which fields contained
> 'local file names' that needed to be replaced with their data
> content, might itself prompt the user for changing or adding to the
> list of files available, and then repackage the data & file contents
> in multipart/form-data for retransmission back to the server.

Bruce:

> Won't the helper app need to know the attributes for the fields (e.g.,
> MAXLENGTH, SIZE, etc.)? Is there a need to somehow indicate that a file name
> is required rather than 'optional' (otherwise the cycle of
> server->helper->server->helper... will continue indefinitely for cases of no
> filename or bogus files)? Actually, now that I think about it, a REQUIRED
> attribute for the FILE type may be very useful for many reasons... See below
> for talk of a new MAXSIZE attribute too.

response:

I didn't see why the helper would need to know anything other than the
names of the fields that it was supposed to supply files for. I don't
understand the concern.
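To illustrate why the field names are enough, here is a minimal C sketch of the helper's first step: splitting the body described in the proposal into the post URL, the list of file field names, and the original form data. It assumes the whole body has already been read into a NUL-terminated buffer. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pointers into the caller's buffer after parsing. */
struct helper_request {
    char *post_url;     /* where the repackaged data should be posted */
    char *file_fields;  /* space-separated names of the file fields */
    char *form_data;    /* original x-www-form-urlencoded data */
};

/* Terminate `line` at its first CRLF and return the text after it,
   or NULL if there is no CRLF. */
static char *split_crlf(char *line)
{
    char *crlf = strstr(line, "\r\n");
    if (crlf == NULL)
        return NULL;
    crlf[0] = '\0';
    return crlf + 2;
}

/* Parse the helper body in place. Returns 0 on success, -1 if either
   CRLF-terminated header line is missing. */
int parse_helper_request(char *body, struct helper_request *req)
{
    assert(body != NULL && req != NULL);
    req->post_url = body;
    req->file_fields = split_crlf(req->post_url);
    if (req->file_fields == NULL)
        return -1;
    req->form_data = split_crlf(req->file_fields);
    if (req->form_data == NULL)
        return -1;
    return 0;
}

/* Return nonzero if `name` is one of the space-separated field names. */
int is_file_field(const char *fields, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return 0;
    const char *p = fields;
    while (*p != '\0') {
        while (*p == ' ')
            p++;                       /* skip separating spaces */
        const char *end = p;
        while (*end != '\0' && *end != ' ')
            end++;                     /* find the end of this name */
        if ((size_t)(end - p) == len && strncmp(p, name, len) == 0)
            return 1;
        p = end;
    }
    return 0;
}
```

With the field list in hand, the helper walks the form data and replaces each listed field's local file name with the file's contents; nothing in that step needs MAXLENGTH or SIZE.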

================================================================
Proposal:

> 5.1 Compression, encryption

> layer might not be appropriate. It might be possible for browsers to
> optionally produce a content-transfer-encoding of x-compress for
> file data, and for servers to decompress the data before processing,
> if desired; this was left out of the proposal, however.

> Similarly, the proposal does not contain a mechanism for encryption
> of the data; this should be handled by whatever other mechanisms are
> in place for secure transmission of data, whether via secure HTTP or
> mail.

Bruce:

> I would suggest you apply the last paragraph's reasoning to the one above
> it and leave it at that (as far as compression goes).

response:

Well, this whole section is just 'suggestions we're not going to
address now but could if anyone thought it was important.' As such, I
don't know why we should cut it down.

================================================================
Proposal:

> 5.2 Deferred file transmission

> In some situations, it might be advisable to have the server
> validate various elements of the form data (user name, account,
> etc.) before actually preparing to receive the data. However,
> after some consideration, it seemed best to require that servers
> that wish to do this should implement this as a series of forms,
> where some of the data elements that were previously validated might
> be sent back to the client as 'hidden' fields, or by arranging the
> form so that the elements that need validation occur first.

Bruce:

> I'm having a tough time trying to think of a concrete example/scenario where
> this would be an issue. If any validation needs to be done then it can be
> done using the WWW-Authenticate: header. Since there is no way to coerce a
> browser to send fields back in any particular order (currently they send them
> back in the same order but NOTHING says they have to...), one should not rely
> on field ordering to achieve some form of validation of inputs (e.g., Account
> #, etc.). All deferred file transmission seems to do is make a double (or
> triple) transaction out of what would normally be a single one, and I'm not
> sure I agree w/the need for it (yet).

response:

This was "validation" in the sense of "the credit card number requires 12
digits". You need to be careful if the same form asks for both a file and a
credit card number: sending the file takes a long time, so you should check
the credit card number before you start accepting the whole file. I think you
took "validation" to mean something else. Right?
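A minimal C sketch of this kind of check: the server tests the card number field before it starts reading the file part, and can refuse the submission immediately if the check fails. The function name, the tolerated separators, and the digit-count parameter are assumptions; 12 is just the figure from the example above, not a real card-number rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return 1 if `value` contains exactly `required_digits` digits, allowing
   spaces and hyphens as separators; any other character rejects it. */
int card_number_ok(const char *value, size_t required_digits)
{
    assert(value != NULL);
    size_t digits = 0;
    for (const char *p = value; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p))
            digits++;
        else if (*p != ' ' && *p != '-')
            return 0;
    }
    return digits == required_digits;
}
```

As Bruce points out, browsers are not required to send fields in form order, so a server can only make this check early when the text fields do arrive ahead of the file.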

================================================================

Proposal:

> Clients are encouraged to supply content-length for overall
> file input so that a busy server could detect if the proposed file
> data is too large to be processed reasonably and just return an
> error code and close the connection without waiting to process all
> of the incoming data.

Bruce:

> Why not make it a requirement that the content length be sent rather than
> optional? Since this is a new addition/extension it should have as many
> useful features as possible to make it desirable. (It can't be that hard to
> find the original file size, can it??)

response:

There are some situations where the file data is being built on the fly
(e.g., converted from another format) or its length is otherwise not
easily determined. It is unreasonable to require Content-Length in those
situations. If the length is easily available, send it; otherwise don't.

================================================================
Proposal:

> If the INPUT tag includes the attribute MAXLENGTH, the user agent
> should consider its value to represent the maximum Content-Length
> (in bytes) which the server will accept for transferred files. In
> this way, servers can hint to the client how much space they have
> available for a file upload, before that upload takes place.

Bruce:

> I think that MAXLENGTH should not be overloaded in its meaning. Allow it to
> remain the maximum number of characters that may be entered, and add a new
> attribute that is only valid for TYPE=FILE, MAXSIZE. This attribute would do
> what you have described above. On older browsers, it would be ignored and the
> application/x-please-send-files scenario would happen. Since the size would
> be an issue for them, there needs to be some way to tell the helper app that
> a server has a size restriction on the submission (and this MAXSIZE can be
> set by the CGI so its value can be more than a hint...).

response:

I think you're right. It's a little awkward to go adding more
attributes, but I think using the same attribute to mean different
things is probably a mistake.
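MAXSIZE is only a suggested attribute at this point, so the following C sketch is hypothetical. It shows the comparison a client or helper could make before sending, and that a server could make as soon as it sees a Content-Length header, answering with an error instead of reading an oversized body.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a non-negative decimal string (as in MAXSIZE="..." or a
   Content-Length header). Returns 0 on success, -1 if the string is
   missing, empty, signed, has trailing junk, or overflows. */
static int parse_size(const char *s, unsigned long long *out)
{
    assert(out != NULL);
    char *end;
    if (s == NULL || *s < '0' || *s > '9')
        return -1;
    errno = 0;
    *out = strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;
    return 0;
}

/* Returns 1 if the declared length fits, 0 if it is too large, and -1
   if the length is missing or either value is unparsable. A NULL
   maxsize means no limit was advertised. */
int within_maxsize(const char *content_length, const char *maxsize)
{
    unsigned long long len, max;
    if (parse_size(content_length, &len) != 0)
        return -1;
    if (maxsize == NULL)
        return 1;
    if (parse_size(maxsize, &max) != 0)
        return -1;
    return len <= max ? 1 : 0;
}
```

When the client has omitted Content-Length, this early check is impossible and the server has to count bytes as they arrive instead.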

================================================================
Proposal:

> 5.6 Interpretation of other attributes

> The VALUE attribute might be used with <INPUT TYPE=file> tags for
> a default file name. This use is probably platform dependent,
> however, and probably should be avoided.

Bruce:

> Two ways to deal w/this issue: allow any value to be specified, with a strong
> warning to avoid any kind of platform-specific information (like DOS paths or
> Macintosh hierarchies), or declare the attribute unusable with
> TYPE=file. The former is useful for servers to 'suggest' file names to users,
> but the latter is more Draconian and may not be possible in the DTD... I opt
> for the latter, since the former would make browsers too difficult to write
> cleanly (having to grok Mac paths, DOS paths and Unix paths for any possible
> value). Perhaps add a sentence suggesting that, to send a fully qualified
> pathname along, a form use a separate text field (for the server's
> reference?); that would be one way for the full pathname to be sent to the
> server w/o affecting the ability to grok the file's name.

> If you decide to allow FQPNs for a FILE type, you may want to have the
> browser at least send back the hierarchy separator in the encoded response
> (i.e.:
> Name: pics
> Content-type: multipart/mixed; boundary=BbC04y
> Content-file-separator: "/"
>
> --BbC04y
> Content-Description: /tmp/file1.txt
> Content-Type: text/plain
> Content-Transfer-Encoding: binary
>
> ... contents of file1.txt ...
> --BbC04y
> Content-Description: /usr/foo/file2.gif
> Content-type: image/gif
> Content-Transfer-Encoding: binary
>
> ...contents of file2.gif...
> --BbC04y--
> ). This would allow a server to at least parse the full pathname and strip
> off the file's name (file = strrchr(fqpn, separator) + 1). This would also be a
> _big_ benefit to any helper apps that have to know where file "readme.txt" is
> to be found on a user's 5 disks (250+ Megs each)... This value would
> tell the helper app exactly which disk/subdirectory to use and what the user
> originally entered (in their browser's TYPE=TEXT field).

response:

I don't like "content-file-separator", and besides, there are
operating systems for which a single separator character just wouldn't work.
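For the systems where a separator does exist, the strrchr() idea needs two fixes to be usable: strrchr() returns a pointer to the separator itself, not the name after it, and it returns NULL when there is no separator. A minimal C sketch (the function name is hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Return the last component of a full path name, given the reported
   separator. Steps past the last separator, or returns the whole string
   when there is none. A NUL separator is treated as "no separator". */
const char *path_basename(const char *fqpn, char separator)
{
    assert(fqpn != NULL);
    if (separator == '\0')
        return fqpn;
    const char *last = strrchr(fqpn, separator);
    return last != NULL ? last + 1 : fqpn;
}
```

A VMS-style name such as DISK$USER:[FOO]FILE.TXT, for instance, has no single separator character, so this sketch only covers systems that do.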

================================================================

Proposal:

> The SIZE attribute should probably not be used with <INPUT
> TYPE=file> tags. For text input, it indicates the length in
> characters for the text area for the prompt.

Bruce:

> There is no reason _not_ to allow the SIZE attribute since it can be defined
> as "For file names, it indicates the length in characters for the file name
> _area_ for the prompt.". This will give FORMs designers some flexibility in
> designing the look of a form.

response:

Hmmm, there was some confusion over whether SIZE meant file-name size
or file size, which is what caused us to recommend against it. It's not at
all clear to me that forms designers should expect any leeway in
deciding what the right 'length' of a file name is for different
platforms: Mac, Windows, Unix and Magic Cap are all different.

================================================================
Proposal:

> 5.9 Remote files with third-party transfer

> In some scenarios, the user operating the client software might want
> to specify a URL for remote data rather than a local file. In this
> case, is there a way to allow the browser to send to the server a
> pointer to the external data rather than the entire contents? This
> capability could be implemented, for example, by having the client
> send to the server data of type "message/external-body" with
> "access-type" set to, say, "uri", and the URL of the remote data in
> the body of the message.

Bruce:

> Could an example of this kind of response be added to section 6, Examples
> (since there is only one example)? It would be helpful for readers like
> myself who are not 100% MIME fluent.

response:

I didn't want to load up examples with hypotheticals that no one
actually implemented. I'd like to hear from browser implementors who
have either tried or at least thought about implementing this proposal
in their browser.

================================================================
Bruce:

> Also, what is the behavior when ENCTYPE is multipart/form-data but there
> are no <INPUT TYPE=FILE> fields? I'd venture to suggest that it's as spec'd
> in your draft but...

response:

Yes, that's intended. The proposal could clarify that.
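In that case the body simply consists of text parts. A minimal C sketch of writing such a body, assuming the "Content-Disposition: form-data; name=..." part layout that was later published in RFC 1867. The function names and the caller-supplied fixed-size buffer are illustrative choices, not part of the proposal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one text field as a multipart/form-data part at buf[used].
   Returns the new length, or -1 if the buffer is too small or the
   boundary occurs inside the value (which would corrupt the body). */
int append_text_part(char *buf, size_t cap, size_t used,
                     const char *boundary, const char *name,
                     const char *value)
{
    assert(buf != NULL && used <= cap);
    if (strstr(value, boundary) != NULL)
        return -1;
    int n = snprintf(buf + used, cap - used,
                     "--%s\r\n"
                     "Content-Disposition: form-data; name=\"%s\"\r\n"
                     "\r\n"
                     "%s\r\n",
                     boundary, name, value);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)(used + (size_t)n);
}

/* Append the closing delimiter that ends the multipart body. */
int append_close(char *buf, size_t cap, size_t used, const char *boundary)
{
    assert(buf != NULL && used <= cap);
    int n = snprintf(buf + used, cap - used, "--%s--\r\n", boundary);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

The boundary is the sender's choice; rejecting a value that contains it is the simplest safe behavior for a sketch, where a real client would pick a different boundary instead.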

================================================================
Bruce:

> All in all I like the idea and hope that some serious consideration is given
> to it on the HTMLWG soon.

response:

Me too. We got feedback that this wasn't for 2.0, but now that the
conversation seems to be moving on beyond 2.0, I'd like to see this
addressed again.

================================================================
> Bruce
> INet: Bruce_Kahn@iris.com

Larry
masinter@parc.xerox.com