Re: Internet draft for 'file upload' feature proposal

Larry Masinter (masinter@parc.xerox.com)
Thu, 10 Nov 94 20:30:44 EST

I said:
>This internet draft is for consideration by the HTML working group for
>inclusion in HTML.

Dan said:
>In HTML 2.0? In a future revision?
>I don't think it's practical to include this in HTML 2.0.

I am not sure whether this is more appropriate for HTML 2.0, for some
future version of HTML 2.0, or for HTML 3.0. I'm not clear what the
current consensus of the group is on staging of levels vs. versions,
as I've seen conflicting opinions.

Given the fallback strategy outlined for server implementors, it
*might* be practical to include this in HTML 2.0.

I've been concerned that the HTML 2.0 specification is going to
proposed standard using "application/x-www-form-urlencoded" as the
default ENCTYPE for form data; it doesn't seem consistent to propose a
standard that uses an "x-" type.

The proposal says:
>File Transmission from WWW Browsers to Servers

Dan said:
> Does it bother anybody else that the term "WWW Browser" would be used
> in specifications when it has no well-defined meaning?

I think what I want to name is "the entity that interprets HTML forms
and allows clients to interact with it". This is not an "HTTP client".
Perhaps you could call it an "HTML client", but then, HTML isn't in
itself a client/server protocol, but merely an element used by several
of them. As I said in response to Liam Quin's message, I could call
this a "WWW client", or an "HTML interpreter".

The draft says:

>1. Abstract
>
> Currently, a World-Wide Web server can get information from users
> with HTML forms. These forms have proven useful in a wide variety
> of applications in which input from the user is necessary. But this
> capability is still greatly limited because HTML forms don't provide
                                              ^^^^^^^^^^
> a way for the user to submit files to the server. Service providers
> who need to get files from the user have had to implement custom
> browsers.

Dan replied:

> I'd feel better if you said "WWW browsers don't provide a way... ," as
> there's nothing in HTML itself that prevents folks from doing file
> upload.

But this is not true. The lack is actually in HTML itself: there is no
way to write HTML that will cause the HTML interpreter to ask the user
for a file of data. There's a way to cause it to ask the user for some
text, for the user to select between multiple alternatives, etc.

The wording should more precisely say: "there is no way to write a
HTML form that will cause the HTML interpreter to ask the user for a
file."

It is in fact a 'lack' in HTML, and not merely a lack in the browsers.

I submitted this as an internet draft primarily as a way to make sure
that it was on the HTML working group agenda, and that it would have
wide distribution without people having to dig through the mail
archives to find it.

I'd be most happy if the proposal *didn't* actually become its own
RFC, but rather wound up as a revision to (the most appropriate
version of) the HTML specification.

Re "name:" header vs overloading some other header: I went back and
forth on this, before settling on using "name:" instead of trying to
overload some existing MIME header. As you pointed out yourself, the
existing headers have existing semantics, and what we want is a header
that is "this part of the multipart data stream corresponded to the
following named INPUT in the original form". RFC1521 doesn't specify
content-disposition, but promised a future RFC that would. Do you know
where it is actually defined?
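
As a rough illustration of what such a header buys you, here is a
minimal Python sketch of a multipart body in which each part carries a
"name:" header tying it back to the INPUT it came from. Only the header
name comes from the discussion above; the function name, the field
names, and the exact layout are assumptions made for the example, not
the draft's normative syntax.

```python
from typing import Iterable, Tuple

CRLF = b"\r\n"

def encode_form(fields: Iterable[Tuple[str, str, bytes]], boundary: str) -> bytes:
    """Encode (input_name, content_type, data) triples as a multipart body.

    Hypothetical helper: each part is labelled with a "name:" header
    giving the name of the form INPUT it corresponds to.
    """
    delimiter = b"--" + boundary.encode("ascii")
    out = []
    for input_name, content_type, data in fields:
        out.append(delimiter + CRLF)
        out.append(b"name: " + input_name.encode("ascii") + CRLF)
        out.append(b"Content-Type: " + content_type.encode("ascii") + CRLF)
        out.append(CRLF)            # blank line ends the part headers
        out.append(data + CRLF)
    out.append(delimiter + b"--" + CRLF)   # closing delimiter
    return b"".join(out)

body = encode_form(
    [("comment", "text/plain", b"hi"),
     ("upload", "application/octet-stream", b"\x00\x01")],
    "XyZ",
)
```

The receiving side can then match parts to form fields by reading the
"name:" value, without giving new meaning to any existing MIME header.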

In general, the 'original file name' might not be easily transmitted,
and I don't think it is required.

Other comments:

> Random generation is one way to implement boundary selection, and it
> has only probabilistic correctness. But the MIME multipart boundary
> spec has nothing probabilistic in it.

You're right; however, the argument that boundaries are as efficient
to generate as content-length relies on using a probabilistic
algorithm.
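
A minimal Python sketch of the distinction, under the assumption that
the parts are available as byte strings; the function names and the
boundary format are illustrative and not taken from the draft. The
first function is the cheap probabilistic choice; the second is
strictly correct but has to scan all the data before sending anything.

```python
import secrets
from typing import Sequence

def random_boundary() -> str:
    """Pick a boundary at random; correct only with high probability."""
    return "----" + secrets.token_hex(16)

def checked_boundary(parts: Sequence[bytes]) -> str:
    """Pick a random boundary and verify it occurs in none of the parts.

    The verification requires a full pass over the data, which is the
    cost the probabilistic approach avoids.
    """
    while True:
        candidate = random_boundary()
        needle = candidate.encode("ascii")
        if not any(needle in part for part in parts):
            return candidate
```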

> I thought it was application/octet-stream. What's application/binary?

Yes, the proposal should say application/octet-stream where it
currently says application/binary.

> The
> "content-transfer-encoding" for each part should be "binary".

> Implicitly or explicitly? (Explicitly, I take it.) It might be clearer
> to say "each part should be given a content-transfer-encoding of binary."

I would actually prefer to make the default for
content-transfer-encoding depend on the context. If the ACTION is a
mailto:, it should correspond to the MIME default, while if the ACTION
is a http: URL, it could be binary.
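
In code, the rule I have in mind might look like the following Python
sketch. The function name is made up for illustration, and the fallback
for other URL schemes is my assumption; "7bit" is the default that MIME
(RFC 1521) assumes when no content-transfer-encoding is given.

```python
from urllib.parse import urlsplit

def default_transfer_encoding(action: str) -> str:
    """Choose a default content-transfer-encoding from the form ACTION."""
    scheme = urlsplit(action).scheme
    if scheme == "mailto":
        return "7bit"      # MIME default, safe for mail transport
    if scheme == "http":
        return "binary"    # HTTP can carry arbitrary octets
    return "7bit"          # assumption: be conservative for other schemes
```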

> File inputs may optionally identify the file name using the
> "Content-Description" header.

> Bad idea. This is what Content-Disposition is for. Hmmm... now
> we've got Content-Disposition overloaded: does it give the filename
> or the form field name? I think Content-Disposition is more appropriate
> for the filename. I guess it's not reasonable to use Content-ID: for
> the form field name, since C-ID is supposed to be world-unique.

This file name isn't necessary; it's just for informational purposes.
In some cases (e.g., submitting a document with OLE links, where the
links are made by file names) it is very useful for the receiving
application to know the file names.

> Do you want to suggest that the form ACTION might be "mailto:..."? As
> long as an implementor is going to the trouble to implement multipart
> syntax and all, it would be a shame not to support mailto: forms.
> After all: how many http: forms are just CGI gateways to sendmail?

It seemed orthogonal to the proposal, but a good idea. I didn't want
to confound the two proposals.

(re: backward compatibility issue)
> Is this part just an elaborate suggestion, or a proposed specification?
> In other words, are we all supposed to implement x-please-send-files
> the same way, or are folks supposed to work out their own solution along
> these lines?

I'm not sure that I know what status it is. Those who want to
participate in this should all implement x-please-send-files the same
way. I thought the representation of the data in the body was
reasonably well specified, but I can work to make it more precise if
you can point out a way in which it is ambiguous.

> This business of compressing content-transfer-encodings was frowned
> upon in the MIME spec too. I would be grateful if somebody would do
> a little research and find out just why this is.

I think mainly because many people put compression in the higher-level
transport layer, and don't want to deal with double compression. (Or
triple, if you're transporting GIF or Group 4 compressed TIFF).
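
The double compression point is easy to check for yourself. This is
only a small Python sketch using zlib on made-up sample text, not a
measurement of any real transport: the first pass shrinks the data a
great deal, and a second pass over the already-compressed bytes gains
essentially nothing.

```python
import random
import zlib

# Deterministic sample text built from a small, made-up vocabulary.
rng = random.Random(0)
words = [b"form", b"file", b"upload", b"server",
         b"browser", b"boundary", b"header", b"multipart"]
raw = b" ".join(rng.choice(words) for _ in range(20000))

once = zlib.compress(raw)     # compression applied by one layer
twice = zlib.compress(once)   # the same data compressed again by another

print(len(raw), len(once), len(twice))
```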

> I suggest you post this to comp.mail.mime for further review, as your
> use of "application/binary," Content-Description, and
> Content-Transfer-Encoding seem a little out of whack with the
> conventional wisdom.

I'll fix the things you've pointed out, and do so. Thanks for your
comments.