Re: comments? MIME types for HTTP

Keith Moore <moore@cs.utk.edu>
Message-id: <9308130446.AA00931@thud.cs.utk.edu>
From: Keith Moore <moore@cs.utk.edu>
To: sanders@bsdi.com
Cc: www-talk@nxoc01.cern.ch, moore@cs.utk.edu
Subject: Re: comments? MIME types for HTTP 
In-reply-to: Your message of "Wed, 11 Aug 1993 16:33:38 CDT."
             <9308112133.AA10908@austin.BSDI.COM> 
Date: Fri, 13 Aug 1993 00:46:19 -0400
Sender: moore@cs.utk.edu
Status: RO
To:  www-talk@nxoc01.cern.ch
Subject:  comments? MIME types for HTTP
Date:  Wed, 11 Aug 1993 16:33:38 -0500

> Here is a rough draft of a proposal for various MIME type information
> to be used with HTTP.  Any comments?  Hopefully from someone that knows
> the MIME standard pretty well and point out conflicts (though some are
> intentional where I think MIME is broken).

It might help if you said where the conflicts are intentional... :-)

> Here are a couple of examples of what it would look like:
> 
> Request:
>   Accept: image/gif; class=color; depth=8;
>           width=1024; height=768; xdpi=85; ydpi=85


It might not be a bad idea to define such parameters for GIF (arguably
the RFC 1341 definition isn't detailed enough), but if they are to be used,
they should be written up in an RFC.  (Normally an RFC wouldn't be
necessary, but since there's already an RFC that claims to document
image/gif, people would reasonably expect a revised image/gif spec
to be in an RFC also.)

> Complex return type:
>   Content-Type: archive/tar; name="foo.tar"; encoding="compress, uuencode"


However, the meaning of each parameter (such as "encoding") is specific to a
particular MIME content-type.   So you shouldn't just tack on an "encoding"
parameter to an arbitrary type and expect it to have meaning.  Likewise,
readers should not interpret an encoding parameter without knowing what it
means for that particular type.
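
As a rough illustration of the point above, a reader could keep a table of
the parameters it understands for each content-type and set aside anything
else.  This is only a sketch: the table KNOWN_PARAMS and the function
interpret_params are hypothetical names, and the archive/tar entry simply
mirrors the quoted example rather than any registered definition.

```python
from email.message import Message

# Hypothetical table: which parameters this reader understands per type.
# The entries are illustrative assumptions, not registered definitions.
KNOWN_PARAMS = {
    "archive/tar": {"name", "encoding"},
    "text/plain": {"charset"},
}


def interpret_params(content_type_header):
    """Return (type, understood params, ignored param names)."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    ctype = msg.get_content_type()
    # get_params() returns [(type, ''), (name, value), ...]; skip the type.
    params = msg.get_params(failobj=[])[1:]
    known = KNOWN_PARAMS.get(ctype, set())
    understood = {k.lower(): v for k, v in params if k.lower() in known}
    ignored = [k.lower() for k, _ in params if k.lower() not in known]
    return ctype, understood, ignored
```

With this table, an "encoding" parameter on archive/tar is interpreted, while
the same parameter on text/plain lands in the ignored list because nothing
defines what it would mean there.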

(For the moment I won't argue against use of "archive/*".  Actually, I might
end up agreeing that this is one way that MIME-as-used-in-WWW should differ
from MIME-as-used-in-email.)


> Types marked with a `*' are standard MIME types: <P>
> 
> Content-Type:                       		Description
>  application/dvi 		dvi		TeX DVI
>  application/latex 		latex		LaTeX Source
>  application/tex 		tex		TeX Source

There are security problems with all three of these types that implementors
need to be made aware of.  In particular, all of these kinds of files can
contain \specials that could be used to exploit holes in PostScript; TeX and
LaTeX files can (and often do) scribble on your file system when processed.

In my experience, these are often not very interoperable.  DVI files need
particular fonts that aren't always installed; TeX and LaTeX files often
refer to external macro packages and style files.  (This isn't an argument
against defining such types, only a suggestion that you consider adding
parameters stating what additional files/fonts are required, and what
classes of \specials need to be supported.)

>  application/texinfo 		texi		Texinfo

Any file that might be processed by Emacs can exploit the Emacs feature
that lets a file define a Lisp procedure to be invoked when the file is
loaded.  (Of course any text file can be read with Emacs, but this is the
normal thing to do with texinfo.)

>  application/troff		roff		Troff
>  application/troff-ms		ms		Troff w/MS Macros
>  application/troff-me		me		Troff w/ME Macros
>  application/troff-man		man		Troff w/MAN Macros

Troff lets you scribble on files and run arbitrary programs.  Also, using
the transparent output facility, you can exploit holes in PostScript
interpreters.

>  archive/bcpio			bcpio		Old Binary CPIO
(etc.)

Any of the archive formats is dangerous if it can contain absolute
pathnames.  Implementations used in WWW browsers should forbid extraction of
archive members with absolute pathnames and (maybe) require the recipient to
extract in an empty directory.
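
The check suggested above might look something like the following Python
sketch.  The function name unsafe_members is hypothetical.  Besides absolute
pathnames it also refuses names that climb out of the extraction directory
with "..", which is an extra assumption not stated above.

```python
import posixpath


def unsafe_members(member_names):
    """Return the archive member names that should be refused."""
    refused = []
    for name in member_names:
        normalized = posixpath.normpath(name)
        # Absolute pathnames are the case named in the text.
        is_absolute = posixpath.isabs(name)
        # Assumption beyond the text: also refuse "../" escapes.
        escapes = normalized == ".." or normalized.startswith("../")
        if is_absolute or escapes:
            refused.append(name)
    return refused
```

A browser could feed this the member list of an archive (for a tar file,
for example, the names returned by tarfile's getnames()) and decline to
extract if the result is non-empty.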

...

In general, you need to register all of the new names as MIME types unless
they begin with "X-".  Otherwise, MIME types for some of these will surely
be defined independently, and we will have to define type equivalences for
WWW<->MIME.  (I fully expect there to be WWW-to-MIME gateways of various
flavors.)

> Common Attributes:
>
>
>  q		= quality factor (float between 0 and 1 inclusive)
>  mxb		= max transmission bytes
>  mxt		= max transmission time in seconds
>  name		= document name (this is just a hint for save dialogs)
>  type		= data type (mostly used with application/octet-stream)
>  charset	= e.g., US-ASCII (ISOXXXX)
>  language	= per ISO standard: e.g., en_US (ISO3316/ISO639)
>  encoding	=	    	Description
> 	hqx 	 	hqx	Mac Compressed
> 	gzip 	 	z	Gnu Zip Compressed
> 	zip 	 	zip	gzip Compressed
> 	compress	Z	compress'ed
> 	uuencode	uu	uuencode'ed
> 	btoa		btoa	btoa'ed

You can define these, separately, for each of your own types, but not for
MIME types that are already defined.

Alternatively, perhaps for WWW use you could define special X-WWW-*
parameters that include things like X-WWW-encoding.  Then a gateway would
know not to copy these parameters to a MIME-for-email document, without
having to know which parameters are really valid for which objects.
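
A gateway following this convention could be sketched roughly as below.
The function name strip_www_params is hypothetical, and the "X-WWW-" prefix
is only the naming proposed above, not an existing standard.

```python
from email.message import Message


def strip_www_params(content_type_header, prefix="x-www-"):
    """Drop every Content-Type parameter whose name starts with prefix."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # Skip the first entry, which is the type itself, not a parameter.
    for name, _value in msg.get_params(failobj=[])[1:]:
        if name.lower().startswith(prefix):
            msg.del_param(name)
    return msg["Content-Type"]
```

The gateway never needs to know what X-WWW-encoding means for a given type;
it only looks at the parameter name.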

(Even better to have HTTP servers undo the encodings and transmit everything
as canonical-form octet-streams (or maybe with a single optional compression
algorithm).  Then the burden of supporting all of this stuff is shifted from
the client to the server, and the server administrator can choose to support
the encodings that his server uses.)
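
As a sketch of the server-side approach, the server could hold a table of
the decoders its administrator has chosen to support and undo them before
transmission.  DECODERS and to_canonical are hypothetical names, only gzip
is wired up here, and the code assumes the encodings are listed in the order
they were applied (so they are undone last-first).

```python
import gzip

# Encodings this (hypothetical) server administrator chose to support.
DECODERS = {
    "gzip": gzip.decompress,
}


def to_canonical(data, encodings):
    """Undo the listed encodings, last-applied first; return raw octets."""
    for name in reversed(encodings):
        decoder = DECODERS.get(name)
        if decoder is None:
            # The server refuses rather than pushing the work to clients.
            raise ValueError("unsupported encoding: " + name)
        data = decoder(data)
    return data
```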

The next version of the MIME spec will delete the "name" parameter from
application/octet-stream, in favor of a content-disposition header (the spec
for which is currently an internet-draft).

> Extended attributes for all image and video types:
>  class	= gray / color
>  depth	= bitplanes (commonly 1, 4, 8, 16, 24)
>  width	= pixels
>  height	= pixels
>  xdpi	= x dpi
>  ydpi	= y dpi
> 
> Extended attributes for audio types:
>  srates	= sample rates (in Hz, e.g., srates="8000,22000")
>  widths	= sample sizes (in bits, e.g., widths="1,8,16")

The same considerations apply.  In particular, it would be wrong to define a
sample rate for audio/basic, which is *deliberately* fixed.  Better to define
an audio/general type and appropriate parameters.

In general, it is a bad idea for parameters to duplicate information that is
already in the content, UNLESS such information is necessary for a reader to
determine whether it can successfully read the object.  (I've seen readers
that can't handle (e.g.) JPEG greyscale, but I haven't seen one that can't
handle an image of arbitrary size, reducing it to fit if necessary.)

> I encourage browsers to have MIME type -> action be user definable.
> Something like:
> 
>     image/*		xv
>     video/mpeg		mpeg_play
>     audio/basic		audio_play
>     text/html		internal-html
>     text/*		internal-text
>     */*			internal-save-as

Agreed.  I would also encourage use of a metamail-compatible scheme for
this, just so users wouldn't have to configure things separately for WWW and
mail.
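
For illustration, a very small reader for mailcap-style entries (the format
metamail uses: a type pattern, a semicolon, then a command in which %s stands
for the file name) could look like this.  It is a sketch under simplifying
assumptions: parse_mailcap and find_command are hypothetical names, escaped
semicolons and optional flag fields are ignored, and the sample entries are
taken from the external viewers in the quoted table.

```python
from fnmatch import fnmatchcase

# Sample entries in mailcap style; the first matching line wins.
MAILCAP_TEXT = """\
image/*; xv %s
video/mpeg; mpeg_play %s
audio/basic; audio_play %s
"""


def parse_mailcap(text):
    """Return a list of (type pattern, command template) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 2:
            entries.append((fields[0].lower(), fields[1]))
    return entries


def find_command(entries, content_type, filename):
    """Return the command for content_type, or None if nothing matches."""
    for pattern, command in entries:
        if fnmatchcase(content_type.lower(), pattern):
            return command.replace("%s", filename)
    return None
```

A browser that finds no entry (None) would fall back to its own built-in
handling, such as the internal-html or internal-save-as actions in the
quoted table.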


Keith Moore