misconceptions about MIME [long]

Dan Connolly <connolly@pixel.convex.com>
Message-id: <9210220109.AA02869@pixel.convex.com>
To: gopher@boombox.micro.umn.edu, wais-talk@quake.think.com,
Cc: nsb@bellcore.com
Subject: misconceptions about MIME [long]
Summary: MIME body != MIME message, MIME is for 8bit systems too
Content-Type: multipart/mixed; boundary="cut"
Date: Wed, 21 Oct 92 20:09:36 CDT
From: Dan Connolly <connolly@pixel.convex.com>

I wrote an article a while back called "MIME for global hypertext." It
was about a world where all the information tools on the Internet
interoperated, with MIME as the substrate. Since then, I've got a few
words of support, but none from the important people. It seems that
many people have taken a cursory look at MIME and dismissed it as only
relevant to email.


There are two misconceptions about MIME that show up frequently in
response to my suggestions. The first is the confusion between a MIME
body part and a MIME message.

A MIME body part is just a sequence of bytes with an associated
content-type and content-transfer-encoding -- much like a C variable
is just a piece of memory with an associated type. The default MIME
content-type is text/plain, i.e. a sequence of lines of text.

So all the text files out there in the world already fit the role of a
MIME body part. They don't need any headers, encodings, or anything.
In fact, all the non-text files qualify as MIME body parts too, if we
associate them with the application/octet-stream content-type and the
binary content-transfer-encoding.
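
To make the "bytes plus labels" idea concrete, here is a minimal Python
sketch. The names BodyPart and wrap_file are hypothetical, invented for
illustration; only the default type (text/plain), the default encoding
(7bit), and the application/octet-stream + binary labelling come from
MIME itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BodyPart:
    """A sequence of bytes plus the two labels that say how to read it."""
    data: bytes
    content_type: str = "text/plain"   # MIME's default content-type
    transfer_encoding: str = "7bit"    # MIME's default: no encoding applied


def wrap_file(raw: bytes, is_text: bool) -> BodyPart:
    """Label an existing file as a body part without touching its bytes."""
    if is_text:
        return BodyPart(raw)  # the defaults already describe a text file
    return BodyPart(raw, "application/octet-stream", "binary")
```

Note that wrap_file never modifies the data: no headers are added and
nothing is encoded. The labels live beside the bytes, not inside them.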

A MIME message is actually a special case of the body part concept.
Along with the text/plain, image/gif, audio/basic,
application/postscript, etc. types, there's a message/rfc822 type.
That's where the headers and stuff come in.

There's also a multipart/mixed type that encapsulates several body
parts into one. It has some syntax that a lot of folks aren't crazy
about, but it's just _one_type_ of body part.
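
As an illustration only, the sketch below uses the email package from
the Python standard library (a modern tool, not anything that existed
when this was written) to show that a multipart/mixed body is simply a
container whose parts each carry their own content-type. The function
name build_mixed and the sample data are made up.

```python
from email.message import EmailMessage


def build_mixed() -> EmailMessage:
    """Build a message whose body is multipart/mixed with two parts."""
    msg = EmailMessage()
    msg["Subject"] = "two parts"
    msg.set_content("plain text part\n")  # first part: text/plain
    # Adding an attachment converts the body to multipart/mixed.
    msg.add_attachment(b"GIF89a", maintype="image", subtype="gif")
    return msg


def part_types(msg: EmailMessage) -> list:
    """Return the content-type of each immediate sub-part."""
    return [part.get_content_type() for part in msg.iter_parts()]
```

The headers belong to the enclosing message; the two parts inside are
ordinary body parts, each with nothing more than a type and an encoding.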


The other area of confusion concerns eight bit data. MIME provides
mechanisms for transport of eight bit data across mail transport
agents that only grok limited character sets and line lengths. These
mechanisms obviously increase the size of messages.

But when used with 8-bit-clean transport mechanisms, the encodings are
not necessary, and MIME adds no cost to data transport. From the RFC:

        As of the publication of this document, there are no
        standardized Internet transports for which it is legitimate
        to include unencoded 8-bit or binary data in mail bodies.
        Thus there are no circumstances in which the "8bit" or "binary"
        Content-Transfer-Encoding is actually legal on the Internet.
        However, in the event that 8-bit or binary mail transport
        becomes a reality in Internet mail, or when this document
        is used in conjunction with any other 8-bit or binary-capable
        transport mechanism, 8-bit or binary messages should be labeled
        as such using this mechanism.

That is, the MIME standard suggests that 8-bit data be labelled as
such, so that gateways between 8-bit and 7-bit systems can recognize
and encode such data.
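
A rough sketch of what such a gateway might do with the label, in
Python. The function name prepare_for_7bit is hypothetical, and always
choosing base64 is a simplification: a real gateway might prefer
quoted-printable for mostly-ASCII 8bit text.

```python
import base64


def prepare_for_7bit(content_type: str, encoding: str, data: bytes):
    """Return (content_type, encoding, data) safe for a 7-bit transport.

    Data labelled "8bit" or "binary" is base64 encoded; anything else is
    assumed to be 7-bit safe already and is passed through untouched.
    """
    label = encoding.lower()
    if label in ("8bit", "binary"):
        return content_type, "base64", base64.encodebytes(data)
    return content_type, label, data
```

The point is that the gateway never has to guess: it reads the label
and only pays the encoding cost when the next hop requires it.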

For example: a WAIS<->email gateway might retrieve a GIF image from a
WAIS server. If the WAIS server stated that the data is type image/gif
and binary encoded, the gateway could base64 encode the data and send
it on its way through SMTP. On the other hand, if the server labelled
the data 7bit text/plain, the gateway could pass it along without


Gopher defines several types of "objects." The basic types are text
files and directories. Other types include various archive formats,
telnet sessions, CSO servers, raw data files, and a few experimental
types.

One of the experimental types is 'M' for MIME messages, that is, body
parts of type message/rfc822. I suppose that much of the opposition
to using the MIME body part to model the Gopher object comes from
folks who think that they'd have to put all their data into
message/rfc822 format.

But I didn't suggest we use the MIME message/rfc822 format for all
Gopher objects: just use _some_ MIME format for all Gopher objects.
The fact is, their data is already in a MIME format: text/plain. And
the experimental 'g' (GIF) type is already a MIME type: image/gif.

The only Gopher objects that don't fit the MIME system are the ones
that are not data streams at all: telnet sessions, CSO servers, etc. I
don't know how (or if) they fit into the MIME model.

There are some Gopher objects that fit the MIME model but are not part
of the MIME standard (yet). For example, the directory listing that
Gopher servers send back could be called application/x-gopher for now,
and eventually application/gopher (when it's spec'd out and registered
with the IANA).
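
The mapping described above is small enough to write down. In this
Python sketch the 'M' and 'g' type characters come from the text; the
'0' (text file), '1' (directory), '2' (CSO), and '8' (telnet) characters
are my assumption about the Gopher type codes, and the table and
function names are invented for illustration.

```python
from typing import Optional

# Gopher item type character -> MIME content-type.
GOPHER_TO_MIME = {
    "0": "text/plain",            # text file (type character assumed)
    "1": "application/x-gopher",  # directory listing (type character assumed)
    "g": "image/gif",             # experimental GIF type
    "M": "message/rfc822",        # experimental MIME type, registered spelling
}


def gopher_content_type(type_char: str) -> Optional[str]:
    """Return the MIME type for a Gopher item, or None if it is not a
    data stream (for example a telnet session or CSO server)."""
    return GOPHER_TO_MIME.get(type_char)
```

Items that are not data streams simply have no entry, which matches the
open question of whether they belong in the MIME model at all.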

The gopher protocol can be enhanced one time to support the myriad of
multimedia data formats by including a content-type field in gopher

The Gopher+ protocol has even more in common with MIME. In Gopher+,
each gopher object has a bunch of ancillary information such as the
maintainer, size, format, etc. A Gopher+ object and its ATTRIBUTES
look an awful lot like a MIME body part and its encapsulating
message. It seems a waste to define two competing standards for the
same mechanisms.


A WAIS document looks like a MIME body part too: a sequence of bytes
with a type. The canonical WAIS type is TEXT. And like the type 'M'
support in Gopher, somebody hacked in support for a MIME type in WAIS.

Again, I suggest that the MIME typing system be used instead of,
rather than inside, the WAIS typing system. That is, modify the
Document-id Structure Definition so that

:type <string>

is obsoleted by 

:content-type <string> ;; as defined by RFC-1341

The only two supported types are TEXT and SRC, which translate neatly
to text/plain and application/x-wais-src.
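
The translation could be sketched in Python as follows. This assumes
the field appears on its own line as ":type" followed by a bare token,
which is a guess about the surface syntax rather than something taken
from the WAIS definition; the names are hypothetical.

```python
WAIS_TO_MIME = {
    "TEXT": "text/plain",
    "SRC": "application/x-wais-src",
}


def upgrade_type_field(line: str) -> str:
    """Rewrite a ':type X' field as ':content-type <mime type>'.

    Lines that are not a :type field, or that name a type with no known
    MIME equivalent, are returned unchanged.
    """
    key, _, value = line.strip().partition(" ")
    mime = WAIS_TO_MIME.get(value.strip())
    if key != ":type" or mime is None:
        return line
    return f":content-type {mime}"
```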


There was a question on www-talk about non-text data in WWW documents.
It seems clear that using MIME body parts to model WWW documents is a
natural step.

The notion of a WWW address should be expanded to what I would call a
reference, which is an address, a content-type, and any identification
information (so that clients can test whether two references point to
the same document).
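
One possible shape for such a reference, as a Python sketch. The class
and field names are mine, and using a single opaque string as the
identification information is an assumption; the proposal above does
not say what form it should take.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    address: str                          # where to fetch the document
    content_type: str                     # how to interpret the bytes
    identification: Optional[str] = None  # opaque document identity, if known

    def same_document(self, other: "Reference") -> Optional[bool]:
        """True/False when both sides carry identification, else None
        (the client cannot tell without fetching and comparing)."""
        if self.identification is None or other.identification is None:
            return None
        return self.identification == other.identification
```

With this, two references with different addresses (say, a mirror and
the original) can still be recognized as the same document.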

The WWW client currently infers the content-type from the address. For
the file: scheme, an .html extension implies HTML format -- otherwise
text is assumed. Documents from HTTP servers are HTML, which slams
plain text inside SGML if necessary. Documents from Gopher servers are
either plain text or gopher listings. I don't know what WWW clients do
with WAIS documents that aren't text.
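
The inference rules just described can be written out as a short Python
sketch. The function name is hypothetical, "text/html" is used here as
an assumed label for HTML, and the example addresses in the comments
are placeholders.

```python
from typing import Optional
from urllib.parse import urlsplit


def infer_content_type(address: str) -> Optional[str]:
    """Guess a content-type from the address alone.

    Returns None where the address does not settle the question.
    """
    parts = urlsplit(address)
    if parts.scheme == "file":
        # e.g. file:///pub/paper.html -> HTML; anything else -> plain text
        return "text/html" if parts.path.endswith(".html") else "text/plain"
    if parts.scheme == "http":
        # HTTP servers are described above as always serving HTML
        return "text/html"
    # Gopher, WAIS, and others: the type has to come from somewhere else
    return None
```

Every None returned here is a case where an explicit content-type
carried alongside the address would remove the guesswork.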


The Internet currently serves as the backbone for a global hypertext.
FTP and email provided a good start, and the Gopher, WWW, and WAIS
clients and servers make wide area information browsing simple. These
systems even interoperate, with email servers talking to FTP servers,
WWW clients talking to Gopher servers, and so on.

This currently works quite well for text.  But what should WWW clients
do as Gopher and WAIS servers begin to serve up pictures, sounds,
movies, spreadsheet templates, PostScript files, etc.? It would be a
shame for each to adopt its own multimedia typing system.

If they all adopt the MIME typing system (and as many other features
from MIME as are appropriate), we can step from global hypertext to
global hypermedia that much more easily.


Attached is the text of the MIME RFC. Enjoy.

Content-Description: RFC1341 MIME  (Multipurpose Internet Mail Extensions)
Content-Type: message/external-body;

Content-Type: text/plain