Re: misconceptions about MIME [long]

Larry Masinter <masinter@parc.xerox.com>
To: NED@sigurd.innosoft.com, nsb@thumper.bellcore.com,
        wais-talk@quake.think.com, connolly@pixel.convex.com,
        www-talk@nxoc01.cern.ch, ned@sigurd.innosoft.com
In-reply-to: Ned Freed's message of Thu, 29 Oct 1992 08:53:10 -0800 <01GQIM2YWA8I91VWYH@SIGURD.INNOSOFT.COM>
Subject: Re: misconceptions about MIME [long]
From: Larry Masinter <masinter@parc.xerox.com>
Sender: Larry Masinter <masinter@parc.xerox.com>
Fake-Sender: masinter@parc.xerox.com
Message-id: <92Oct30.155508pst.101795@poplar.parc.xerox.com>
Date: Fri, 30 Oct 1992 15:54:56 PST

>> The arguments that in-band designation of document format is better
>> than out-of-band information may apply in the electronic mail
>> scenarios, where there is a single sender, multiple recipients, and
>> the recipient has no control over what the sender might send.

>The argument is identical for most file servers, which have even less control
>over the specifics of what files they offer for retrieval. File servers usually
>rely on contributed material and only rarely have anything resembling precise
>control over the material they offer.

But we are not discussing 'file servers' in general, but something
more specific, over which we presumably have more control: the use of
MIME content identifiers to identify content-type in World-Wide-Web
and WAIS servers. Even in the case of file servers, while you might
not have control over the material offered, you do have control over
the description of that material as to which version of a purported
standard format the material might be in, and even, in some cases,
which profile of that standard might apply.

>> If I wish to retrieve the document, say to view it, I might want to
>> choose the available representation that is most appropriate for my
>> purpose. Imagine my dismay to retrieve a 50 megabyte postscript file
>> from an anonymous FTP archive, only to discover that it is in the
>> newly announced Postscript level 4 format, or to try to edit it only
>> to discover that it is in the (upwardly compatible but not parsable by
>> my client) version 44 of Rich Text. In each case, the appropriateness
>> of alternate sources and representations of a document would depend on
>> information that is currently only available in-band.

>Even if this happens (I have strong doubts that it will since documents made
>available for public retrieval tend to converge rapidly to lowest-common
>denominator usage) you have failed to propose an alternative that solves this
>usefully.

Documents made available for public retrieval cannot 'tend to
converge rapidly to lowest-common denominator usage', because *old
documents do not go away*! If there is diversity today in the
available formats for RFCs, tech reports and PhD theses, that
diversity can only get worse! It is foolish to think that the
diversity will diminish any time in the near future; certainly the
number of 'conference proceedings on CD-ROM' is increasing, as people
want to share Mathematica documents, various forms of hypertext, audio
content and the like.

As for a proposal that 'solves this usefully', I have a fairly mild
proposal that, while it does not solve all of the problems in
interoperability, does reduce the amount of uncertainty:

I propose (once again) that instead of saying 'application/postscript'
the content-type say, at a minimum, 'application/postscript 1985' vs
'application/postscript 1994', or whatever you would like to designate
as a way to uniquely identify which edition of the Postscript
reference manual you are talking about; and that instead of being
identified as 'image/tiff', files be identified as 'image/tiff 5.0
Class F' vs 'image/tiff 7.0 Class QXB'.
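
To make the proposal concrete, here is a minimal sketch in Python of
how a client might read such a label. It assumes the 'type/subtype
version' spelling used in the examples above, which is only the
notation of this message and not any standard MIME syntax. The names
parse_label and can_handle, and the table of supported versions, are
hypothetical.

```python
from typing import NamedTuple, Optional


class VersionedType(NamedTuple):
    media_type: str          # e.g. "application/postscript"
    version: Optional[str]   # e.g. "1985"; None when the label has no version


def parse_label(label: str) -> VersionedType:
    """Split 'type/subtype version...' at the first run of whitespace."""
    parts = label.strip().split(None, 1)
    if not parts or "/" not in parts[0]:
        raise ValueError(f"not a type/subtype label: {label!r}")
    media_type = parts[0].lower()
    version = parts[1] if len(parts) > 1 else None
    return VersionedType(media_type, version)


def can_handle(label: str, supported: dict) -> bool:
    """True only if the client lists this exact version of the type.

    An unversioned label is treated as unknown, so it is never accepted
    unless the client explicitly lists None as a version.
    """
    parsed = parse_label(label)
    return parsed.version in supported.get(parsed.media_type, set())


# Hypothetical client that only understands the 1985 edition.
supported = {"application/postscript": {"1985"}}
print(can_handle("application/postscript 1985", supported))
print(can_handle("application/postscript 1994", supported))
```

The point of the sketch is that the decision is made from the label
alone, before any of the document itself has been transferred.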

> Finally, let me point out that I speak as one of the maintainers of one of the
> largest archive of TeX material available anywhere. This material has been
> available via MIME-compliant mail server (and of course FTP) for over six
> months now. This archive contains hundreds of PostScript documents as well
> as all sorts of other stuff. The problems you seem to think are endemic to
> this sort of services have yet to materialize.

I think you need to take a longer-term and broader perspective than
six months of experience with a single document representation.


We've been developing a document archive service that can cope with 20
years of collected electronic documents. We have not only Postscript 1
and 2, but also several versions of Interpress, and Press format, two
versions of DVI, revisable formats of 20 years of editor development
-- several versions of TeX, LaTeX, FrameMaker, Microsoft Word, Tioga,
GlobalView, ViewPoint, Bravo, BravoX, TEdit, troff, Interleaf,
WordPerfect, etc, and images in multiple variations of RES, AIS, TIFF,
Sun raster, PCX, MacPaint, ad nauseam.

In trying to deal with documents over the longer term, it has become
apparent that merely marking documents with a simple 'format' tag like
'interpress' or 'postscript' or 'tiff' isn't adequate for most
purposes. Standards evolve over periods as short as five years; even
the method of tagging standard versions internally changes, and it is
certainly impossible to rely on in-band version information for all
formats.
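
As an illustration of what out-of-band version information buys, here
is a rough Python sketch of an archive catalog in which each
representation of a document is recorded with its format and version,
so that a client can pick one it can read without retrieving anything
first. The record layout, the choose function, the file names, and
the sizes are all made up for the example; they do not describe any
existing archive.

```python
from typing import NamedTuple, Optional


class Representation(NamedTuple):
    location: str    # where the bytes live (hypothetical file names)
    fmt: str         # format name recorded by the archive
    version: str     # edition or version recorded by the archive
    size_bytes: int  # size recorded by the archive


def choose(reps, readable) -> Optional[Representation]:
    """Return the smallest representation whose (fmt, version) is readable.

    `readable` is the set of (format, version) pairs the client supports.
    Returns None when no listed representation is usable.
    """
    usable = [r for r in reps if (r.fmt, r.version) in readable]
    return min(usable, key=lambda r: r.size_bytes) if usable else None


# Made-up catalog entries for one document held in three forms.
catalog = [
    Representation("report.ps2", "postscript", "2", 50_000_000),
    Representation("report.ps1", "postscript", "1", 60_000_000),
    Representation("report.dvi", "dvi", "2", 400_000),
]

# A client that reads only Postscript level 1 skips the level 2 file.
picked = choose(catalog, {("postscript", "1")})
print(picked.location if picked else "nothing usable")
```

With only a bare 'postscript' tag, the first two catalog entries would
be indistinguishable until one of them had been fetched and parsed.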

I have more to say about the problem of 'external references' but I'll
save that for another message.

It would be nice to have a calm discussion about possible solutions to
these problems, and I hope you will forgo future sarcasm.

Thanks,

Larry