Re: misconceptions about MIME [long]

Ned Freed <NED@sigurd.innosoft.com>

Date: 02 Nov 1992 12:37:29 -0700 (PDT)
From: Ned Freed <NED@sigurd.innosoft.com>
Subject: Re: misconceptions about MIME [long]
To: masinter@parc.xerox.com
Cc: NED@sigurd.innosoft.com, nsb@thumper.bellcore.com,
        wais-talk@quake.think.com, connolly@pixel.convex.com,
        www-talk@nxoc01.cern.ch
Message-id: <01GQOCYX092Q91W1GH@SIGURD.INNOSOFT.COM>
X-Vms-To: IN%"masinter@parc.xerox.com"
X-Vms-Cc: 
 IN%"NED@SIGURD.INNOSOFT.COM, nsb@thumper.bellcore.com,wais-talk@quake.think.com,
 connolly@pixel.convex.com, www-talk@nxoc01.cern.ch"
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Content-Transfer-Encoding: 7BIT

> But we are not discussing 'file servers' in general, but something
> more specific and presumably over which we have more control: use of
> MIME content identifiers to identify content-type in World-Wide-Web
> and WAIS servers. Even in the case of file servers, while you might
> not have control over the material offered, you do have control over
> the description of that material as to which version of a purported
> standard format the material might be in, and even, in some cases,
> which profile of that standard might apply.

I remain extremely skeptical that you can exercise this level of control. I
suspect that if you achieve any control whatsoever it will probably be arranged
along axes that you haven't yet considered. (The most obvious one would be
knowing the package that produced the document as well as the version of its
PostScript driver.) I think that if you try to arrange a particular sort of
labelling now you will only find that things don't match up to it.

If in fact you do end up with some degree of control over version-specific
usage it will _then_ be the time to add some additional parameters containing
this information.
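
To make "additional parameters" concrete, here is a minimal sketch in Python
of how a client might read such a parameter from a Content-Type header. The
parameter name "level" and the helper name content_type_with_level are
assumptions made up for this illustration; nothing in MIME or in this thread
defines them.

```python
from email.message import Message


def content_type_with_level(header_value):
    """Return (media_type, level) for a Content-Type header value.

    "level" is a hypothetical parameter name used only for illustration;
    the result carries None when the parameter is absent.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    media_type = msg.get_content_type()  # lower-cased type/subtype
    level = msg.get_param("level")       # None when no such parameter
    return media_type, level


labelled = content_type_with_level('application/postscript; level="2"')
unlabelled = content_type_with_level("application/postscript")
```

A reader that does not understand the parameter can simply ignore it, which is
why adding one later costs little compared with folding a version into the
type name itself.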

>> If I wish to retrieve the document, say to view it, I might want to
>> choose the available representation that is most appropriate for my
>> purpose. Imagine my dismay to retrieve a 50 megabyte postscript file
>> from an anonymous FTP archive, only to discover that it is in the
>> newly announced Postscript level 4 format, or to try to edit it only
>> to discover that it is in the (upwardly compatible but not parsable by
>> my client) version 44 of Rich Text. In each case, the appropriateness
>> of alternate sources and representations of a document would depend on
>> information that is currently only available in-band.

>Even if this happens (I have strong doubts that it will since documents made
>available for public retrieval tend to converge rapidly to lowest-common
>denominator usage) you have failed to propose an alternative that solves this
>usefully.

> Documents made available for public retrieval cannot 'tend to
> converge rapidly to lowest-common denominator usage', because *old
> documents do not go away*!

Nonsense. Documents (old or new) get revised when they fail to meet the needs
of lowest-common denominator usage. Let's suppose I put up a document that uses
a bunch of level 2 PostScript extensions. Let's further suppose that I emblazon
it with dozens of labels indicating that this is the case. But all the
labelling in the world will not make this document print on an old LaserWriter.

Regardless of labelling people will want to print the document. Maybe they will
retrieve it and maybe they will not. I seriously doubt that most people will
know what all this version stuff means (try asking the average user of a laser
printer what version of PostScript it supports), so in most cases the labelling
will be wasted effort. But regardless of whether they retrieve the document and
it fails to print or they heed the labels, people will then complain.

As the author of the document I will be "encouraged" to provide a version that
will work properly on the huge installed base of level 1 (or lower) printers. I
then have three choices: (1) Build two documents, one at each level, (2) Build
one document at the lower level, or (3) Build one document that works at either
level. (1) is not useful since it means maintaining two copies of a document in
almost the same format. (2) is totally reasonable, but the PostScript standards
strongly recommend (3) and give numerous examples of how to make (3) happen.

I would therefore claim that the world will rapidly move to using (3), which
makes most sorts of labelling entirely pointless.
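
As a rough illustration of choice (3), the Python sketch below wraps two
PostScript fragments in a run-time test on the interpreter's language level.
The helper name either_level and the two sample fragments are assumptions for
this example. The test itself uses the languagelevel operator, which Level 1
interpreters do not define, hence the "where" guard.

```python
def either_level(level2_code, level1_code):
    """Wrap two PostScript fragments so the interpreter picks one at run time.

    Both fragments sit inside procedure bodies, so the Level 2 fragment must
    still scan on a Level 1 interpreter (assumption: it avoids Level 2-only
    syntax such as << ... >> dictionary literals).
    """
    return (
        # A Level 1 interpreter has no languagelevel operator; treat it as 1.
        "/languagelevel where {pop languagelevel} {1} ifelse 2 ge\n"
        "{\n" + level2_code + "\n}\n"
        "{\n" + level1_code + "\n}\n"
        "ifelse\n"
    )


# Illustrative fragments: select red using Level 2 or Level 1 operators.
wrapped = either_level(
    "/DeviceRGB setcolorspace 1 0 0 setcolor",
    "1 0 0 setrgbcolor",
)
```

The resulting text can be placed in a document prologue; the same file then
runs on either class of printer without any external label.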

> If there is diversity today in the
> available formats for RFCs, tech reports and PhD theses, that
> diversity can only get worse! It is foolish to think that the
> diversity will diminish any time in the near future; certainly the
> number of 'conference proceedings on CD-rom' is increasing, as people
> want to share Mathematica documents, various forms of hypertext, audio
> content and the like.

This supposed diversity is almost entirely illusory. In the early days of
PostScript the diversity was perhaps real, but things have now been worked out
to a point where there's a large consensus behind what is acceptable PostScript
usage and what is not.

In addition, the norms of acceptable PostScript usage are actually somewhat
more restrictive than what appears in the specifications. This is largely due
to the bugs that were present in early PostScript interpreters (problems with
character set vector information come to mind here). This has resulted in a
peculiar mix of feature use and non-use. Fortunately, there are relatively few
glitches of this sort, and this has led to a situation where the problems are
fairly well understood by the driver-writers in the PostScript community.
Nevertheless, I would be the first to support an RFC that describes in some
detail what sorts of PostScript usage are acceptable and what are not.
(I'm more than willing to contribute to such a document; I just don't have
time to write the whole thing myself.)

As part of my daily work I help maintain a portion of our product line that
includes a full PostScript interpreter. As a result I deal with PostScript
problems on a daily basis. And while there are still a couple of notorious
offenders out there, most modern PostScript problems end up either being file
corruption issues or missing header files -- things of this sort. In other
words, the MIME encodings go a long way towards solving most real-world
problems printing PostScript, and driver changes can solve most of the
remaining problems.

> As for a proposal that 'solves this usefully', I have a fairly mild
> proposal that, while it does not solve all of the problems in
> interoperability, does reduce the amount of uncertainty:

> I propose (once again) that instead of saying 'application/postscript'
> it say, at a minimum, 'application/postscript 1985' vs
> 'application/postscript 1994' or whatever you would like to designate
> as a way to uniquely identify which edition of the Postscript
> reference manual you are talking about; instead of being identified as
> 'image/tiff' the files be identified as 'image/tiff 5.0 Class F' vs
> 'image/tiff 7.0 class QXB'.

All my objections still apply (you have yet to respond to a single one of my
earlier points on this matter). I remain totally and completely convinced that
this just causes numerous problems and solves absolutely nothing.

I will spare everyone the repetition of the numerous problems it causes -- they
can get all that out of my earlier mail.

> > Finally, let me point out that I speak as one of the maintainers of one of the
> > largest archives of TeX material available anywhere. This material has been
> > available via MIME-compliant mail server (and of course FTP) for over six
> > months now. This archive contains hundreds of PostScript documents as well
> > as all sorts of other stuff. The problems you seem to think are endemic to
> > this sort of service have yet to materialize.

> I think you need to take a longer-term and broader perspective than a
> six-month experience with a single representation of documents.

The archive has been online for about five years. (I just checked and there are
three FTP connections going right now -- fairly light usage for midday.) Mail
service has been available for three years or so. MIME compliance in the mail
service was introduced six months ago. Some of my contribution to the
development of MIME came from the experiences of setting up and maintaining
this archive.

> We've been developing a document archive service that can cope with 20
> years of collected electronic documents. We have not only Postscript 1
> and 2, but also several versions of Interpress, and Press format, two
> versions of DVI, revisable formats of 20 years of editor development
> -- several versions of tex, latex, framemaker, microsoft word, tioga,
> globalview, viewpoint, bravo, bravox, tedit, troff, interleaf,
> wordperfect, etc, and images in multiple variations of RES, AIS, TIFF,
> sun raster, pcx, macpaint, ad nauseam.

Have you deployed it yet? Do you have any operational experience at all?

> In trying to deal with documents over the longer term, it has become
> apparent that merely marking documents with a simple 'format' tag like
> 'interpress' or 'postscript' or 'tiff' isn't adequate for most
> purposes. Standards evolve over periods as short as 5 years; even the
> method of internal tagging standard versions changes, and certainly,
> it is impossible to rely on in-band version information for all
> formats. 

As I have stated on numerous occasions, my arguments and comments on this
matter are intended _only_ to apply to PostScript. I am fully aware that many
formats do not include version information and that this must be provided
externally. There are also formats which provide internal versioning
information but that information is known to be inconsistent. I remain totally
in favor of providing this information externally as needed. But as usual you
persist in trying to apply my comments in a larger context where we both know
they are not valid.

PostScript is a special case -- a very very special case. It is a complete
programming language for starters. This makes it impossible to do anything
resembling a complete analysis of feature utilization. PostScript also is
fairly unique in that it has a comprehensive and extensible internal mechanism
for describing itself.
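
One visible form of in-band description is the set of header comments defined
by Adobe's Document Structuring Conventions. The Python sketch below reads two
of them. Treating these comments as the mechanism meant above is an
assumption; the language also answers such questions at run time, for example
through the languagelevel operator. The function name dsc_header is invented
for this example.

```python
import re


def dsc_header(ps_text):
    """Return (dsc_version, language_level) from the leading DSC comments.

    Either value is None when the document does not declare it. Only the
    header comment block at the top of the file is examined.
    """
    dsc_version = None
    language_level = None
    for line in ps_text.splitlines():
        if not line.startswith("%"):
            break  # the header ends at the first non-comment line
        if line.startswith("%%EndComments"):
            break
        match = re.match(r"%!PS-Adobe-(\d+\.\d+)", line)
        if match:
            dsc_version = match.group(1)
        match = re.match(r"%%LanguageLevel:\s*(\d+)", line)
        if match:
            language_level = int(match.group(1))
    return dsc_version, language_level


sample = (
    "%!PS-Adobe-3.0\n"
    "%%Title: demo\n"
    "%%LanguageLevel: 2\n"
    "%%EndComments\n"
    "/Helvetica findfont 12 scalefont setfont\n"
)
info = dsc_header(sample)
```

These comments are only declarations made by whatever produced the file, so a
sketch like this reports what the document claims, not what it actually uses.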

> I have more to say about the problem of 'external references' but I'll
> save that for another message.

Fine.

> It would be nice to have a calm discussion about possible solutions to
> these problems & hope you will forgo future sarcasm.

I'm going against my better judgement and replying to you despite the fact that
you still have not responded with how you propose to solve any of the issues I
have raised in earlier messages. Rest assured, however, that this is my last
word on the subject until you begin to come to grips with the serious problems
your proposal will cause us all.

				Ned