binary file access via Mosaic

marca@ncsa.uiuc.edu (Marc Andreessen)
Date: Sat, 13 Mar 93 01:14:01 -0800
From: marca@ncsa.uiuc.edu (Marc Andreessen)
Message-id: <9303130914.AA15020@wintermute.ncsa.uiuc.edu>
To: www-talk@nxoc01.cern.ch
Subject: binary file access via Mosaic
X-Md4-Signature: 2c3a6314939fd5cff58d08974b594917

A frequent comment from Mosaic users is that binary files of a type
that Mosaic doesn't recognize (i.e., files whose filename extension
Mosaic doesn't recognize) should simply be saved to disk rather than
uselessly displayed as text.  The following document (online as
http://hoohoo.ncsa.uiuc.edu:80/mosaic-docs/file-typing-issues.html)
details the issues involved and explains the solution developed for
the upcoming 0.10 release; I'd appreciate any comments or feedback.

Cheers,
Marc

--
Marc Andreessen
Software Development Group
National Center for Supercomputing Applications
marca@ncsa.uiuc.edu


NCSA Mosaic File Typing Issues
******************************

Motivation
==========

Quite independent of their sources, files have types. A given file can
be plaintext, HTML, GIF, JPEG, AIFF, MPEG, PostScript, you name it.
(MIME provides a way to type data elements within a file, but the file
itself still has a type: MIME.)

In an ideal world, the type of each file would be well-defined metadata
always accessible to an information browsing client prior to the act of
accessing the file. This is true on the Macintosh, but not on most other
systems, and certainly not on the Unix-dominated Internet. Bummer.

Therefore, in an imperfect world, Mosaic uses the common (but not
mandated and not standardized) convention of examining a file's
extension to attempt to determine its type. PostScript files are assumed
to be suffixed '.ps', GIF files '.gif', etc. In this way, Mosaic can
correctly determine file types for the majority of the data files available
on the Internet.

When a file type cannot be thus derived, Mosaic makes a guess. Files
coming from an HTTP server are assumed to be HTML; files coming
from any other source (except Gopher; see below) are assumed to be
plaintext.
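The typing rules described above (suffix first, then a guess based on where the file came from) can be sketched in Python roughly as follows. This is a minimal illustration, not Mosaic's code: the suffix table, the MIME-style type names, the case-insensitive suffix matching, and the function name `type_from_url` are all assumptions made for the example.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical suffix table; the type names are MIME-style labels
# chosen for illustration, not Mosaic's internal names.
SUFFIX_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".ps": "application/postscript",
    ".gif": "image/gif",
    ".jpeg": "image/jpeg",
    ".aiff": "audio/x-aiff",
    ".mpeg": "video/mpeg",
}


def type_from_url(url):
    """Return (content_type, typed); typed is False when the type is a guess."""
    parts = urlparse(url)
    if parts.scheme == "gopher":
        # Gopher items are typed by Gopher's own scheme instead.
        return None, False
    # Case-insensitive suffix matching is an assumption of this sketch.
    ext = posixpath.splitext(parts.path)[1].lower()
    if ext in SUFFIX_TYPES:
        return SUFFIX_TYPES[ext], True
    # No recognizable suffix: guess from the access method.
    if parts.scheme == "http":
        return "text/html", False
    return "text/plain", False
```

Returning the `typed` flag alongside the type keeps "we know" separate from "we guessed", which is exactly the distinction the problem described next turns on.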

This, of course, causes a problem. What happens when a file is assumed
to be viewable text, but it really isn't? Well, Mosaic attempts to display
it as text. Boom. Serious badness.

Solution
========

In Mosaic version 0.10 (and later), there is a solution. The user is
allowed to select, on the fly, whether untyped files (i.e., files with no
recognizable suffix) are to be assumed to be viewable (text) or not
viewable (data). If the former, such files will be displayed; if the latter,
such files will be dumped to a local disk and the user will be notified
appropriately.

When files of unrecognized types are automatically dumped to disk as
binary data in this manner, Mosaic is said to be in "binary transfer
mode".

A toggle button in the Options menu allows you to turn binary transfer
mode on and off, on the fly, at the per-window level. It is implicitly
assumed that binary transfer mode will generally be off, since many
common documents (for example, the results of a WAIS query, with a
URL something like '.../my-database?query') need to be assumed to be
text for the usual Mosaic interfaces to function. However, when you're
in a situation (e.g., browsing an anonymous FTP site) where you want
to pull over 'file.xyz' and you know that 'file.xyz' is really binary
data that shouldn't be displayed as text, you turn on binary transfer
mode, access the file, grab it from the local disk (where Mosaic dumps
it automatically), and then switch binary transfer mode back off.
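The per-window toggle and the decision it controls can be modeled with a small class. The class name `BrowserWindow`, its method names, and the action strings are hypothetical; the sketch only encodes the rule stated above: an untyped file is dumped to disk when binary transfer mode is on, and is otherwise treated as viewable text.

```python
from dataclasses import dataclass

TEXT_TYPES = {"text/plain", "text/html"}


@dataclass
class BrowserWindow:
    """Hypothetical per-window state; binary transfer mode starts off."""
    binary_transfer_mode: bool = False

    def toggle_binary_transfer_mode(self) -> None:
        # Stands in for the toggle button in the Options menu.
        self.binary_transfer_mode = not self.binary_transfer_mode

    def action_for(self, content_type: str, typed: bool) -> str:
        """Decide what to do with a retrieved file.

        content_type: type from a recognized suffix, or the fallback guess.
        typed: True only if the type came from a recognized suffix.
        """
        if not typed:
            # Untyped file: the mode decides between "text" and "data".
            return "dump" if self.binary_transfer_mode else "display"
        if content_type in TEXT_TYPES:
            return "display"
        return "external-viewer"


# Each window carries its own flag, so toggling one leaves the other alone.
ftp_window = BrowserWindow()
other_window = BrowserWindow()
ftp_window.toggle_binary_transfer_mode()
print(ftp_window.action_for("text/plain", typed=False))    # dump
print(other_window.action_for("text/plain", typed=False))  # display
```

Keeping the flag on the window object rather than in a global setting mirrors the per-window behavior described: a WAIS query in one window keeps working while an FTP window pulls binaries.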

Note that binary transfer mode does not impact other Mosaic
functionality; notably:

 o Automatic and transparent uncompression of compressed (.Z)
   and gzip'd (.z) files still occurs with untyped files.
 o Binary (data) files with recognized extensions (e.g., '.gif' and
   '.jpeg') are still passed off to the appropriate external viewers.
   (Note that a new feature in 0.10 is that if the name of a data
   type's external viewer is defined via the X resource mechanism
   to be "dump", then data files in that format will be dumped to
   disk as if they were untyped files being retrieved in binary
   transfer mode.)
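The two behaviors in the list above can be sketched together: the compression suffix is peeled off first, the remaining suffix picks a viewer, and a viewer named "dump" is treated like an untyped file in binary transfer mode. The viewer table here is hypothetical and merely stands in for values set through X resources; the program names in it are illustrative, not Mosaic's defaults, and the function names are invented for the example.

```python
import posixpath

# Compression suffixes undone transparently; gzip used '.z' at the time.
# The test is case-sensitive because '.Z' and '.z' mean different things.
COMPRESSION_SUFFIXES = {".Z": "compress", ".z": "gzip"}

# Hypothetical viewer table standing in for viewer names set via X
# resources. A value of "dump" means "save to disk instead of viewing".
VIEWERS = {".gif": "xv", ".jpeg": "xv", ".mpeg": "mpeg_play", ".au": "dump"}

# Suffixes displayed in the browser itself (illustrative subset).
TEXT_SUFFIXES = {".html", ".txt"}


def split_compression(filename):
    """Return (name without compression suffix, decompressor or None)."""
    root, ext = posixpath.splitext(filename)
    if ext in COMPRESSION_SUFFIXES:
        return root, COMPRESSION_SUFFIXES[ext]
    return filename, None


def plan_retrieval(filename, binary_transfer_mode=False):
    """Return (decompressor or None, action) for a retrieved file."""
    name, decompressor = split_compression(filename)
    ext = posixpath.splitext(name)[1].lower()
    if ext in TEXT_SUFFIXES:
        return decompressor, "display"
    viewer = VIEWERS.get(ext)
    if viewer == "dump":
        # Treated exactly like an untyped file in binary transfer mode.
        return decompressor, "dump"
    if viewer is not None:
        return decompressor, "viewer:" + viewer
    # Untyped file: decompression still happens, then the mode decides.
    return decompressor, ("dump" if binary_transfer_mode else "display")
```

For example, `plan_retrieval("photo.gif.z")` yields `("gzip", "viewer:xv")`: the file is uncompressed and then typed by its inner `.gif` suffix, independent of the binary transfer mode setting.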

Notes
=====

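Note that it is practically impossible to heuristically determine whether
an untyped file is viewable or not viewable as text for the simple reason
that 8-bit text is now commonplace: bytes with the high bit set, once a
telltale sign of binary data, now turn up routinely in ordinary text.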
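To see why, consider the obvious heuristic of flagging any byte with the high bit set as binary. The Python sketch below (the function `looks_binary` is illustrative, not anything Mosaic ships) shows that a Latin-1 encoded sentence trips the same test as the opening bytes of a GIF file.

```python
def looks_binary(data: bytes) -> bool:
    """Naive heuristic: any byte with the high bit set means 'binary'."""
    return any(byte > 0x7F for byte in data)


ascii_text = "plain old 7-bit text\n".encode("ascii")
# 8-bit text: in ISO 8859-1, accented letters are single bytes above 0x7F.
latin1_text = "Caf\u00e9 cr\u00e8me, na\u00efve r\u00e9sum\u00e9\n".encode("latin-1")
# The opening bytes of a (tiny) GIF89a file.
gif_header = b"GIF89a\x01\x00\x01\x00\x80\x00\x00"

print(looks_binary(ascii_text))   # False
print(looks_binary(latin1_text))  # True: real text misclassified as binary
print(looks_binary(gif_header))   # True
```

The test cannot tell the second input from the third, which is the point: once text may use all 8 bits, byte values alone no longer separate "viewable" from "data".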

Gopher has its own typing system (and not a very good one, at that). As
such, the rules above don't apply. Mosaic tries to do the right thing with
various types of Gopher files, but using a different scheme, based on the
Gopher-defined file types.
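For comparison, Gopher menus tag each item with a one-character type code (defined in RFC 1436), so a client can choose an action without inspecting filenames. The mapping from codes to actions below is a hypothetical sketch, not Mosaic's actual Gopher handling; it also assumes the item type is the first character of a gopher URL's path, and it treats unknown codes as data to be dumped.

```python
from urllib.parse import urlparse

# Item-type codes from RFC 1436 mapped to a hypothetical action.
# This is an illustration, not Mosaic's actual Gopher table.
GOPHER_ACTIONS = {
    "0": "display",           # text file
    "1": "display-menu",      # directory
    "4": "dump",              # BinHexed Macintosh file
    "5": "dump",              # DOS binary archive
    "6": "dump",              # UNIX uuencoded file
    "7": "prompt-for-query",  # index-search server
    "9": "dump",              # binary file
    "g": "external-viewer",   # GIF graphics file
    "I": "external-viewer",   # some kind of image file
}


def gopher_item_type(url: str) -> str:
    """Assume the item type is the first character of the URL path."""
    path = urlparse(url).path
    # An empty path ("gopher://host/") means the top-level menu.
    return path[1] if len(path) > 1 else "1"


def gopher_action(item_type: str) -> str:
    # Design choice for the sketch: unknown codes are treated as data.
    return GOPHER_ACTIONS.get(item_type, "dump")
```

Because the server declares the type up front, no suffix guessing or binary transfer mode is needed for Gopher items; the quality of the result depends entirely on how well the server's type codes describe its files.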

marca@ncsa.uiuc.edu