Re: filetype extensions

"Daniel W. Connolly" <connolly@hal.com>
Errors-To: listmaster@www0.cern.ch
Date: Mon, 9 May 1994 23:15:27 +0200
Message-id: <9405092111.AA29774@ulua.hal.com>
Reply-To: connolly@hal.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: filetype extensions 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Type: text/plain; charset="us-ascii"
Mime-Version: 1.0
In message <8hndfDG00WC7A0yyN=@andrew.cmu.edu>, Rob Earhart writes:
>
>  I wrote a www server from scratch at the same time that I began
>supporting Mosaic as a contributed application at Andrew, for the
>experience of writing the server, gaining a full knowledge of the
>protocol, and because I was bored :-)
>

Isn't it interesting that each time a new implementor comes along,
s/he has to trip over all the hacks, kludges, and general differences
between the specs and the existing implementations and practices...
Perhaps one day the specs will catch up...

>  Embracing the http/1.0 concept of multiple content types for the same
>document, the server takes the Accept: list from the client, turns it
>into a list of extensions, and attempts to access each path.extension in
>turn.

Good idea, but I'd suggest a slight twist: I don't think it's wise
to assume that there is a well-defined mapping:
	ext : ContentType -> String
so I wouldn't encourage the approach of working from content types
to extensions. The technique I like (found it in WWWLibrary) is to
keep a table of:
		ContentType, extension String, confidence Float

Then, start with the given path; find all (type, ext, conf) such that path.ext
exists in the filesystem, and maximize conf where type is in the
client's Accept: list. (Actually, you're supposed to take into account
cost-per-byte to transmit and translation quality in the metric function...
details are in the HTTP spec...)
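
Here is a minimal sketch of that table-driven lookup in Python (the table
contents and the names SUFFIX_TABLE and best_variant are illustrative, not
taken from the WWWLibrary):

    import os

    # Hypothetical suffix table: media type, filename extension, confidence.
    SUFFIX_TABLE = [
        ("text/html",  ".html", 1.0),
        ("text/plain", ".txt",  0.5),
        ("image/gif",  ".gif",  1.0),
        ("image/tiff", ".tiff", 0.8),
    ]

    def best_variant(path, accept_types):
        """Pick the variant of `path` that exists on disk and has the
        highest confidence among the media types the client will Accept."""
        best = None
        for media_type, ext, conf in SUFFIX_TABLE:
            if media_type not in accept_types and "*/*" not in accept_types:
                continue
            candidate = path + ext
            if os.path.exists(candidate) and (best is None or conf > best[2]):
                best = (candidate, media_type, conf)
        return best  # (filename, media type, confidence) or None

    # e.g. best_variant("/www/docs/foo", ["image/gif", "text/html"])

A real metric would also fold in the cost-per-byte and conversion-quality
factors mentioned above; this sketch just maximizes the static confidence.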

>  The problem:  I've run into substantial resistance to the idea from
>the user community.  They want to add hyperlinks to "foo.gif", not
>"foo".

So they don't get multi-format magic. Their loss. See below about symlinks...

>  I'm getting two arguments for the use of extensions in the URL's:
>People want to be able to use 'file:' and relative links to view their
>files without going through the server (and maybe get the server to
>translate pages on the fly when requested from AFS sites into 'file:'
>links, reducing HTTP server load),

ACK! Thou Shalt Not Promote The Use of The Unclean 'file:' Scheme!
Surely it will lead you down The Path to Confusion and Despair!
The Holy Access Types Are, As Set Forth in RFC 1521 (MIME):
	local-file:  (obsoletes file:)
	afs:
	anon-ftp:
	ftp:
	mail-server: (obsoletes mailto:)
plus the Other Happily Well-Defined URL Schemes:
	http:
	gopher:
and the Somewhat Unstable But Forthcoming:
	wais:
	news:
and the Hopelessly Weird But Useful:
	telnet:
	tn3270:

if you want to write an HREF that means "get file X the same way
you got this one," you can just leave the access type implicit; e.g.:

	HREF="/pub/stuff/file.txt"


> and they want to maintain
>compatibility with the other AFS www server on campus (run by our School
>of Computer Science), which handles extensions the "normal" way.

Is it too much to ask them to use HREF="foo" and make a link/symlink
from foo to foo.gif, both for local-file access and to support the
other servers?
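
If the symlink route is acceptable, a helper along these lines (Python;
the function name and paths are just for illustration) would let authors
keep their foo.gif files while their HREFs point at the extensionless name:

    import os

    def add_extensionless_alias(doc_root, name_with_ext):
        """Make foo a symlink to foo.gif (etc.) so that HREF="foo" works
        for local-file access while an HTTP server can still negotiate
        among foo.* -- purely an illustrative sketch."""
        base, _ext = os.path.splitext(name_with_ext)
        link = os.path.join(doc_root, base)
        if not os.path.exists(link):
            os.symlink(name_with_ext, link)  # relative target within doc_root

    # e.g. add_extensionless_alias("/www/docs", "foo.gif")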

>  I've also had a request to try to resolve document types only when the
>client doesn't send an extension on the request; the problems here are
>that the extension in the URL is still significant (which seems a bit
>backwords), and that eventually I'd like to implement a mechanism to
>allow the user to use the extensionless file to specify versions of
>documents in different languages, character sets, and encodings.

Hmmm... yess... "the extension... is still significant." If a client
specifically wants the gif version of foo, I'd rather see it send:

	GET foo HTTP/1.0
	Accept: image/gif

than

	GET foo.gif HTTP/1.0
	Accept: */*

The latter form will probably work for now... but what about the future
when there may be caching proxy servers with built-in graphics conversion?
Such a proxy may have image/tiff, and it may be able to generate image/gif
faster than going round-trip to the original server. But extensions are
an out-of-band technique: a proxy server can't "peek" at the extensions
the way it can look at the Accept: header.
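
To make that concrete, here is a toy sketch (Python; every name in it is
hypothetical) of the decision a converting proxy could make from the
Accept: header alone, information that the bare path "foo.gif" would
never give it:

    def proxy_can_satisfy(request_headers, cached_type, conversions):
        """Return the media type the proxy should send from its cache,
        or None if it has to go back to the origin server.  The choice
        is driven entirely by the in-band Accept: header."""
        accept = [t.strip() for t in request_headers.get("Accept", "").split(",")
                  if t.strip()]
        if cached_type in accept or "*/*" in accept:
            return cached_type              # serve the cached copy as-is
        for wanted in accept:
            if (cached_type, wanted) in conversions:
                return wanted               # convert, e.g. image/tiff to image/gif
        return None

    # Suppose the proxy holds "foo" as image/tiff and can convert tiff to gif:
    # proxy_can_satisfy({"Accept": "image/gif"}, "image/tiff",
    #                   {("image/tiff", "image/gif")})  returns "image/gif"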

We must be very careful about time-to-live, conversion quality, etc.
to be sure that the proxy servers don't compromise the protocol.

There's some stuff in the HTTP protocol spec about a URI: and Vary:
header in the server's response to address this. Basically, a server
is supposed to tell the client how long it can cache the document,
and whether there are variations on the document.

Oh... we also need a way to express

	GET foo HTTP/1.0
	Accept: image/gif

in an HREF (or in HTML somewhere)... because sometimes you want to
refer to a specific version/format of a document. I've suggested
	<A HREF="foo" Content-Type="image/gif">...</a>
and
	<A HREF="foo;content-type=image/gif">...</a>
in the past without much luck. The second form, which puts the data
in the URL, has some chance of being deployed... stay tuned...
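
For what it's worth, a client could interpret that second form with just a
few lines like these (Python; the ";content-type=" syntax is only a proposal,
so this is purely illustrative):

    def split_content_type(href):
        """Split a proposed HREF like "foo;content-type=image/gif" into
        the plain URL and the media type to put in the Accept: header."""
        if ";content-type=" in href:
            url, media_type = href.split(";content-type=", 1)
            return url, media_type
        return href, None

    # split_content_type("foo;content-type=image/gif") yields ("foo", "image/gif"),
    # and the client would then send:  GET foo HTTP/1.0  with  Accept: image/gif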

>  So... what do people think?  Pragmatism or Purism?  Should I bow down
>to the pressure to stop having it try to add extensions?

Enough Purism to save yourself from having to re-engineer your solution
down the road, mixed with enough Pragmatism to make it useful to
your user community today.

Dan