Re: Charset labelling (Was: Comments on: "Character Set" Considered Harmful)

Robert S. Thau (rst@ai.mit.edu)
Sat, 29 Apr 95 11:11:13 EDT

Date: Fri, 28 Apr 95 22:58:31 EDT
Reply-To: erik@netscape.com
Precedence: bulk
From: erik@netscape.com (Erik van der Poel)
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Working Group

>How about defining a new MIME type for filesystem-based documents (or
>perhaps just storing such documents with the MIME headers)? Why is it
>absolutely vital to store the documents as HTML when this is obviously
>insufficient?
>
>One could assume that documents which use the .htm(l) extension use
>ISO-8859-1 (default, same as HTTP), and documents with a .www (or
>whatever) extension have HTTP headers.

A couple of days ago, Bob and I discussed something very similar (if not
exactly the same), only we called it *.mim (for MIME).

Just a brief note --- the Apache server, currently in beta, already
implements something like this, in the form of the *.asis feature
--- see

http://www.apache.org/apache/docs/E63

(or perhaps http://www.hyperreal.com/apache/docs/E63 until our name
service troubles are sorted out).
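
To make the idea concrete, here is a minimal sketch of reading a document that is stored with its own headers in front of the body (headers, a blank line, then the content). The function name and the sample header are made up for illustration, and this is not taken from the Apache source; it ignores continuation lines and other details a real server would have to handle.

```python
import re


def split_stored_response(raw: bytes):
    """Split 'headers, blank line, body' into (headers dict, body bytes).

    Hypothetical helper for illustration only.
    """
    # Accept either LF or CRLF line endings around the blank separator line.
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    if len(parts) == 1:
        # No blank line found: treat the whole file as body, no headers.
        return {}, raw
    head, body = parts
    headers = {}
    # Header text is ASCII in practice; latin-1 decodes any byte without error.
    for line in head.decode("latin-1").splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers, body


sample = b"Content-Type: text/html; charset=ISO-8859-2\n\n<html>hi</html>\n"
headers, body = split_stored_response(sample)
print(headers["Content-Type"])
print(body)
```

The point is that the label travels in the same file as the document, so no second file lookup is needed to find it.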

Incidentally, the CERN server also has a feature which allows
arbitrary MIME headers to be added to the response to an ordinary
file request, but it requires the extra headers to be stored in a
separate file (I believe foo.html.meta stores the extra headers for
foo.html). That requires an extra stat() call to check for the
existence of the *.meta file --- which can be awkward for those running
distributed file systems (DFS, AFS), on which stat() of nonexistent
files can be expensive.
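
A rough sketch of the separate-file approach, to show where the extra filesystem lookup comes from. The foo.html.meta naming simply follows the guess above and has not been checked against the CERN server; the function name is hypothetical.

```python
from pathlib import Path


def extra_headers_for(document_path):
    """Return extra headers from '<document>.meta', or {} if there is none.

    Hypothetical helper; the '.meta' suffix is an assumption.
    """
    meta_path = Path(str(document_path) + ".meta")
    try:
        # This is the additional lookup: it happens for every document,
        # including the (probably common) case where no .meta file exists.
        raw = meta_path.read_bytes()
    except FileNotFoundError:
        return {}
    headers = {}
    for line in raw.decode("latin-1").splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            headers[name.strip()] = value.strip()
    return headers
```

Documents without a *.meta file still pay for the failed lookup, which is the cost being described for DFS and AFS.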

(The *.asis stuff was Rob Hartill's baby, so I'm cc:ing him here, even
though I'm not sure whether he's ordinarily on HTML-WG... Rob, this
originally came up as part of the ongoing discussion of how to label
the character set of an HTML document in a multinational world-wide
web --- you get into a chicken-and-egg problem if you try to do it in
the document itself (you can't parse the document until you know the
character set), so storing it in external headers someplace may be the
best approach).
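
As an illustration of why the external label helps, here is a small sketch in which the charset is taken from a Content-Type value obtained outside the document and only then used to decode the bytes. The function name is made up; the ISO-8859-1 fallback is the default mentioned in the quoted text.

```python
from email.message import Message

# Default from the quoted proposal: unlabelled documents are ISO-8859-1.
DEFAULT_CHARSET = "iso-8859-1"


def decode_body(body: bytes, content_type=None) -> str:
    """Decode document bytes using the charset named in an external header.

    Hypothetical helper for illustration only.
    """
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # Returns the charset parameter (lowercased), or the default if absent.
    charset = msg.get_content_charset(DEFAULT_CHARSET)
    # An unknown charset name raises LookupError; a real client would need
    # to decide what to do in that case.
    return body.decode(charset)


raw = "Łódź".encode("iso-8859-2")
print(decode_body(raw, "text/html; charset=ISO-8859-2"))
```

Nothing inside the document has to be parsed to find the character set, so the chicken-and-egg problem does not arise.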

rst