Future of meta-indices: site indexing proposal and Perl script

rst@ai.mit.edu (Robert S. Thau)
Errors-To: listmaster@www0.cern.ch
Date: Tue, 22 Mar 1994 17:41:37 --100
Message-id: <9403221638.AA02557@volterra>
Reply-To: rst@ai.mit.edu
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: rst@ai.mit.edu (Robert S. Thau)
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Future of meta-indices: site indexing proposal and Perl script 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 4034
   Date: Tue, 22 Mar 1994 11:29:11 --100
   From: "Roy T. Fielding" <fielding@simplon.ics.uci.edu>

One open question here (on which there is perhaps some disagreement between
your note and Dave Raggett's earlier one) is the intended purpose of the
<meta ...> tag.  If it's a general-purpose hook for all sorts of
metainformation (some of which may not be appropriate to be sent out as
headers for every GET request), one set of tradeoffs is appropriate; if
it's simply a hook to get things into the headers, then another.  (As I
said, I'm aware that the whole <META ...> business was originally proposed
for another purpose, and to that extent, at least, I'm poaching).  However,
I do have a few comments either way:

   How about:

       <meta name="Summary"
       value="MIT AI lab events, including seminars, conferences, and tours">
       <meta name="Keywords"
       value="MIT, Artificial Intelligence, seminar, conference">

   Also, don't forget that the purpose of META is so that a server capable
   (and willing) to parse metainfo can then send the headers

       Summary: MIT AI lab events, including seminars, conferences, and tours
       Keywords: MIT, Artificial Intelligence, seminar, conference

   as part of the HTTP response object headers.  Thus, use of the META
   element should be limited to things for which headers are desirable.

I'm not completely sure this is a good idea, for this information.  The
"IAFA-description"s may run on for some length --- for instance, the entire
abstract of a technical paper.  (FWIW, the IAFA-Publishing Internet Draft
says that the Description entry on templates should be 'the "abstract" in
the case of documents'; also, technical papers are starting to appear on
the Web as hypertext --- see, for instance, AI Technical Report 1315 at
http://www.ai.mit.edu/people/ellens/why.html, or the Transit project docs
at http://www.ai.mit.edu/projects/transit/transit_home_page.html, to cite
two examples close to home).

Abstracts are clearly metainformation in the general sense, but they seem,
to my taste at least, a bit heavy-duty for a HEAD request.  (Do we really
want large HEADs to be routinely larger than small documents?)
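
To make the size concern concrete, here is a rough sketch of what a
META-aware server might do with the tags quoted above.  It is Python
rather than anything a real server runs, the names MetaCollector and
meta_headers are made up for illustration, and it assumes the
name="..." value="..." attribute form used in this thread.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, value) pairs from <meta name=... value=...> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        value = attributes.get("value")
        if name and value is not None:
            self.pairs.append((name, value))


def meta_headers(html_text):
    """Return the header lines a META-aware server might add (hypothetical)."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    # Collapse internal whitespace so a multi-line value fits on one header line.
    return ["%s: %s" % (name, " ".join(value.split()))
            for name, value in collector.pairs]


page = """<meta name="Summary"
value="MIT AI lab events, including seminars, conferences, and tours">
<meta name="Keywords"
value="MIT, Artificial Intelligence, seminar, conference">
<title>Events</title>"""

headers = meta_headers(page)
for line in headers:
    print(line)
# Count each header line plus its CRLF terminator.
print(sum(len(line) + 2 for line in headers), "bytes of extra headers")
```

Putting a full technical-report abstract into the Summary value shows
quickly how the header total grows relative to a short page.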

As to the more general point of how these META-things are named, I could
certainly use general "Summary" and "Keywords" fields, if their definition
fits my (indexing) application, but I feel it's important to be sure how
they're defined before taking such a high-visibility chunk out of the
global "meta-thing" namespace, and I wasn't sure of that this weekend.  ;-)

   > (There's one other kind of meta-information my indexer uses --- if it sees
   > <meta name="iafa-type" value="service">, it indexes the page in question
   > with a SERVICE template, as opposed to a DOCUMENT template.  This is useful
   > for cover pages of search engines and the like).

   Now that is something which is not of general usefulness.

Well, the SERVICE/DOCUMENT distinction does come from the IAFA templates,
which were intended to be useful off-site.  Perhaps there's room for
improvement there as well, but it does seem to me to be a "useful"
distinction.  (Searching for documents about Archie or WAIS is different
from searching for a gateway, for example).
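
For concreteness, the template choice described in the quoted paragraph
could be sketched roughly as follows.  This is Python written for
illustration, not the Perl indexer itself; IafaTypeFinder and
template_type are hypothetical names, and treating every page without
the tag as a DOCUMENT is an assumption drawn from the description above.

```python
from html.parser import HTMLParser


class IafaTypeFinder(HTMLParser):
    """Remember the value of <meta name="iafa-type" value=...>, if present."""

    def __init__(self):
        super().__init__()
        self.iafa_type = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "iafa-type":
            self.iafa_type = (attributes.get("value") or "").strip().lower()


def template_type(html_text):
    """Pick the IAFA template kind for a page (assumed default: DOCUMENT)."""
    finder = IafaTypeFinder()
    finder.feed(html_text)
    finder.close()
    return "SERVICE" if finder.iafa_type == "service" else "DOCUMENT"


print(template_type('<meta name="iafa-type" value="service"><title>Archie gateway</title>'))
print(template_type('<title>A paper about Archie</title>'))
```

A gateway's cover page tagged this way would be indexed as a SERVICE,
while a paper that merely discusses the gateway stays a DOCUMENT.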

Whether it's appropriate to ship it out as an extra HTTP header with
every GET or HEAD request is another issue altogether.

   Most examples of appropriate metainfo names can already be found in
   NNTP (rfc1036) and rfc822.  However, you are probably right in that we
   should have some sort of specification for what the names mean.

That is certainly so *if* the universe of meta-tagged metainformation is
limited to items appropriate for HTTP/MIME headers, which wasn't entirely
clear from prior discussion.  I guess what I'm trying to do is canvass the
community on the issue.

   ...Roy Fielding   ICS Grad Student, University of California, Irvine  USA
		      (fielding@ics.uci.edu)
       <A HREF="http://www.ics.uci.edu/dir/grad/Software/fielding">About Roy</A>

rst