Future of meta-indices: site indexing proposal and Perl script

rst@ai.mit.edu (Robert S. Thau)
Errors-To: listmaster@www0.cern.ch
Date: Thu, 24 Mar 1994 19:33:37 --100
Message-id: <9403241830.AA04327@volterra>
Reply-To: rst@ai.mit.edu
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: rst@ai.mit.edu (Robert S. Thau)
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Future of meta-indices: site indexing proposal and Perl script 
   Date: Thu, 24 Mar 1994 17:38:28 --100
   From: Tim Berners-Lee <timbl@ptpc00.cern.ch>

   This suggestion (on www-talk@info.cern.ch) happens to overlap with
   an SGML suggestion on uri@bunyip.com, in a discussion of URC
   (Universal Resource Citations, aka Metainformation?),
   so I cross-post.

   Another possibility is to use

	   <meta name="summary">
	   MIT AI lab events, including seminars, conferences, and tours
	   </meta>

   which has the advantage that it can be nested:

	   <meta name="author">
	       <meta name="name">Jane Doe</meta>
	       <meta name="email">jd@weird.com</meta>
	       <meta name="urn">/people/1967/us/va/12437234hgj3246h</meta>
	   </meta>

   and is equivalent to the LISP notation which was also proposed on
   the uri list.
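The nested form maps naturally onto a tree. As a minimal sketch in Python (standard-library html.parser; the helper names are hypothetical), an indexer could read container-style <meta> tags into nested (name, value) pairs and print them in a LISP-like form. The exact LISP syntax proposed on the uri list is not reproduced here, so the s-expression rendering is only an illustration.

```python
from html.parser import HTMLParser


class NestedMetaParser(HTMLParser):
    """Build a tree from container-style <meta name="...">...</meta> tags.

    Hypothetical helper: each closed <meta> becomes a (name, value) pair,
    where value is a list of child pairs for a container, or the stripped
    text for a leaf.
    """

    def __init__(self):
        super().__init__()
        self.stack = [("root", [])]  # open elements: (name, children)
        self.text = []               # character data since the last tag

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.stack.append((dict(attrs).get("name"), []))
            self.text = []

    def handle_data(self, data):
        self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "meta" and len(self.stack) > 1:
            name, children = self.stack.pop()
            value = children if children else "".join(self.text).strip()
            self.stack[-1][1].append((name, value))
            self.text = []


def parse_nested_meta(html):
    """Return the list of top-level (name, value) pairs in html."""
    parser = NestedMetaParser()
    parser.feed(html)
    parser.close()
    return parser.stack[0][1]


def to_sexpr(node):
    """Render a (name, value) pair as an illustrative s-expression."""
    name, value = node
    if isinstance(value, list):
        return "(%s %s)" % (name, " ".join(to_sexpr(c) for c in value))
    return '(%s "%s")' % (name, value)


doc = """<meta name="author">
    <meta name="name">Jane Doe</meta>
    <meta name="email">jd@weird.com</meta>
</meta>"""

tree = parse_nested_meta(doc)
print(to_sexpr(tree[0]))
# (author (name "Jane Doe") (email "jd@weird.com"))
```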

Unfortunately, over the short term, it also has a disadvantage: documents
with this particular form of metainformation coding would probably be
mishandled by plain-jane HTML browsers.  These would ignore the <meta> and
</meta> tags (as they generally ignore any tags they aren't specifically
prepared for) and display the values of the metainformation as part of the
document text.
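To make the display problem concrete, here is a rough Python model of such a browser. It assumes, for simplicity, that every tag is unknown and only the text between tags is shown. Real browsers of the period differed in detail; the point is only that the container form leaves the metainformation values in the text stream.

```python
from html.parser import HTMLParser


class TagIgnoringRenderer(HTMLParser):
    """Crude stand-in for a browser that skips tags it doesn't recognize.

    Simplifying assumption: *every* tag is treated as unknown, so only the
    character data between tags survives.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def render(html):
    """Return the text a tag-ignoring browser would show, whitespace collapsed."""
    r = TagIgnoringRenderer()
    r.feed(html)
    r.close()
    return " ".join("".join(r.chunks).split())


doc = """<h1>Events</h1>
<meta name="summary">
MIT AI lab events, including seminars, conferences, and tours
</meta>
<p>This week's schedule</p>"""

print(render(doc))
# Events MIT AI lab events, including seminars, conferences, and tours This week's schedule
```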

By contrast, with the <meta name="..." value="..."> scheme which I (and a
few others) have been discussing recently, the browsers don't wind up
displaying the metainformation, since it's *all* buried in tags which they
simply ignore.
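A sketch of how an indexer might use the attribute form, assuming the attributes are spelled exactly name and value. The same pass that collects the name/value pairs also shows that none of the metainformation reaches the visible text. This is an illustration in Python, not the Perl autoindexing script itself, and MetaAttrIndexer is a hypothetical name.

```python
from html.parser import HTMLParser


class MetaAttrIndexer(HTMLParser):
    """Collect <meta name="..." value="..."> pairs plus the visible text.

    Assumptions: the attributes are spelled exactly "name" and "value", and
    "visible text" means all character data outside tags (what a
    tag-ignoring browser would display).
    """

    def __init__(self):
        super().__init__()
        self.meta = []    # (name, value) pairs, in document order
        self.chunks = []  # character data between tags

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes <meta ... /> here via handle_startendtag.
        if tag == "meta":
            a = dict(attrs)
            if "name" in a and "value" in a:
                self.meta.append((a["name"], a["value"]))

    def handle_data(self, data):
        self.chunks.append(data)

    def visible_text(self):
        return " ".join("".join(self.chunks).split())


doc = """<h1>Events</h1>
<meta name="summary" value="MIT AI lab events, including seminars, conferences, and tours">
<p>This week's schedule</p>"""

idx = MetaAttrIndexer()
idx.feed(doc)
idx.close()
print(idx.meta)
print(idx.visible_text())  # Events This week's schedule
```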

(Notice of covert agenda: the reason I'm particularly concerned about this
is that I'm looking for something I can use to drive my autoindexing script
now, meaning that it has to cope well with the existing infrastructure,
including browsers which have never heard of any sort of <meta ...> tag).

Still, if there were a nested structure which the existing browsers would
ignore, I and my indexer could easily live with that.  There's a hint of a
way to get one in the distinction below:

   Perhaps it would be useful to distinguish between two
   semantics:

   1.   A noun clause for the object which has properties
	   urn=sdfgwkedf, height=1237123, fsize=9.5

   2.   A *statement* that the object defined by
	   urn=sdfhjsdf
	has properties
	    height=1237123, fsize=9.5


If we use different tags for the two levels, we could have a structure like
this (with apologies in advance for any unintended breach of SGML convention):

	<metaobject name="author">
	    <metastmt name="name" value="Jane Doe">
	    <metastmt name="email" value="jd@weird.com">
	    <metastmt name="urn" value="/people/1967/us/va/12437234hgj3246h">
        </metaobject>
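One way an indexer could read this two-level structure, under the assumption (not fixed by the proposal) that <metastmt> is an empty tag attaching one property to the innermost open <metaobject>, and that metaobjects may nest. This is a hypothetical Python sketch; MetaObjectParser and parse_meta_objects are illustrative names.

```python
from html.parser import HTMLParser


class MetaObjectParser(HTMLParser):
    """Turn <metaobject>/<metastmt> markup into nested dicts.

    Hypothetical semantics: <metastmt name=... value=...> is an empty tag
    giving one property of the innermost open <metaobject>; a <metastmt>
    outside any metaobject becomes a top-level property.
    """

    def __init__(self):
        super().__init__()
        self.stack = [{}]  # stack[0] collects the top-level results

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "metaobject":
            obj = {}
            self.stack[-1][a.get("name")] = obj
            self.stack.append(obj)
        elif tag == "metastmt":
            self.stack[-1][a.get("name")] = a.get("value")

    def handle_endtag(self, tag):
        # Ignore a stray </metaobject> rather than popping the results dict.
        if tag == "metaobject" and len(self.stack) > 1:
            self.stack.pop()


def parse_meta_objects(html):
    p = MetaObjectParser()
    p.feed(html)
    p.close()
    return p.stack[0]


doc = """<metaobject name="author">
    <metastmt name="name" value="Jane Doe">
    <metastmt name="email" value="jd@weird.com">
    <metastmt name="urn" value="/people/1967/us/va/12437234hgj3246h">
</metaobject>"""

print(parse_meta_objects(doc))
```

Using a dict keeps the sketch short, but a second object or statement with the same name replaces the first; an indexer expecting repeated names (several authors, say) would want a list of pairs instead.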

One thing lost this way is the ability to put HTML tags in the metavalues,
but it's not clear that permitting them would be wise anyway.

Comments?

   timbl

rst