Re: meta information

"Roy T. Fielding" <fielding@simplon.ICS.UCI.EDU>

Errors-To: listmaster@www0.cern.ch
Date: Thu, 2 Jun 1994 00:13:24 +0200
Message-id: <9406011511.aa29004@paris.ics.uci.edu>
Reply-To: fielding@simplon.ICS.UCI.EDU
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Roy T. Fielding" <fielding@simplon.ICS.UCI.EDU>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: meta information 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas

> Dave Raggett, Tim Berners-Lee and I discussed the META element
> for a while. I don't like the idea. Tim didn't either, and
> Dave Raggett eventually agreed that there might be better
> ways to express these sentiments.

*ARGH*  The need for this element should be obvious from prior www-talk
discussions and from my paper at WWW94, but here we go again:

----------------------------------------------------------------------
<!--
 The META element can be used to embed document metainformation not
 defined by other HTML+ elements for use by servers/clients capable
 of extracting that information.

 Servers should read the document head to generate HTTP headers
 corresponding to any META elements with the HEADER attribute,
 e.g. if the document contains:

     <meta header name="Expires" value="Tue, 04 Dec 1993 21:29:02 GMT">

 The server should include the header:

     Expires: Tue, 04 Dec 1993 21:29:02 GMT

 as part of the HTTP response to a GET or HEAD request for that document.
 When the HEADER attribute is not present, the server should not generate
 an HTTP header for this metainformation; e.g.

     <meta name="IndexType" value="Service">

 would not generate an HTTP header but would still allow clients or
 other tools to make use of that metainformation.

 Other likely names are "Keywords", "Created", "Owner" (a name)
 and "Reply-To" (an email address).  
-->

<!ELEMENT META - O EMPTY>
<!ATTLIST META
        id      ID      #IMPLIED -- to allow meta info                  --
        header (header) #IMPLIED -- generate HTTP header                --
        name    CDATA   #IMPLIED -- metainformation name e.g. "Expires" --
        value   CDATA   #IMPLIED -- associated value                    -->
----------------------------------------------------------------------
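
As a rough illustration of the server behavior described in the comment above, the following Python sketch scans a document for META elements and produces a response header only for those carrying the HEADER attribute. The names `MetaCollector` and `headers_from_document` are hypothetical, the standard `html.parser` module stands in for whatever parser a real server would use, and the sketch scans the whole text rather than stopping at the end of the document head.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, value, is_header) triples from META elements."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        value = attributes.get("value")
        if name is None or value is None:
            return
        # A bare HEADER attribute appears as a key whose value is None,
        # so test for the key rather than for a truthy value.
        self.metas.append((name, value, "header" in attributes))


def headers_from_document(html_text):
    """Return header lines for META elements marked with HEADER."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return [
        f"{name}: {value}"
        for name, value, is_header in collector.metas
        if is_header
    ]


document = """
<head>
<meta header name="Expires" value="Tue, 04 Dec 1993 21:29:02 GMT">
<meta name="IndexType" value="Service">
</head>
"""

for line in headers_from_document(document):
    print(line)
```

Only the Expires element yields a header line here; the IndexType element stays in the document for clients and other tools to read.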

> 
> What is the meaning of the META element? I've heard several
> things:
> 
> Proposal: It's for http headers:
> 	<META name="Expires" value="Tue Aug 12, 1994 10:33:32 CST">
> Answer: Then why not write:
> 	<HTTP-HEADER name="Expires" ...>

Because metainformation may or may not also be useful as header information,
depending on the capabilities of a given server and the existence of
future tools which make use of that information.  Nevertheless, it is still
metainformation whether or not it is used within response headers.

> and what happens when some dork writes:
> 	<HTTP-HEADER name="User-Agent" value="bogus">

Not a damn thing, as User-Agent is not a valid RESPONSE header.  Furthermore,
how the server decides what is and what is not a legal name for a header 
is left to the server implementation -- the META element only provides a means
for document authors to include that information within the document head.
In other words, a server could easily ignore any META headers named "Date" or
"Last-Modified".
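
One way a server might apply such a policy, sketched in Python. The set `SERVER_CONTROLLED` and the function `allowed_meta_headers` are hypothetical, and the choice of refused names is only an assumption for illustration, since that decision belongs to each server implementation.

```python
# Header names this hypothetical server refuses to take from a document:
# ones it computes itself, plus request-only headers such as User-Agent.
SERVER_CONTROLLED = {"date", "last-modified", "user-agent"}


def allowed_meta_headers(pairs):
    """Drop (name, value) pairs whose name the server will not emit."""
    return [
        (name, value)
        for name, value in pairs
        # Header names are compared without regard to case.
        if name.lower() not in SERVER_CONTROLLED
    ]


requested = [
    ("Expires", "Tue, 04 Dec 1993 21:29:02 GMT"),
    ("User-Agent", "bogus"),
    ("Date", "ignored"),
]

for name, value in allowed_meta_headers(requested):
    print(f"{name}: {value}")
```

With this list, only the Expires pair survives the filter.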

> I can see the need for:
> 
> 	<EXPIRES DATE="...">
> 
> but not a general HTTP header escape mechanism.

But can you anticipate the needs of everyone?  My original proposal called
for an EXPIRES element like the above and an OWNER element like

        <OWNER name="...">

It was shot down because it does not satisfy the general need for document
metainformation that can be parsed without prior knowledge of the purpose of
that metainformation.  To take an example from an earlier discussion:

        <META name="IAFA-Template" value="document">

would be useful for automated tools that build IAFA indices.
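
To make the point concrete, here is a small Python sketch of a tool that gathers every META name/value pair without knowing in advance what any of them mean. The names `MetaReader` and `collect_metainformation` are hypothetical, and the standard `html.parser` module is an assumption standing in for a real indexer's parser.

```python
from html.parser import HTMLParser


class MetaReader(HTMLParser):
    """Record every META name/value pair, whatever the name is."""

    def __init__(self):
        super().__init__()
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        value = attributes.get("value")
        if name is not None and value is not None:
            self.pairs[name] = value


def collect_metainformation(html_text):
    """Return a dict mapping each META name to its value."""
    reader = MetaReader()
    reader.feed(html_text)
    reader.close()
    return reader.pairs


document = """
<meta name="IAFA-Template" value="document">
<meta name="Keywords" value="a,b,c">
"""

info = collect_metainformation(document)
# An index builder can now look for the one name it cares about,
# while the parsing step itself stays generic.
print(info.get("IAFA-Template"))
```

The parsing step never mentions "IAFA-Template"; only the final lookup does, which is the separation a dedicated element per name would not give.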

> Proposal II: It's for private indexing techniques. Then why not
> 	use comments or processing instructions?
> 	<?keywords a,b,c,d>
> 	<?description lksjdflkjsdf>
> or
> 	<!-- @#@# KEYWORDS: a,b,c -->
> 	<!-- @#@# DESCRIPTION: ... -->

Because it is not for PRIVATE indexing techniques.  There is a multitude
of uses for this information, most of which I did not think of when the
META element was originally proposed.

> As long as these techniques are private to one implementation, this
> is all you need. If you get to the point where you expect other
> folks to understand the meaning of these idioms, just propose
> new tags:
> 	<KEYWORDS>a,b,c,</keywords>
> 	<description>lkjsdlfkjsldf... </description>

Love to, but HTML lacks any ability to add new elements which
contain content without breaking compatibility with existing browsers. 
Thus, including the above in any existing document will result in the line 

a,b,c,lkjsdlfkjsldf... 

appearing at the top of the display.
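
The effect can be imitated with a short Python sketch. `TextOnlyRenderer` and `render` are hypothetical names for a toy model of a browser that skips tags it does not recognize but still shows their text content; real browsers differ in detail.

```python
from html.parser import HTMLParser


class TextOnlyRenderer(HTMLParser):
    """Toy browser model: ignore every tag, keep all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def render(html_text):
    """Return the text a tag-ignoring browser would display."""
    renderer = TextOnlyRenderer()
    renderer.feed(html_text)
    renderer.close()
    return "".join(renderer.chunks)


new_elements = (
    "<KEYWORDS>a,b,c,</keywords>\n"
    "<description>lkjsdlfkjsldf... </description>"
)
meta_elements = (
    '<meta name="Keywords" value="a,b,c">\n'
    '<meta name="Description" value="lkjsdlfkjsldf...">'
)

print(repr(render(new_elements)))
print(repr(render(meta_elements)))
```

The element content leaks into the display in the first case, while the META form carries the same information in attributes and renders as nothing.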

> And about indexing... anybody who's interested in these problems should
> probably try to follow the discussions about URC's on the
> uri-request@bunyip.com mailing list.

Is that a person or a mail robot?  It did not respond to my request yesterday.
Is that discussion archived on the web?

> Also, at WWW '94, it was suggested that the TEI header might be a workable
> technology to use in this area.

It was?  Could you elaborate?  WWW94 would have been a lot more useful
if the workshops did not compete with the speakers.  For example, I believe
all of you (i.e. Tim, Dave, and Dan) were off talking about development
futures during the presentations about indexing (Thursday morning).
Had you been more involved in the general conference, I think the panel
on future developments would have been broader in scope.

> If you're gonna hack, go ahead and hack. If you're trying to get it
> right, make sure you've done your homework. But don't hack and pretend
> you're getting it right :-)

The META element is not a hack.  It was proposed 6 months ago as a valid
addition to HTML+ (now HTML 3.0).  The "header" attribute was added two
months ago when it became clear that not all metainfo is desirable as
headers.  It was designed to provide a useful function within the limits
imposed by the hack we call HTML (which, by any measure, is a damn good hack).

Even if we were to completely re-engineer HTML 3.0 so that it could contain
arbitrary SGML and be able to define new elements on the fly (which can be
correctly parsed by all HTML 3.0 clients rather than simply ignored), there
would still be a need for general META elements so that tools (e.g. MOMspider)
could find that information without having to hard-code its name within the
tool itself.

...Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                   (fielding@ics.uci.edu)
    <A HREF="http://www.ics.uci.edu/dir/grad/Software/fielding">About Roy</A>