Re: RFC: Multi-Owner Maintenance robot (MOMspider)

"Roy T. Fielding" <fielding@simplon.ics.uci.edu>

In-reply-to: Timothy Berners-Lee: "Re: RFC: Multi-Owner Maintenance robot (MOMspider)"

To: Timothy Berners-Lee <timbl@dxcern.cern.ch>
Cc: www-talk@nxoc01.cern.ch
Subject: Re: RFC: Multi-Owner Maintenance robot (MOMspider) 
In-reply-to: Your message of "Mon, 06 Dec 1993 20:48:03 +0100."
             <Pine.3.07.9312062002.D19717-d100000@dxcern.cern.ch> 
Date: Wed, 08 Dec 1993 04:12:45 -0800
From: "Roy T. Fielding" <fielding@simplon.ics.uci.edu>
Message-id: <9312080412.aa02028@paris.ics.uci.edu>

Sorry for the delay in responding -- I seem to have overloaded
(or overdosed?) on information....

> On Mon, 6 Dec 1993, Roy T. Fielding wrote:
>> Each maintained HTML file would include an HTML comment of the following
>> format as the first line of the file:
>> 
>> <!-- MOM Owner="AnyAuthorAlias" Expires="31 Dec 1993" -->
>> 
> ...
>> Note that since this added line is a comment, it will have no effect on
>> existing servers or clients.
> 
> I agree with your later classification of this as a kludge, and
> would much prefer to use either the existing LINK element
> or new elements. Note that the HTML spec requires browsers to
> ignore elements they do not understand -- that is, to treat
> them as though the tags were not there. As this would leave
> the contents within the HEAD of a message, it seems reasonable
> to interpret this to mean that the contents (which would not
> otherwise be allowed in the HEAD part) should be ignored.

Unfortunately, Mosaic 2.0 for X (the only client I have tested this on)
just ignores the elements and displays the content as normal text.

> This gives you a transition path from trying it out on a small
> scale to standardising on it -- without breaking anything on the way.

That's definitely a good idea -- the initial suggestion of the comment was
just to make it easy for me to hack my local server.  I think I'll go with
Dave Raggett's last suggestion instead.
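
For reference, a minimal sketch of how a local hack might read the comment
line quoted above. The function name and regular expressions are hypothetical
and are not taken from any MOMspider code. The date format is assumed from the
single example "31 Dec 1993".

```python
import re
from datetime import datetime

# Hypothetical parser for the comment format quoted above; the patterns and
# the function name are illustrative assumptions, not MOMspider's own code.
MOM_COMMENT = re.compile(r'<!--\s*MOM\s+(.*?)\s*-->')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')

def parse_mom_comment(first_line):
    """Return a dict of attributes from a MOM comment line, or None if absent."""
    match = MOM_COMMENT.match(first_line.strip())
    if match is None:
        return None
    fields = dict(ATTRIBUTE.findall(match.group(1)))
    if "Expires" in fields:
        # Assumed date form: day, abbreviated English month name, year.
        fields["Expires"] = datetime.strptime(fields["Expires"], "%d %b %Y").date()
    return fields

if __name__ == "__main__":
    print(parse_mom_comment('<!-- MOM Owner="AnyAuthorAlias" Expires="31 Dec 1993" -->'))
```

Only the first line of the file needs to be read, which is what made the
comment attractive for a quick experiment.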

> ...
> I would like to draw your attention to the comments in the HTTP spec
> about WWW_Link: and the isomorphism between HTTP object header fields
> and HTML head elements.  There it is suggested that a formal
> relationship be made, automatically defining one set in terms of
> the other.

I've read that stuff several times, and just checked it again to be sure.
Has any decision been reached (by anyone) as to whether the "WWW-" prefix
should be used?  Has this been implemented in CERN httpd? (I know it hasn't
in NCSA httpd).

Personally, I think the prefix should not be used for headers which are
applicable beyond HTML (as is the case for Expires and Owner).

> The next point is that, although the two specs have the
> metainformation in common, they should be kept separate.
> This separation should include the MOMspider design.
> Remember that GIFs have owners too -- and expiry dates, etc.
> 
> Suppose we specify this metainformation in HTTP. I think that
> it is really useful, and will put it in unless anyone objects.
> Owner, that is, as Expires: is already in. You have to leave it
> up to a server owner as to how he generates that field.
> Nowhere does the HTTP spec say anything about how the fields
> are generated, only what they mean. For example, one could
> take the uid or gid fields as a good guide. It is rather system
> dependent.
> 
> So as a separate issue, we can add the fields for HTML, which
> would probably be used by most servers (server admins) to generate
> the HTTP headers.

Yes, that's exactly what I am proposing.  Thanks.

>> ...
>> Similarly, the HTTP response would include that metainformation as
>> appropriate headers (note that this has already been suggested for
>> the Expires header but I haven't seen any mention of how the expire
>> date would be obtained from normal HTML files).
> 
> Again, it could be done in bulk, for example by specifying that
> anything in /internet-drafts expires 6 months after its creation date.

It can?  I have not seen that anywhere outside netnews.  I would then 
suggest that the server (or whatever controlled that) should have an
option for which date (the one in the file or the one for the directory)
takes precedence.  But, as you said, that is server-specific and not HTTP.
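
A rough sketch of the precedence option suggested above, under stated
assumptions: the table DIRECTORY_LIFETIMES, the "prefer" option, and the
183-day approximation of "6 months" are all hypothetical and do not describe
any existing server.

```python
from datetime import date, timedelta

# Hypothetical server-side policy table; the name and the 183-day stand-in
# for "6 months" are assumptions made for illustration only.
DIRECTORY_LIFETIMES = {"/internet-drafts": timedelta(days=183)}

def effective_expiry(path, created, file_expires=None, prefer="file"):
    """Pick the expiry date to report for a document.

    file_expires is the date declared in the file itself (or None).
    prefer is "file" or "directory" and decides which source wins
    when both are available.
    """
    directory_expires = None
    for prefix, lifetime in DIRECTORY_LIFETIMES.items():
        if path.startswith(prefix + "/"):
            directory_expires = created + lifetime
            break
    if file_expires is None:
        return directory_expires
    if directory_expires is None:
        return file_expires
    return file_expires if prefer == "file" else directory_expires
```

With prefer="file" an author can override the bulk rule for a single
document. With prefer="directory" the administrator's rule always wins.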

> A question: Suppose we have this info duplicated in the HTTP
> headers and in HTML. What happens when a client PUTs a document
> with conflicting information? Suppose the server stored all the
> metainformation in a database.  Why ask the server to read HTML
> files all the way through, when for anything else (GIF, sound) etc
> the server can just soak up the HTTP headers and treat the HTTP
> body as opaque data?   Sounds to me as though the client
> could be the one responsible for copying the metadata into the
> HTML HEAD.  The HTTP metadata (however it is stored) would be
> the more fundamental.

That's an interesting suggestion, but, it seems to me, it would be 
difficult to maintain consistency between the metainformation and
the body contents if they are stored separately.

As far as clients like MOMspider are concerned, all they need is for
the metainformation to be placed in the headers (it doesn't matter
how they get there) and some means by which authors (or admins) can
specify the contents of that metainformation for a particular object.

>> One point which I think may spark discussion is whether we should
>> specify the Owner as a LINK relationship rather than as its own
>> element.  I decided not to do so for reasons of efficiency and
>> understandability. 
> 
> I'll play devil's advocate here.

That's what I was hoping for  ;).

>> If the owner was specified as a LINK, MOMspider
>> (and any similar clients) would have to parse through all the fields
>> of every LINK header in order to find an owner relationship.
> 
> Hey, come on, it has to read all the header lines anyway to
> look for OWNER. No more sweat for a machine to look for
> LINK REL="OWNS"

Marginal sweat -- the Owner: would be at the beginning of a line
(like any good header), whereas the REL="OWNS" (or REV="OWNS") could
be anywhere in any one of the Link: header lines.  No big deal, I guess.
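
To make the difference concrete, here is a small sketch of both lookups.
The header names follow the discussion above, but the Link attribute syntax
was not settled at the time, so the pattern and the sample header values
below are assumptions.

```python
import re

# Illustrative only: the attribute pattern for Link is an assumption,
# since the exact header syntax was still under discussion.
REL_OR_REV_OWNS = re.compile(r'\bRE[LV]\s*=\s*"OWNS"', re.IGNORECASE)

def find_owner_header(header_lines):
    """Dedicated header: comparing the field name of each line is enough."""
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "owner":
            return value.strip()
    return None

def find_owner_links(header_lines):
    """Link headers: the body of every Link line must be searched as well."""
    matches = []
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "link" and REL_OR_REV_OWNS.search(value):
            matches.append(value.strip())
    return matches

if __name__ == "__main__":
    # Hypothetical response headers, made up for this example.
    headers = [
        "Content-Type: text/html",
        'Link: HREF="http://example.org/style" REL="STYLE"',
        'Link: HREF="http://example.org/people/roy" REV="OWNS"',
        "Owner: AnyAuthorAlias",
    ]
    print(find_owner_header(headers))
    print(find_owner_links(headers))
```

Both functions walk every header line once. The Link version only adds a
search inside each Link value, which matches the "marginal sweat" estimate.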

>> Furthermore, the document author would have to build a contrived
>> reverse LINK relationship with fields normally used for document
>> references
> 
> Not at all -- the LINK element is not normally used for document
> references (the A element is used for that normally).  The LINK
> element was designed to define any two-ended relationship,
> or binary predicate.  It was designed for relationships like
> 
> 	Jack loves Jill
> 	Jill loves Jack
> 	Jack likes pie
> 	Jill makes pie
> 	Jack eats pie
> 	Jack addicted to pie
> 	Jack needs pie
> 	Jack needs Jill
> 	Jack demands pie
> 	Jill fears Jack
> 	Jack fears "Jack needs Jill"
> 	Jack hates Jill
> 
> Check out all the stuff on semantics in hypertext, like
> the hairspray (keeps your ideas in place but I can't remember
> which brand) from Halasz &co at PARC.

I was talking syntax, not semantics.  The Link field names are the same
as those used by anchors and thus evoke the same psychological reaction
when a reader tries to interpret the semantics.  Furthermore, I have yet
to see an instance where links like the above would be useful in an
information resource database.  ;-)

>>  -- a concept which is counter to understandability and
>> everything I know about software engineering.
> 
> From that point of view, the useful thing about overloading LINK
> is that a MOMspider (or anyone else) knows that a LINK has 
> a parameter which is an object URI, and so can do quite a lot
> with general machinery for all links. We can have general
> routines like "find me all B such that A o B" rather than
> special routines "find me all B such that A owns B".

That's exactly what I would want to avoid.  There are some times when
"gee, wouldn't it be neat if we could do recursive indirection through
a URL to a script to ...." is the last thing you want to allow.
Simplicity in the headers is essential for fast servers and simple
clients like MOMspider.

>>  I believe that the
>> notion of document ownership is encountered frequently enough to
>> justify a special HTML element for that purpose.
> 
> Yes, we can, and maybe we will, but doesn't defining a special case
> because one form of a general one is used frequently enough run counter
> to everything you know about software engineering? :-)

Nope.  Generality is only as good as the abstraction upon which it is based.
I would say that broadening the document-document relationship abstraction
to include person-document and person-person relationships is not justified
by the application requirements (WWW) and creates unnecessary complexity.

> There is the case for generality. I agree it looks horrid.

But that's just it!  I consider links to be good for representing
relationships between documents, but horrid for representing relationships
between humans and documents.  Further, I think they are geared more towards
automatic creation than they are to being authored by humans.  Thus, authors
just avoid using them and the relationships they are intended to represent
cease to be meaningful.

In contrast, special HTML elements are easy for authors to understand and
remember, and thus will be used more frequently and with better consistency.

>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 
>> Well, that should be enough to generate some healthy debate.
> 
> Yes.  It has brought up the general and important issue of what
> to do about metadata, which is useful too.
> 
> Here is a final idea.  The HTML spec can be user-friendly
> as people see it more often than the HTTP. So let the
> HTTP have a general relationship field.  Then specify an
> architectural form  (am I kidding?) to allow any DTD to
> specify the semantics of a relationship element in terms of the
> underlying relationship model.

Crikey!  It took me ten minutes just to understand what you meant
by that last sentence.  That would be an excellent way to maintain
both document understandability (for authors) and HTTP simplicity.
Is SGML flexible enough to allow definition of the semantics of an
element relationship within the DTD itself?  Such a thing would be
fine for clients (providing the server output remained consistent),
but would the server implementors want to do the translation?

> Tim

Thanks for being the devil's advocate ;).  And, while I'm at it, thanks
for the rest of your WWW work as well.

....Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                   (fielding@ics.uci.edu)