Re: RFC: Multi-Owner Maintenance robot (MOMspider)

Timothy Berners-Lee <timbl@dxcern.cern.ch>

Date: Mon, 6 Dec 1993 20:48:03 +0100 (MET)
From: Timothy Berners-Lee <timbl@dxcern.cern.ch>
Subject: Re: RFC: Multi-Owner Maintenance robot (MOMspider)
To: "Roy T. Fielding" <fielding@simplon.ics.uci.edu>
Cc: www-talk@nxoc01.cern.ch
In-reply-to: <9312060538.aa26370@paris.ics.uci.edu>
Message-id: <Pine.3.07.9312062002.D19717-d100000@dxcern.cern.ch>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Mon, 6 Dec 1993, Roy T. Fielding wrote:
> Each maintained HTML file would include an HTML comment of the following
> format as the first line of the file:
> 
> <!-- MOM Owner="AnyAuthorAlias" Expires="31 Dec 1993" -->
> 
...
> Note that since this added line is a comment, it will have no effect on
> existing servers or clients.

I agree with your later classification of this as a kludge, and
would much prefer to use either the existing LINK element
or new elements. Note that the HTML spec requires browsers to
ignore elements they do not understand -- that is, to treat
them as though the tags were not there. As this would leave
the contents within the HEAD of a message, it seems reasonable
to interpret this to mean that the contents (which would not
otherwise be allowed in the HEAD part) should be ignored.

This gives you a transition path from trying it out on a small
scale to standardising on it -- without breaking anything on the
way.

> HTTP servers which want to serve documents maintainable by MOMspider would
> need to parse the above MOMtag and send the information as headers in a
> response to any GET or HEAD request for that document.

I would like to draw your attention to the comments in the HTTP spec
about WWW_Link: and the isomorphism between HTTP object header fields
and HTML head elements. There it is suggested that a formal
relationship be made, automatically defining one set in terms of
the other.
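A minimal sketch, in Python, of the server side of Roy's proposal. It reads the MOM comment on the first line of an HTML file and turns each attribute into one header field. The function names are hypothetical, `Owner` is not an HTTP header field today, and the `Expires` value is passed through as written rather than converted to an HTTP date.

```python
import re

# Matches the proposed first-line comment, e.g.
# <!-- MOM Owner="AnyAuthorAlias" Expires="31 Dec 1993" -->
MOM_COMMENT = re.compile(r'<!--\s*MOM\s+(.*?)\s*-->')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def mom_headers(html: str) -> dict[str, str]:
    """Return the MOM attributes of the first line as header fields."""
    first_line = html.split("\n", 1)[0].strip()
    match = MOM_COMMENT.match(first_line)
    if match is None:
        return {}  # no MOM comment: nothing to send
    return {name: value for name, value in ATTRIBUTE.findall(match.group(1))}


def format_headers(fields: dict[str, str]) -> str:
    """Render the fields as HTTP header lines, one per attribute."""
    return "".join(f"{name}: {value}\r\n" for name, value in fields.items())
```

Each attribute name becomes a header name unchanged. That one-to-one mapping is the kind of relationship between HEAD metainformation and header fields discussed above, here defined in one direction only.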

The next point is that, although the two specs have the
metainformation in common, they should be kept separate.
This separation should include the MOMspider design.
Remember that GIFs have owners too -- and expiry dates, etc.

Suppose we specify this metainformation in HTTP. I think that
it is really useful, and will put it in unless anyone objects.
Owner:, that is, as Expires: is already in. You have to leave it
up to a server owner as to how he generates that field.
Nowhere does the HTTP spec say anything about how the fields
are generated, only what they mean. For example, one could
take the file's uid or gid as a good guide. It is rather system
dependent.
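A sketch of one system-dependent way a Unix server might generate the field. It assumes a hypothetical `Owner:` header whose value is the login name of the uid that owns the file. The gid could be used in the same way via the `grp` module.

```python
import os
import pwd


def owner_field(path: str) -> str:
    """Derive a value for a hypothetical Owner: header from the file's uid."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid has no password-file entry; fall back to the number itself.
        return str(uid)
```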

So as a separate issue, we can add the fields for HTML, which
would probably be used by most servers (server admins) to generate
the HTTP headers.

> output is a bit of a kluge.  I would prefer to have official HTML
> metainformation elements for OWNER and EXPIRES which would be optionally
> specified within the HEAD element (similar to the TITLE element).
> Similarly, the HTTP response would include that metainformation as
> appropriate headers (note that this has already been suggested for
> the Expires header but I haven't seen any mention of how the expire
> date would be obtained from normal HTML files).

Again, it could be done in bulk, for example by specifying that
anything in /internet-drafts expires 6 months after its creation
date.
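A sketch of such a bulk rule. It assumes a hypothetical table that maps path prefixes to lifetimes in months. "6 months" is taken as six calendar months, and the day is clamped when the target month is shorter. The prefix ends in "/" so that a sibling such as /internet-drafts-old is not caught by accident.

```python
import calendar
from datetime import date
from typing import Optional

# Hypothetical site policy: path prefix -> lifetime in calendar months.
EXPIRY_RULES = {"/internet-drafts/": 6}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day (31 Aug + 6 months -> 28 Feb)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def expiry_for(path: str, created: date) -> Optional[date]:
    """Return the expiry date the first matching rule gives, else None."""
    for prefix, months in EXPIRY_RULES.items():
        if path.startswith(prefix):
            return add_months(created, months)
    return None
```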

A question: Suppose we have this info duplicated in the HTTP
headers and in HTML. What happens when a client PUTs a document
with conflicting information? Suppose the server stored all the
metainformation in a database. Why ask the server to read HTML
files all the way through, when for anything else (GIF, sound, etc.)
the server can just soak up the HTTP headers and treat the HTTP
body as opaque data? Sounds to me as though the client
could be the one responsible for copying the metadata into the
HTML HEAD. The HTTP metadata (however it is stored) would be
the more fundamental.
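A sketch of the arrangement just described. An in-memory dictionary stands in for the database. On PUT the server keeps the HTTP headers as the authoritative metadata and stores the body as opaque bytes, whether it is HTML, a GIF or sound. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    metadata: dict[str, str]  # the HTTP header fields: authoritative
    body: bytes               # opaque: never parsed, whatever the type


class MetadataStore:
    """Hypothetical stand-in for the server's metainformation database."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, path: str, headers: dict[str, str], body: bytes) -> None:
        # Soak up the headers; keep the body byte for byte, unread.
        self._objects[path] = StoredObject(dict(headers), body)

    def head(self, path: str) -> dict[str, str]:
        # Answer a HEAD request from stored metadata without touching the body.
        return dict(self._objects[path].metadata)

    def get(self, path: str) -> tuple[dict[str, str], bytes]:
        obj = self._objects[path]
        return dict(obj.metadata), obj.body
```

Because the server never reads the body, an HTML file and a GIF are handled identically. Any copying of the metadata into an HTML HEAD is left to the client.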

> One point which I think may spark discussion is whether we should
> specify the Owner as a LINK relationship rather than as its own
> element.  I decided not to do so for reasons of efficiency and
> understandability. 

I'll play devil's advocate here.

> If the owner was specified as a LINK, MOMspider
> (and any similar clients) would have to parse through all the fields
> of every LINK header in order to find an owner relationship.

Hey, come on, it has to read all the header lines anyway to
look for OWNER. No more sweat for a machine to look for
LINK REL="OWNS"

> Furthermore, the document author would have to build a contrived
> reverse LINK relationship with fields normally used for document
> references

Not at all -- the LINK element is not normally used for document
references (the A element is used for that normally). The LINK
element was designed to define any two-ended relationship,
or binary predicate.  It was designed for relationships like

	Jack loves Jill
	Jill loves Jack
	Jack likes pie
	Jill makes pie
	Jack eats pie
	Jack addicted to pie
	Jack needs pie
	Jack needs Jill
	Jack demands pie
	Jill fears Jack
	Jack fears "Jack needs Jill"
	Jack hates Jill

Check out all the stuff on semantics in hypertext, like
the hairspray (keeps your ideas in place, but I can't remember
which brand) from Halasz & co. at PARC.

>  -- a concept which is counter to understandability and
> everything I know about software engineering.

From that point of view, the useful thing about overloading LINK
is that a MOMspider (or anyone else) knows that a LINK has
a parameter which is an object URI, and so can do quite a lot
with general machinery for all links. We can have general
routines like "find me all B such that A o B" rather than
special routines "find me all B such that A owns B".
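A sketch of that general machinery. It assumes relationships are held as hypothetical (A, relationship, B) triples. One routine answers "find me all B such that A o B" for any relationship, and the ownership query is just a call to it with "owns". The owner name and paths are illustrative only.

```python
# Each triple is (A, relationship, B), read as "A relationship B".
Triple = tuple[str, str, str]

RELATIONSHIPS: list[Triple] = [
    ("Jack", "loves", "Jill"),
    ("Jill", "makes", "pie"),
    ("Jack", "needs", "pie"),
    ("Jack", "needs", "Jill"),
    # Hypothetical ownership links: same shape as any other relationship.
    ("roy", "owns", "/MOMspider/rfc.html"),
    ("roy", "owns", "/MOMspider/logo.gif"),
]


def find_all(triples: list[Triple], a: str, relation: str) -> list[str]:
    """General routine: find all B such that A relation B."""
    return [b for (s, r, b) in triples if s == a and r == relation]


def owned_by(triples: list[Triple], owner: str) -> list[str]:
    """The special case is the general routine with relation 'owns'."""
    return find_all(triples, owner, "owns")
```

For example, `find_all(RELATIONSHIPS, "Jack", "needs")` returns `["pie", "Jill"]`. The owner's GIF turns up alongside the HTML file with no extra code.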

>  I believe that the
> notion of document ownership is encountered frequently enough to
> justify a special HTML element for that purpose.

Yes, we can, and maybe we will, but doesn't defining a special case
because one form of a general one is used frequently enough run counter
to everything you know about software engineering? :-)

There is the case for generality. I agree it looks horrid.

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Well, that should be enough to generate some healthy debate.

Yes.  It has brought up the general and important issue of what
to do about metadata, which is useful too.

Here is a final idea.  The HTML spec can be user-friendly
as people see it more often than the HTTP. So let the
HTTP have a general relationship field.  Then specify an
architectural form (am I kidding?) to allow any DTD to
specify the semantics of a relationship element in terms of the
underlying relationship model.

Tim