Re: Updated URI test suite; resolving some issues...

"Daniel W. Connolly" <connolly@hal.com>
Errors-To: listmaster@www0.cern.ch
Date: Thu, 17 Mar 1994 22:53:56 --100
Message-id: <9403172141.AA11217@ulua.hal.com>
Reply-To: connolly@hal.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Updated URI test suite; resolving some issues... 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 5279
In message <9403171311.AA13912@dragget.hpl.hp.com>, Dave_Raggett writes:
>> I have been thinking about near-term ways to deploy URNs.
>
>Me too!
>
>Given a simple syntax and protocol we could rapidly deploy a URN->URL
>service.

Truth be told, I'm not all that interested in the URN->URL mapping
problem right now. I'm more interested in the properties of the
namespace itself.

The example I keep harping on is that I'd like to compose an article
with, for example, a reference to RFC1521, the MIME spec. I should be
able to point at:
	rfc:rfc1521.txt
with perhaps auxiliary pointers to:
	ftp://ds.internic.net/rfc/rfc1521.txt
with the understanding that if the reader has a local copy of this document,
s/he can use it instead of FTPing the file.

>Yes, but your assumption that you can keep "that copy forever" is wrong.

Not in this case: the relation
	(rfc number, octet-stream-of-text)
is in fact a function: once an RFC is published, it never changes.
That rfc number is bound for all time.

The same is true of email message ID's and USENET news article ID's --
once that message goes out on the net, that ID is forever associated
with that stream of octets. (with some exceptions: (1) Received:
headers and such, and (2) message IDs are specified to last about 2
years).

Not even messages with Expires: headers really change -- they just get
superseded.

Now filesystem namespaces don't work this way: the name
	/dir/file
may be bound to one octet string at one moment, and to another the
next. But we need only add a time element to this namespace to make it
work like the above:

	pathname x time -> octet-string

is a function.
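
A minimal sketch of that idea, assuming fixed-width timestamps of the form YYYYMMDDhhmmssZ so that plain string comparison orders them. The class and method names are hypothetical, made up for illustration:

```python
import bisect


class TimedNamespace:
    """Hypothetical store mapping (pathname, time) -> octet string."""

    def __init__(self):
        # pathname -> sorted list of bind times, and the matching contents
        self._times = {}
        self._contents = {}

    def bind(self, pathname, time, octets):
        """Record that pathname was bound to octets starting at time."""
        times = self._times.setdefault(pathname, [])
        contents = self._contents.setdefault(pathname, [])
        i = bisect.bisect_right(times, time)
        times.insert(i, time)
        contents.insert(i, octets)

    def lookup(self, pathname, time):
        """Return the octets bound to pathname at time, or None."""
        times = self._times.get(pathname, [])
        # Index of the first binding strictly later than the requested time.
        i = bisect.bisect_right(times, time)
        if i == 0:
            return None  # nothing was bound yet at that time
        return self._contents[pathname][i - 1]


ns = TimedNamespace()
ns.bind("/dir/file", "19930101000000Z", b"first contents")
ns.bind("/dir/file", "19930401000000Z", b"second contents")
print(ns.lookup("/dir/file", "19930317092345Z"))
```

The same pathname gives different octets at different times, but each (pathname, time) pair gives exactly one answer.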

And in practice, nobody really means to link to an inode -- they intend
to link to a piece of information that was once stored in that inode.
So it makes more sense to link to:

	http://xhost/yfile;date=19930317092345Z;md5=2l3k4j2lkj423l

i.e. it's a link to "the sequence of octets whose md5 signature
is 2l3k4j2lkj423l and which was retrieved from the yfile on the
xhost http server at 19930317092345Z."
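
A client could check a retrieved copy against such a link roughly as follows. This is only a sketch: it assumes the md5 parameter carries a hex digest (the value in the example above is just a placeholder), and split_link and matches_link are hypothetical helper names:

```python
import hashlib


def split_link(link):
    """Split 'base;key=value;...' into (base, {key: value})."""
    base, *parts = link.split(";")
    # Assumes every parameter has the form key=value.
    params = dict(part.split("=", 1) for part in parts)
    return base, params


def matches_link(link, octets):
    """True if octets hash to the md5 parameter carried in the link."""
    _, params = split_link(link)
    expected = params.get("md5")
    if expected is None:
        return False  # the link makes no claim about the content
    return hashlib.md5(octets).hexdigest() == expected


body = b"some retrieved octets"
link = ("http://xhost/yfile;date=19930317092345Z;md5="
        + hashlib.md5(body).hexdigest())
print(split_link(link)[0], matches_link(link, body))
```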

To make sense of all this, we need to think of links not between
amorphous objects, but between pieces of information. The unit of
information is the bit; put a bunch of them together, and you've got
an octet string. (of course the octets only make sense when
interpreted in the intended manner, in the assumed context...)

What we do today, with links to
	ftp://host/dir/file
is a heuristic approximation of the above: it says "I found some
info in /dir/file on the ftp server at host. Go look there. If you can
find something there and make sense of it, it's probably still relevant."

To me, the string
	ftp://host/dir/file
identifies a _set_ of resources -- different elements at different
times. The Date:, Expires:, and If-Modified-Since: headers are steps
in the right direction in the HTTP protocol.

But nobody seems interested in giving authors and/or users control in
this area. Everybody seems to assume all documents are changing all
the time, and that everybody wants the latest version all the time.

I think it would be great if we could write:

	Comments on <A HREF="ftp://host/dir/file;date=19943002323Z">the
	magic cookie draft</A>

so that when the consumer followed the reference, s/he would get
notification if there was a newer version, or if the old one wasn't
available, etc.

Now then... we can't expect to be exact with every query -- we need
some slop for caching and stuff. So it seems that the
user/consumer/client should be able to say:

	"Get any copy of RFC822" (I know they're all the same)
	"Get any copy of mime.faq as of March 15, up to a month later"
		(I know it changes monthly)
	"Get any copy of mime.faq as of today, up to a month earlier"
		(I want the latest copy, and I know it changes
		monthly, so the odds are around 50% that any copy
		less than a month old is the same as the current one)
	"Get the March 15 version of foo.txt"
		(where the server is somehow able to zen the
		expiration period and guess that it's got the current
		one)
	"Get the current version of foo.txt"
		(as above)
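
The "slop" in the dated queries above could be expressed as a window around the requested date. A rough sketch, assuming the client holds a list of (retrieved_at, octets) pairs for one name; pick_copy and its parameters are hypothetical:

```python
from datetime import datetime, timedelta


def pick_copy(copies, as_of, earlier=timedelta(0), later=timedelta(0)):
    """Return any copy dated within [as_of - earlier, as_of + later].

    copies is a list of (retrieved_at, octets) pairs; returns the octets
    of the first acceptable copy, or None if no copy falls in the window.
    """
    lo, hi = as_of - earlier, as_of + later
    for retrieved_at, octets in copies:
        if lo <= retrieved_at <= hi:
            return octets
    return None


cached = [
    (datetime(1994, 2, 1), b"February copy"),
    (datetime(1994, 3, 20), b"March copy"),
]
# "as of March 15, up to a month later"
print(pick_copy(cached, datetime(1994, 3, 15), later=timedelta(days=30)))
```

With both tolerances left at zero, only a copy dated exactly as_of is accepted; a cache that comes up empty would then have to go back to the server.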

This also raises the question of

	"Get any postscript version of RFC822"
	"Get any French version of RFC822"

and such... we need a system that encompasses all these axes of the
namespace.
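
One way to picture those extra axes is as attribute/value pairs attached to each available copy, with a query naming only the axes it cares about. A small sketch; the catalogue entries and the select_variants name are made up for illustration:

```python
def select_variants(variants, **wanted):
    """Return the variants whose attributes include every wanted pair."""
    return [v for v in variants
            if all(v.get(key) == value for key, value in wanted.items())]


# Hypothetical catalogue of copies of one document.
catalogue = [
    {"name": "RFC822", "format": "text", "language": "en"},
    {"name": "RFC822", "format": "postscript", "language": "en"},
    {"name": "RFC822", "format": "text", "language": "fr"},
]

# "Get any postscript version of RFC822"
print(select_variants(catalogue, name="RFC822", format="postscript"))
```

A query that names no axes at all matches every copy, which corresponds to "Get any copy of RFC822".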

>I would like the URN syntax to support an optional set of attribute/values
>as a suffix. These act to subset the set of documents identified by the
>base URN. One approach for this is to use the existing "?" suffix for
>URLs, another is to include the selectors in [ brackets ]. What do
>you suggest for this?
>

This is included in the grammar and test suite I released:

path : /* void */ { printf("it's a directory.\n"); }
        | pathname '/' path
        | pathname
        | params
        ;

params : keyword '=' value
        | keyword '=' value ';' params
        ;

searchStuff: /* void */ | words | params

>> The goal is to deploy the more sophisticated "URCs" or IAFA-templates or
>> whatever in a scalable, distributed fashion. In the short term, I'd like to
>> be able to compose documents with references like: ...
>
>This seems to be related to the goals behind some of the HyTime addressing
>concepts. I think we need to work at this to get a deeper understanding.

Exactly...

Dan