Re: Draft: Universal Document Identifiers

Message-id: <9203111745.AA02532@expresso.cc.mcgill.ca>
In-Reply-To: Tim Berners-Lee's message as of Mar 11, 12:07
From: peterd@expresso.cc.mcgill.ca (Peter Deutsch)
Date: Wed, 11 Mar 92 17:45:31 GMT-0:02
X-Mailer: Mail User's Shell (6.5.6 6/30/89)
To: Tim Berners-Lee <timbl@nxoc01.cern.ch>,
        Larry Masinter <masinter@parc.xerox.com>
Subject: Re: Draft: Universal Document Identifiers
Cc: cni-arch@uccvma.bitnet, www-talk@nxoc01.cern.ch, wais-talk@think.com,
        iafa@cc.mcgill.ca
> From timbl@nxoc01.cern.ch Wed Mar 11 06:03:54 1992
> >> Peter Deutsch's message  <9203051920.AA14978@expresso.cc.mcgill.ca>
> >> Actually, Mike Schwartz has suggested using CRC checksums,
> 
> > From: Larry Masinter <masinter@parc.xerox.com>
> > You can do better than that by either:
> > a) use a good digital signature (MD5 or Snefru or ...). [...]
> > b) rely on something else that's unique, e.g., hostid + timestamp, ISO
> > DFR's DORs, Object Identifiers, etc. 
> 
> > We've been using 256-bit UDSNs and are happy with the scheme. I'm
> > hoping we'll have a writeup together before next week.
> 
> Peter, USDN is your term, so you decide what is and isn't one.

I want a UDSN to be something that lets me identify the
contents of a file and compare the contents of multiple
files to test for uniqueness. In the long run I'd also
like them to permit me to identify contents across
multiple encodings, but that's harder and I'm prepared to
wait for that.
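
As a rough illustration only, here is a sketch in Python of
the kind of comparison I mean. It assumes MD5 as the digest,
which is just one of the candidates named in this thread and
not an agreed choice; the names content_id and same_contents
are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a hex digest derived only from the bytes themselves.

    MD5 is an assumption here; any agreed digest could be swapped in.
    """
    return hashlib.md5(data).hexdigest()


def same_contents(a: bytes, b: bytes) -> bool:
    """Compare two items by their digests instead of byte by byte."""
    return content_id(a) == content_id(b)
```

Note that this does nothing for the multiple-encoding case:
two encodings of the same document are different byte
strings and so get different digests.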

I wouldn't be so bold as to try and decide what makes a
suitable UDSN but I hope that we can discuss the issue at
IETF next week (since we will have so many of the players
there) and arrive at some sort of consensus.

I can say what _I_ want them for, and hope that this is
something that would be useful to enough other people that
we can agree to deploy something soon. Certainly there are
a number of candidates, and Larry has named some of the
most likely. I think something that can be applied
retroactively (MD5?) would be preferable to something like
hostID and timestamp, which would be hard to retrofit to
the existing archive collections.
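
To make the retrofit point concrete, here is a minimal
sketch in Python, assuming MD5 and a local directory tree
standing in for an archive. The names file_digest and
index_archive are hypothetical. Nothing about the files'
origin (host, creation time) is needed, only their bytes.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Digest a file's contents in chunks so large files fit in memory."""
    h = hashlib.md5()  # assumption: MD5 as the candidate digest
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_archive(root):
    """Map each digest to every path under root holding those contents."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            index[file_digest(path)].append(path)
    return dict(index)
```

Any digest that maps to more than one path marks files with
identical contents, whatever they are called and wherever
they sit in the tree.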

> However, a UDI I define to be something you can use to get the object. .  .
> .  .  . Knowing when you have a document that  
> you have the right document is a different problem, but with a
> good name space (like x500) you can do both operations.

I'm principally interested in UDSNs at this point to allow
comparisons between multiple items (perhaps found in disparate
environments). I don't see how the X.500 name space can
help me here (unless I'm misunderstanding what you mean?).
Certainly it seems that UDIs should help locate items.
That seems to be their raison d'etre.


				- peterd
