Re: Draft: Universal Document Identifiers
timbl (Tim Berners-Lee)
Date: Mon, 2 Mar 92 12:36:33 GMT+0100
From: timbl (Tim Berners-Lee)
Message-id: <9203021136.AA14036@nxoc01.cern.ch>
To: bcn@isi.edu (Clifford Neuman)
Subject: Re: Draft: Universal Document Identifiers
Cc: cni-arch@uccvma.bitnet, www-talk@nxoc01.cern.ch, wais-talk@think.com,
iafa@cc.mcgill.ca
Cliff,
Thanks for your input, with explanations of addressing in Prospero.
Prospero should certainly go into the document. Indeed, it seems to
fit in very well. The small differences raise some interesting
questions -- reactions off the top of my head follow, in the sequence
of your message.
Tim
_______________________________________________
> Date: Thu, 27 Feb 92 10:52:44 PST
> From: bcn@isi.edu (Clifford Neuman)
>
> I have glanced through your document on universal directory
> identifiers, and you seem to have left out Prospero.
Omission was from ignorance of the details you provide here and will
certainly be corrected. Prospero is very relevant.
> In particular, a Prospero link consists of two
> parts, a host name, and a name of the object on that host. The
> latter part is usually a path name, but in reality, it can be any
> string, including simply a unique ID. Thus, a Prospero link might
> look like
>
> TGO.ISI.EDU /a/b/c or GUM.ISI.EDU 27
The UDI syntax //TGO.ISI.EDU/a/b/c or //GUM.ISI.EDU/27 matches that
very well. I suggest the prefix "prospero:" for prospero addresses.
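
As a rough illustration of that mapping, the two-part Prospero link
can be joined into a UDI mechanically. This is only a sketch: the
function name is hypothetical, and the rule of inserting a slash
before a bare ID such as "27" is inferred from the two examples above.

```python
def prospero_link_to_udi(host: str, name: str) -> str:
    """Join a Prospero (host, name) pair into a "prospero:" UDI.

    Hypothetical helper, not part of Prospero or of the UDI draft.
    """
    # A name such as "27" has no leading slash, so supply one.
    path = name if name.startswith("/") else "/" + name
    return "prospero://" + host + path


print(prospero_link_to_udi("TGO.ISI.EDU", "/a/b/c"))
print(prospero_link_to_udi("GUM.ISI.EDU", "27"))
```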
> A Prospero link has a few other fields as well, but perhaps less
> important. There is a type field for the hostname. It indicates
> whether the hostname is an Internet name or address, or perhaps some
> other kind of name or address. Only one type is presently supported
> (INTERNET-D) though, and that type includes Internet host names or
> addresses, with or without an optional Internet UDP port.
>
> examples: TGO.ISI.EDU, TGO.ISI.EDU(191), 128.9.224.123, or
> 128.9.224.123(191)
The UDI scheme foresees these possibilities. These would map onto
//TGO.ISI.EDU/, //TGO.ISI.EDU:191/, //128.9.224.123/ and
//128.9.224.123:191/ respectively. The whole UDI of the file above
would be (if quoted out of the "prospero:" context),
prospero://TGO.ISI.EDU:191/a/b/c
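
The host rewriting can be sketched as follows. The function name and
the regular expression are assumptions made for illustration; only
the "host(port)" input form and the "//host:port/" output form come
from the examples above.

```python
import re

# Host name or dotted address, optionally followed by "(port)".
_HOST_SPEC = re.compile(r"^(?P<host>[^()\s]+)(?:\((?P<port>\d+)\))?$")


def internet_d_to_udi_host(hostspec: str) -> str:
    """Rewrite an INTERNET-D host spec as the "//host[:port]/" part of a UDI.

    Hypothetical helper; the accepted syntax is guessed from the
    four examples in the message.
    """
    match = _HOST_SPEC.match(hostspec.strip())
    if match is None:
        raise ValueError("unrecognised host spec: %r" % hostspec)
    host, port = match.group("host"), match.group("port")
    return "//" + host + (":" + port if port else "") + "/"


for spec in ("TGO.ISI.EDU", "TGO.ISI.EDU(191)",
             "128.9.224.123", "128.9.224.123(191)"):
    print(spec, "->", internet_d_to_udi_host(spec))
```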
We, also, wondered about how to extend the system when other
underlying protocols are used with the same higher-level protocol.
Suppose for example later one adds dial-up prospero. Should one write
prospero://dialup:+12025672654:200/a/b/c
or prospero-dialup:/+12025672654:200/a/b/c ?
My feeling is that the number of underlying network layers which have
complete world-wide coverage will remain low. Furthermore, one can
even imagine gateways there, so that those without X.25 access, say,
can go through some transport level gateway from TCP/IP if the need
arises. This suggests putting other low-level addresses into the
"host/port" field, encoded in some fashion. One would hope that there
will be fewer forms of transport service access point address than
there will be application layer protocols.
> The name relative to the host is also typed. Presently, the only type
> supported is ASCII, but the type field is there just in case.
The rule we have used is to put type information, if part of the
link, into the path. Protocols differ on whether they regard it as
part of the link or return it when you try to retrieve the data.
In the latter case (which I prefer) it should not be in the UDI at
all.
> Three other fields are a version number, a unique ID, and a type.
The version number should, I suggest, be part of the path. Its
significance will tend to vary between servers. The trouble is, as
you say, no one has really put up a system dealing with multiple
versions. We imagined having hidden links from a document to its
previous, next and latest versions, and to a table of versions.
> The purpose of the unique ID is ... to provide a mechanism for
> detecting when an object has been deleted and replaced with an
> object of the same name. In some cases, it might be important to
> note that the object being retrieved is not the same as the one to
> which the original link was made.
This is non-obvious. My feeling is that a unique id is a useful
thing, which I would regard as "header" information, i.e. information
you can ask the server for. Putting it into the link I'm not so sure
about. Suppose, for example, the retrieval goes through several
stages of pointers, being referenced by several servers. Do you want
to check that the final document, or the first link, was really the
same as the one you made the original link to?
> Binding to an access method is accomplished by sending
> a message to the Prospero server at the address in the link, and
> requesting the access method for the named object. The response
> includes a sequence of tokens, the first identifies the access method,
> and the remainder identify the information specific to the access
> method (beyond that which already is part of the link). If you
> understand the access method, then you also know how to interpret the
> remaining tokens.
That "late binding" is just the sort of "name-server" function which
I was talking about, and which for example x500 might also fit into.
So long as both the input and the output to the process are UDIs,
it's very flexible.
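
To make the "UDI in, UDI out" idea concrete, here is a minimal sketch
of a resolution loop. Everything in it is an assumption made for
illustration: the resolve() function, the per-scheme resolver table,
and the toy binding from a prospero: UDI to a file: UDI. A real
resolver would send a request to the Prospero server rather than
consult a dictionary.

```python
from typing import Callable, Dict, Optional

# A resolver takes a UDI and returns another UDI, or None if it has
# nothing more specific to offer.
Resolver = Callable[[str], Optional[str]]


def resolve(udi: str, resolvers: Dict[str, Resolver], max_hops: int = 8) -> str:
    """Follow late bindings until no resolver rewrites the UDI any further."""
    for _ in range(max_hops):
        scheme = udi.split(":", 1)[0]
        resolver = resolvers.get(scheme)
        if resolver is None:
            return udi  # no name-server step known for this scheme
        rewritten = resolver(udi)
        if rewritten is None or rewritten == udi:
            return udi
        udi = rewritten
    raise RuntimeError("too many indirections while resolving a UDI")


# Toy stand-in for a Prospero server's answer (hypothetical binding).
toy_bindings = {
    "prospero://TGO.ISI.EDU/a/b/c": "file://TGO.ISI.EDU/pub/pfs/guest/README",
}

print(resolve("prospero://TGO.ISI.EDU/a/b/c", {"prospero": toy_bindings.get}))
```

Because each step maps a UDI to a UDI, the same loop would serve for
any other name-server style lookup that can be phrased that way.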
> For example, a response indicating access by anonymous FTP might be
>
> ANONYMOUS-FTP /pub/pfs/guest/README BINARY
We'd write that now as file:/(samehost)/pub/pfs/guest/README.
Currently, if the access protocol has to be specified, then the host
does too. It could default to the host of the context of the UDI even
when protocol fields are different.
The "binary" flag is an interesting one and a perennial question. My
assumption was that if you know how to handle a file when you've got
it, then you must know how to transfer it. In practice with FTP both
mean that you have to have a table of file suffixes.
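
Such a suffix table might look like the sketch below. The entries and
the choice of BINARY as the fallback are assumptions made for
illustration, not a list taken from any real client.

```python
import posixpath

# Hypothetical suffix table; a real client would carry a longer one.
SUFFIX_TO_MODE = {
    ".txt": "ASCII",
    ".ps": "ASCII",
    ".tar": "BINARY",
    ".Z": "BINARY",
    ".gif": "BINARY",
}


def transfer_mode(path: str, default: str = "BINARY") -> str:
    """Guess the FTP transfer mode from the file suffix alone."""
    # splitext returns ("name", ".suffix"); the suffix is "" if absent.
    suffix = posixpath.splitext(path)[1]
    return SUFFIX_TO_MODE.get(suffix, default)


print(transfer_mode("/pub/pfs/guest/README"))
print(transfer_mode("/pub/notes.txt"))
```

Falling back to BINARY is a design choice in this sketch: an image
transfer leaves the bytes untouched, so a wrong guess does less harm
than an unwanted line-ending conversion.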
> Similar responses are supported for other methods, and a response
> might include more than one access method, in which case the
> application choose the method that best suits its needs.
Sounds fine.
> Now, back to the type field. One of the shortcomings of the approach
> as described so far is that it requires a Prospero server to run on
> the system storing the object to be referenced. This shortcoming is
> addressed by the external link. The type field in a Prospero link
> provides information on what can be done with the link. The three
> common types are FILE, DIRECTORY, and EXTERNAL. The links described
> above were of type FILE. If a links type is directory, its contents
> can be listed by contacting the Prospero server (i.e. the links in the
> directory can be returned). If a links type is EXTERNAL, it means
> that the object should be accessed without contacting a Prospero
> server to obtain the access method (usually because a Prospero server
> is not running on the target site). Instead, the access information
> that would otherwise have been returned is encoded as part of the
> type. Thus for example the type of an external link to the file
> mentioned above would be.
>
> EXTERNAL(AFTP,BINARY)
Your "EXTERNAL" type is a pointer to a document in another naming
scheme, which is neat and expandable -- I like it. The UDI syntax was
basically invented to allow one to do that, so that all these systems
can work together. Basically, type EXTERNAL(xxx) maps onto putting an
xxx: prefix on the UDI. In your example, it maps to giving a file:
reference.
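
A small sketch of that mapping follows. The function name and the
method table are hypothetical; the only entry that comes from this
exchange is AFTP mapping to "file:". The "//host/path" layout follows
the prospero: examples earlier in this message, and pairing
TGO.ISI.EDU with the README path is done purely for the sake of the
example.

```python
import re

# Access method named in EXTERNAL(...) -> UDI prefix.
# Only the AFTP entry is taken from the discussion above.
METHOD_TO_SCHEME = {"AFTP": "file"}

_EXTERNAL = re.compile(r"EXTERNAL\(([^,()]+)(?:,[^()]*)?\)")


def external_link_to_udi(link_type: str, host: str, name: str) -> str:
    """Turn a Prospero EXTERNAL link into a UDI with the matching prefix."""
    match = _EXTERNAL.fullmatch(link_type.strip())
    if match is None:
        raise ValueError("not an EXTERNAL link type: %r" % link_type)
    method = match.group(1).strip()
    if method not in METHOD_TO_SCHEME:
        raise ValueError("no UDI prefix known for method %r" % method)
    path = name if name.startswith("/") else "/" + name
    return METHOD_TO_SCHEME[method] + "://" + host + path


print(external_link_to_udi("EXTERNAL(AFTP,BINARY)",
                           "TGO.ISI.EDU", "/pub/pfs/guest/README"))
```

Note that the BINARY token is parsed but deliberately dropped here,
in line with the view above that transfer details need not appear in
the UDI.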
You have, for prospero, the flag in the link as to whether the object
is a directory or a file. So does the Gopher. This is useful for
displaying different icons, etc. for the user. A snag is that if we
include anonymous FTP file systems, the NLIST command doesn't tell
you that information, so it doesn't map. You have to try to retrieve
it and if that fails, cd to it. If the flag is considered useful,
then we could use the convention (of ls -F) that a/b/c/ is a
directory and a/b/c is a file. The trouble is, that you can't get
that information from an FTP server without assuming Unix to parse a
long listing.
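
If that trailing-slash convention were adopted, a client could apply
it without contacting the server at all. The sketch below assumes the
convention holds; both function names are hypothetical.

```python
def looks_like_directory(udi: str) -> bool:
    """Apply the ls -F style convention: a trailing slash marks a directory."""
    return udi.endswith("/")


def with_directory_flag(udi: str, is_directory: bool) -> str:
    """Write the directory/file flag into the UDI as a trailing slash."""
    stripped = udi.rstrip("/")
    return stripped + "/" if is_directory else stripped


print(looks_like_directory("prospero://TGO.ISI.EDU/a/b/c/"))
print(looks_like_directory("prospero://TGO.ISI.EDU/a/b/c"))
```

This only helps when whoever wrote the link already knew the answer;
for an FTP server the flag would still have to be discovered by
trial, as described above.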
Do I _have_ to know in advance whether a Prospero item is a directory
or a file?
> Note that for external links using the AFTP or FTP method, the name
> field of the link contains the path name to be passed to FTP. For
> other access methods, the meaning of the field is defined by the
> particular access method to be used.
Yup - the UDI assumptions exactly.
> Anyway, I hope this adequate explains the form of Prospero
> identifiers, and I hope that you can fit it in to your proposed
> format.
>
> ~ Cliff
Thanks for a very clear explanation. It sounds as though Prospero
will fit very well into the format. I'll put it into the next draft
of the document.
- Tim