Opening the WAIS document-id syntax
Jonny Goldman <jonathan@think.com>
Date: Thu, 26 Mar 92 09:47:34 PST
Message-id: <9203261747.AA00262@philo.quake.think.com.>
From: Jonny Goldman <jonathan@think.com>
Sender: jonathan@quake.think.com
To: timbl@nxoc01.cern.ch
Cc: www-talk@nxoc01.cern.ch, wais-talk@think.com
In-reply-to: Tim Berners-Lee's message of Thu, 26 Mar 92 15:25:12 GMT+0100 <9203261425.AA23337@nxoc01.cern.ch>
Subject: Opening the WAIS document-id syntax
First, I'd like to point out that WAIS-FTP doesn't mean a client or server
understands the FTP protocol. It's simply a customized server that functions
like FTP (but is read-only). It's mainly an experiment in modifying
servers and providing services under WAIS.
Date: Thu, 26 Mar 92 15:25:12 GMT+0100
From: timbl@nxoc01.cern.ch (Tim Berners-Lee)
[...]
The data model of WAIS (documents in databases) could be deconstrained
to allow documents themselves to be or contain lists of documents, and
for lists of documents to point to things other than documents in the
same database.
I take it you're suggesting a new TYPE for a document: Derived types? In a
sense the catalog is one of these.
This is the way the second part can work. Normally, a search returns a
list of doc-ids, each one (basically) like
/usr/local/lib/wais/mydatabase/fred/myfile.txt
which is in fact a filename.
Let me also point out that this is just the method used in the sample
server. The CM server does not return DocID's that are derived from
filenames.
In fact, DocID's are "any"s, and that means they can have anything in them,
so long as the server understands how to return a specified amount of data
to a client when presented a DocID and a range.
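As a minimal sketch of that contract, assume the sample-server case where the
DocID resolves to an ordinary file that is already open. The function name
read_range and its signature are hypothetical, not part of any WAIS source;
the sketch only shows "DocID plus range in, bytes out".

```c
#include <stdio.h>

/* Hypothetical helper: copy bytes [start, end) of an open file into buf.
 * Copies at most buf_len bytes and stops early at end of file.
 * Returns the number of bytes copied, or -1 on bad arguments or a failed seek. */
long read_range(FILE *fp, long start, long end, char *buf, size_t buf_len)
{
    if (fp == NULL || buf == NULL || start < 0 || end < start)
        return -1;

    size_t want = (size_t)(end - start);
    if (want > buf_len)
        want = buf_len;              /* never overrun the caller's buffer */

    if (fseek(fp, start, SEEK_SET) != 0)
        return -1;

    return (long)fread(buf, 1, want, fp);
}
```

A server that does not derive DocIDs from filenames would replace the file
lookup with its own, while keeping the same range-based interface.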
There's a load of other stuff in there which we can ignore for now.
What a WAIS search needs to be able to do, when you are pointing to
files, is to return a pointer to a file in FTP say. We do that in two
steps.
I don't agree. I think the server should do the retrieval. The client
should not have to know anything about the REAL location of the document.
More on that below.
First, we recognise that that id is local to the context of a wais server
on host myhost and port myport. When the server returns that string, the
client uses knowledge of the context in which it was quoted to expand
that to
wais://myhost.dom.net:myport/usr/local/lib/wais/mydatabase/fred/myfile.txt
This is a reference you can quote to anyone as it makes sense anywhere.
No context. I called it a UDI but we'll have to change the name.
Document Access Token maybe. It's like Brewster's proposal but
extendable to other protocols. [Yes, WAIS is a good protocol but there
are others. Including name servers and directories which will be needed
for long-lived but movable documents.]
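The expansion step described above can be sketched in a few lines of C. This
is only an illustration under stated assumptions: the function name
expand_doc_id is hypothetical, the port is passed as a number, and a local id
that lacks a leading slash gets one added.

```c
#include <stdio.h>

/* Hypothetical helper: expand a server-local doc-id into a context-free
 * reference of the form wais://host:port/local-id.
 * Returns 0 on success, -1 on a missing argument or if out is too small. */
int expand_doc_id(const char *host, unsigned port, const char *local_id,
                  char *out, size_t out_len)
{
    if (host == NULL || local_id == NULL || out == NULL || out_len == 0)
        return -1;

    /* Local ids such as /usr/local/lib/... already start with a slash. */
    const char *sep = (local_id[0] == '/') ? "" : "/";

    int n = snprintf(out, out_len, "wais://%s:%u%s%s",
                     host, port, sep, local_id);
    if (n < 0 || (size_t)n >= out_len)
        return -1;                   /* encoding error or truncated output */
    return 0;
}
```

The client supplies host and port from the connection it already has open, so
the server can keep returning short local ids.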
This is a good idea, but I feel rather strongly that we should be very
careful in overloading the protocol. Specifying a syntax for DocID's is
one way of overloading the protocol. Standardizing types is another.
Now suppose one day a server returns a doc-id INCLUDING the protocol,
host, etc. For example, your WAIS FTP engine (like the ARCHIE WAIS)
returns what are basically pointers to files. Just now, because of the
constraints of the model, it has to return a part of a file within the
database. Suppose we change that, so that in your case it just returns a
doc-id which specifies anonymous ftp access, like:
WAIS-FTP doesn't return pointers to remote files. It returns local DocIDs
for use in retrieving a file local to the server. Archie WAIS (and
ftpable-readmes) returns these pointers. That's a different story.
Now for a small discussion of WAIS DocID's. So far WAIS DocID's have only a
few fields:
typedef struct DocID {
    any* originalServer;
    any* originalDatabase;
    any* originalLocalID;
    any* distributorServer;
    any* distributorDatabase;
    any* distributorLocalID;
    long copyrightDisposition;
} DocID;
The part you refer to is just the LocalID part. If you look at some of the
DocID's returned by the serial server, you'll see the other fields are
filled in (though the Server fields don't contain much useful information -
it's that part we were trying to standardize with the doc-id proposal).
file://otherhost.com/pub/doc/mydoc.txt
The client has a general retrieval engine which can accept doc-ids in
many domains -- not just WAIS. That allows it to go out over a different
protocol to retrieve the object.
There are two ways to handle this, of course. Either the client or the
server could do the retrieval. I believe the server should handle the
protocol part (if the document is stored on some FTP server somewhere, the
WAIS server can just fetch the file, and return it to the client). This
reduces client complexity. I have no objection to specifying the
protocol/server in the DocID (perhaps with another field), but we must
standardize the meanings.
This is the way WWW and Gopher work. They are open systems -- you can
link into any other system within reason. That's why the fuss about
universal document identifiers. Maybe the WAIS people would like to
incorporate them -- that is, just make sure that the normal WAIS server
return things which are -- like the one above -- special cases of the
more general syntax.
I haven't had much comment from the WAIS side about the UDIs, but I'd
like to have some. (file://info.cern.ch/pub/www/doc/udi1.ps was
background for the IETF discussions.) We plan a small working group
hacking out the details before an RFC is submitted.
Come up with an RFC, and we'll try to abide by it. I'd like to caution you
against overloaded strings. We've got enough of them already.
For a start, I'd suggest we use the originalServer as the identifier for
the HOST, and the originalDatabase can inform us of the protocol.
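A rough sketch of how that convention might look, with the fields kept
separate rather than packed into one string. Everything beyond the DocID
field names is an assumption for illustration: the layout of "any" as a
length-counted byte buffer, the helper name doc_id_to_reference, and the
protocol://host/local-id rendering.

```c
#include <stdio.h>

/* Assumed shape of "any": a length-counted byte buffer. */
typedef struct any {
    unsigned long size;
    char *bytes;
} any;

typedef struct DocID {
    any *originalServer;       /* proposed: identifies the HOST */
    any *originalDatabase;     /* proposed: names the protocol */
    any *originalLocalID;
    any *distributorServer;
    any *distributorDatabase;
    any *distributorLocalID;
    long copyrightDisposition;
} DocID;

/* Hypothetical helper: render the original* fields as protocol://host/local-id.
 * Returns 0 on success, -1 on a missing field or if out is too small. */
int doc_id_to_reference(const DocID *id, char *out, size_t out_len)
{
    if (id == NULL || out == NULL || out_len == 0 ||
        id->originalServer == NULL || id->originalDatabase == NULL ||
        id->originalLocalID == NULL)
        return -1;

    const any *host = id->originalServer;
    const any *proto = id->originalDatabase;
    const any *local = id->originalLocalID;

    /* Add a slash only when the local id does not already start with one. */
    const char *sep = (local->size > 0 && local->bytes[0] == '/') ? "" : "/";

    /* %.*s prints a counted buffer, so the bytes need no NUL terminator. */
    int n = snprintf(out, out_len, "%.*s://%.*s%s%.*s",
                     (int)proto->size, proto->bytes,
                     (int)host->size, host->bytes,
                     sep,
                     (int)local->size, local->bytes);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

Keeping host and protocol in their own fields means nothing has to parse an
overloaded string to find them; the combined form is only built for display.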
- Jonny G