Registrar - a URN registry service

Rob Raisch <raisch@ora.com>
Date: Wed, 7 Jul 1993 17:37:29 -0400 (EDT)
From: Rob Raisch <raisch@ora.com>
Subject: Registrar - a URN registry service
To: uri@bunyip.com
Cc: www-talk@nxoc01.cern.ch, com-priv@psi.com
Message-id: <Pine.3.03.9307071729.A19174-f100000@amber.ora.com>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

(NOTE: This message is crossposted to three mailing lists.
Apologies in advance for the extra bandwidth.

o	The URI list is the most appropriate, for obvious reasons,

o	the WWW-TALK list is included to generate some discussion regarding 
	the issues involved in the HTTP protocol and what role it should play 
	in the issues raised here, and

o	the COM-PRIV list is added to generate some discussion regarding the
	distinction between Intellectual Properties and Products.

Thanks for your patience.)

Here is registrar.  Comments are very welcome, and feel free to play.  The
next document, due in a few days, will explain the distribution of registrar
servers; it will be followed by a paper describing the SONAR (repository
availability) protocol.

-----------------------------------------------------------------------------
Quick synopsis:

	Registrar is a 'product' registry which serves various pieces of
information when given a unique URN.  One of the returned 'attributes' is
a 'product instance record' which contains a URL, content type, content
encoding, content size, access authority, billing authority, and cost records.

	Registrar is available on port 99, server 'ruby.ora.com', and
offers help information upon receiving a HELP command, e.g.:

	% telnet ruby.ora.com 99

	HELP
-----------------------------------------------------------------------------
	





              Registrar -- Resource Registration Service

                             Robert Raisch
                       manager, online services
                         O'Reilly & Associates
                 90 Sherman Street, Cambridge MA 02140



Assumptions:
-----------

     This document assumes that  the reader is conversant  in the form  and
function of Uniform Resource Locators (URL).  It would also be very helpful
if the reader were at least aware of the URI working group, and its efforts
to identify some of the issues addressed in this document.


Scope:
-----

     This document discusses an implementation of a Uniform Resource Name
or Notation (URN) server.  It describes how a URN can map to useful
information associated with a unique product, and how it can provide the
location of instances of this product on the network.  This information can
be used to automate retrieval of such products from multiple repositories.


Definitions:
-----------

               Instance - an existing specimen of a product which is
          indistinguishable from another instance of the same product based
          on the declaration of its owner.

               Product - a Product is any information declared by its owner
          to be unique and available.   That product might be available  in
          different formats  or  encodings, and  distributed  in  different
          repositories.

               Registrar - a service which maps a unique URN to zero or
          more Product Instance Records, which contain URLs.  It is also a
          service which caches other important information which is  unique
          to a product.

               Uniform Resource  Name (URN)  -  a URN  is a  notation  that
          uniquely identifies a product.   The actual form of a  particular
          URN is up  to the  authority which  maintains responsibility  for
          that variety of URN, and  this document talks about one  possible
          form which meets the current needs of the author.  URNs take the
          form authority:opaque_data, and the authority discussed here is
          called 'registrar'.

               Uniform Resource Locator (URL) -  a URL is that  information
          which allows the retrieval of a particular instance of a product.

               Product Attribute (PA) - a  Product Attribute is some  piece
          of information  which can  be attached  to the  declaration of  a
          product, and retrieved from a product registry server.

               Product Instance Record (PIR) - a Product Instance Record
          contains the information particular to a specific instance of a
          Product.


Issues:
------

     The Uniform Resource  Locator (URL) contains  information required  to
retrieve a single instance of a network resource.  It contains the name and
location of the instance, as well as the proper method used to retrieve it.

     While this is  useful information  once the decision  to retrieve  the
instance  has  been  made,  it  does  not  address  the  broader  and  more
complicated issues of whether or not we should retrieve the instance in the
first place,  and whether  or not  we can  use the  instance once  we  have
retrieved it.

     The information in the URL is insufficient to allow us to make these
decisions, so we must look elsewhere to satisfy our needs.

     Thus, the primary concept behind the REGISTRAR server is to provide
enough information about a particular product so that a number of
decisions can be made regarding its accessibility and value.

     Currently, some of  the information required  to make an  'appropriate
retrieval decision' is available, but much is based on the assumption  that
the agent which makes the retrieval has this information before the  actual
URL is used.  In most cases, this is information which the user possesses.
For example, the user may understand that retrieving an instance from a
'*.ac.uk' domain would be less efficient than getting it from
'*.berkeley.edu', based on her understanding or assumptions about the
underlying structure of the network.

     Assuming that  we  have  already  made  the  decision  to  retrieve  a
particular product from the network, we will need the following information
to decide where we can retrieve it from, and whether or not we can use  the
instance once we retrieve it:

        - Is the instance available via a retrieval mechanism we can use?
          (Instrumentality)

        - Is the instance available from a source (server) to which we have
          access?  (Availability)

        - Is the instance of a type which we can use? (Type)

        - Is the instance in a form which we can use? (Encoding)

        - Is the instance small enough to save and manipulate on our local
          system? (Size)

        - Are we allowed to retrieve the instance? (Access)

        - If the instance is only available for a fee, will its billing
          authority accept payment from us? (Billing)

        - If the retrieval of this instance is billable, can we pay for it in
          a currency which we use? (Payment)
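     As a rough illustration, the questions above that a client can answer
on its own (Instrumentality, Type, Encoding, Size and, given cost records,
Payment) can be checked mechanically against each instance.  The Python
sketch below is not part of REGISTRAR: the class names, the ClientProfile
fields, and the rule that an encoding of NONE needs no decoder are all
assumptions made for the example.  Availability is left to the SONAR
ordering, and Access and Billing are omitted because they are not yet
defined.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Instance:
    """The parts of a Product Instance Record that the checks below use."""
    url: str
    content_type: str   # e.g. TEXT, HTML, GOPHER
    encoding: str       # e.g. NONE, UNIX
    size: int           # octets
    costs: Dict[str, float] = field(default_factory=dict)  # empty means free


@dataclass
class ClientProfile:
    """Hypothetical description of what the local agent can handle."""
    schemes: Set[str]      # retrieval methods we implement (instrumentality)
    types: Set[str]        # native formats we can use
    encodings: Set[str]    # encodings we know how to undo
    max_size: int          # largest instance, in octets, we can store
    currencies: Set[str]   # monetary systems we can pay in


def problems(inst: Instance, me: ClientProfile) -> List[str]:
    """Name each question this instance fails; an empty list means usable."""
    found = []
    scheme = inst.url.split(":", 1)[0].lower()
    if scheme not in me.schemes:
        found.append("instrumentality")
    if inst.content_type not in me.types:
        found.append("type")
    # Assumption: NONE means the instance is already in its native form.
    if inst.encoding != "NONE" and inst.encoding not in me.encodings:
        found.append("encoding")
    if inst.size > me.max_size:
        found.append("size")
    # A billable instance is payable only if one cost record uses a currency we hold.
    if inst.costs and not (inst.costs.keys() & me.currencies):
        found.append("payment")
    return found
```

Because every failed check is reported rather than just the first, a
client could also explain to its user why a listed instance was skipped.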

Instrumentality -

     The instrumentality issue is addressed  by that portion of the  user's
application which allows or facilitates retrieval.  If the engine does  not
support retrieval using a particular  protocol or service, the  application
will, no doubt, inform the user.

Availability -

     When we  request the  instances  associated with  a  URN, we  will  be
presented with a  list of those  sites which store  those instances.   This
list, however, does not address whether or not we actually have physical
access to any of the listed sites.

     Whether or not  a  particular instance of a  product is available,  in
terms of the  availability of the  repository site, is  an important  issue
relating to the  question of retrieval.   If the  instance is available  on
multiple repositories, we should  have access to  enough information to  be
able to make the 'best' retrieval decision.

     Best in this context refers to the size of the repository hardware (its
'power'), its current load, how long it takes to return a request ('ping'
time), and how many network 'hops' a request must traverse.

     There is another protocol, SONAR (currently in prototype), which
addresses this issue.  We can assume that SONAR provides enough information
to the REGISTRAR server so that when it returns a number of PRODUCT
INSTANCE RECORDS, those records are in a sorted order (best first, worst
last) in terms of their suitability as 'appropriate' sites from which to
retrieve a product.
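     SONAR itself is not described here, so the sketch below only
illustrates the idea of returning PRODUCT INSTANCE RECORDS best first.  It
assumes hypothetical per-site measurements for the four factors named above
and drops sites we cannot reach at all.  Ranking by hops, then ping time,
then spare capacity (power times the idle fraction) is an arbitrary choice
made for illustration; a weighted score would serve equally well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteStatus:
    """Hypothetical SONAR-style measurements for one repository."""
    host: str
    reachable: bool   # do we have any access to the site at all?
    power: float      # relative hardware capacity; larger is better
    load: float       # current load, 0.0 (idle) to 1.0 (saturated)
    ping_ms: float    # time taken to answer a request
    hops: int         # network hops a request must traverse


def best_first(sites: List[SiteStatus]) -> List[SiteStatus]:
    """Drop unreachable sites and order the rest best first, worst last."""
    def key(s: SiteStatus):
        spare = s.power * (1.0 - s.load)   # idle capacity
        # Fewer hops first, then lower ping time, then more spare capacity.
        return (s.hops, s.ping_ms, -spare)
    return sorted((s for s in sites if s.reachable), key=key)
```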

     NOTE: Unlike REGISTRAR, SONAR is not meant as a 'user' protocol (one
with which a client program interacts as an agent for a user).  Rather, it
is an 'inter-server' protocol, used only between REGISTRAR servers.

Type -

     If an  instance is  in a  native format  or type  we do  not  support,
retrieving it is of little value.  Native format is that form which is used
directly by an application; e.g. ascii, postscript(tm), bitmap, etc.

Encoding -

     If an instance is in  an encoding which we  are unable to render  back
into its native type or format, the instance is of little use.  (Unless  we
can contract with a service which does the conversion for us?) This is  the
issue of  compression  and conversion  into  a form  more  appropriate  for
network delivery, e.g. unix-compress, uuencode, etc.

     There is no information  included in the URL  which deals with  either
issue.  Historically, the question of applicability of a certain  encoding,
or the availability of the required  program to uncompress an instance  has
been handled  by the  user  of the  application.   The  user has  made  the
decision to retrieve a  particular instance based on  her knowledge of  its
usefulness once it has been retrieved.

     This state of affairs is  becoming increasingly intolerable since  the
user can and should  no longer be called  upon to make these  distinctions.
As the user base increases (mostly at the low end of network savvy or
expertise), there will be more of a need for agents or services which can
make these decisions for the user.

Size -

     If an instance is too large to cache locally, and cannot be  retrieved
in pieces, it is of little value.  Information related to the size of each
particular instance is needed to make an appropriate retrieval decision.

Access -

     If we have decided that we can use an instance, we still must find out
whether or not we have permission to access that instance.

     (To be completed later.)

Billing -

     If we have permission to access an instance, and the instance is only
available to those who can pay for it, we must next find out whether the
billing authority which maintains control over the instance will accept
payment from us.

     (To be completed later.)

Payment -

     If we can pay, what will we pay?

     (To be completed later.)

	

The Uniform Resource Name:
-------------------------

     The Uniform Resource Name is a single, unique identifier for an
abstract product.

The following rules apply to URNs:

	-	Once created, a URN can never be destroyed.

	-	The actual encoding of the URN (how it looks) is
		completely immaterial to its function.  The actual
		content of a URN is that to which it refers.

	-	URNs are *never* created 'on the fly.'  A URN is provided
		as a pointer to a product when that product is registered
		with the authority responsible for its existence.  Humans
		never make URNs, servers do.
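     The rules above describe a write-once registry: the server issues the
URN at registration time and never withdraws it.  A minimal in-memory
sketch in Python follows; the Registry class and its methods are
hypothetical, and the URN layout is the registrar://domain/category/
name:version form described later in this document.

```python
from typing import Dict


class Registry:
    """Toy write-once URN registry (hypothetical; not the REGISTRAR code)."""

    def __init__(self, domain: str):
        self.domain = domain
        self._records: Dict[str, dict] = {}

    def register(self, category: str, name: str, version: str, **attrs) -> str:
        """Mint the URN for a newly registered product and return it."""
        urn = f"registrar://{self.domain}/{category}/{name}:{version}"
        if urn in self._records:
            raise ValueError(f"already registered: {urn}")
        self._records[urn] = dict(attrs)
        return urn

    def lookup(self, urn: str) -> dict:
        """Return a copy, so callers cannot alter the stored record."""
        return dict(self._records[urn])

    def delete(self, urn: str) -> None:
        """Once created, a URN can never be destroyed."""
        raise PermissionError(f"URNs are permanent: {urn}")
```

The registrant supplies the product's category, name and version, but
only register() turns them into a URN, and nothing ever removes one.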


Implementation:
--------------

     There is a prototype REGISTRAR server operating on ruby.ora.com,  port
99.  It supports all  of the features previously  identified, as well as  a
number of useful additions, such as keyword searching among products and  a
test interface to the local SONAR server.

     The server speaks a simple line-oriented protocol over a standard TCP
connection, similar to the 'finger' service, and can be accessed via the
'telnet' program.

     The command/response structure is simple, and it should be quite  easy
to write clients for it.  Its general rules are:

     o    Requests to the REGISTRAR server are in ASCII, and are  delimited
          with CR/LF.

     o    Requests to the  server are  either commands or  URN /  ATTRIBUTE
          pairs.

     o    Commands which the server understands are:

            HELP  --  returns a '.' delimited list of available commands.

            DEBUG --  toggles debugging output from the server.

            LIST  --  returns a '.' delimited list of registered URNs.

            SEARCH [keyword] --  returns a '.' delimited list of URNs which
                                 contain the keyword.

            QUERY [server]   --  returns a single line of information
                                 (from SONAR) which lists certain data
                                 about the mentioned server.  
                                 (EXPERIMENTAL - NOT ACTIVE)

            QUIT  --  Ends the session.

     o    URN  /  ATTRIBUTE  requests  are  used  to  retrieve   particular
          attributes from a product record.  Without an explicit ATTRIBUTE,
          the INSTANCE attribute is assumed.  Thus, these are valid
          requests:

              registrar://ora/category/item:version
                       returns the INSTANCE attributes of the product

              registrar://ora/category/item:version CREATOR
                       returns the CREATOR attribute of the product

              registrar://ora/category/item:version ALL
                       returns all of the available attributes of the
                       product, including the DESCRIPTION property which
                       is otherwise unavailable.

     o    Responses all begin with a numeric code, in the following form:

            0xx  --  Command failed.

            1xx  --  Command succeeded.

     o    Any response which  begins with a  dash ('-') is  a comment or  a
          debug or help message and can be safely ignored by the client.
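     Because the protocol is plain CR/LF-terminated ASCII over TCP, a client
takes only a few lines.  The Python sketch below is illustrative only: the
function names are hypothetical, wrapping the URN in <...> follows the
sample sessions so that spaces inside it are not read as attribute
separators, and the single greeting line it discards is inferred from those
sessions.  Host and port are those of the 1993 prototype, which should not
be assumed to be reachable.

```python
import socket
from typing import Iterable, List, Optional

# Prototype server from the announcement above (assumed, not guaranteed, live).
HOST, PORT = "ruby.ora.com", 99


def format_request(urn: str, *attributes: str) -> bytes:
    """Build one CR/LF-terminated 'URN [ATTRIBUTE ...]' request line."""
    if not urn.startswith("<"):
        urn = f"<{urn}>"   # assumption: bracket the URN as the sessions do
    return (" ".join([urn, *attributes]) + "\r\n").encode("ascii")


def response_code(line: str) -> Optional[bool]:
    """True for a 1xx (succeeded) line, False for 0xx (failed), else None."""
    if len(line) >= 3 and line[:3].isdigit():
        return {"0": False, "1": True}.get(line[0])
    return None


def collect_response(lines: Iterable[str]) -> List[str]:
    """Gather lines up to a lone '.', skipping '-' comment/debug/help lines."""
    body = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == ".":
            break
        if line.startswith("-"):
            continue   # safe to ignore, per the rules above
        body.append(line)
    return body


def query(urn: str, *attributes: str, host: str = HOST, port: int = PORT) -> List[str]:
    """Open a session, send one request, collect the reply, then QUIT."""
    with socket.create_connection((host, port), timeout=30) as sock:
        with sock.makefile("r", encoding="ascii", newline="") as stream:
            stream.readline()   # assumed: one greeting banner line comes first
            sock.sendall(format_request(urn, *attributes))
            body = collect_response(stream)
            sock.sendall(b"QUIT\r\n")
            return body
```

For example, query("registrar://ora/nutshell books/Learning GNU Emacs:2.0")
would send the same request as the third sample session.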


Typical Sessions:
----------------

                    Request server HELP information
                    -------------------------------

	server: Registrar URN Service -- version 0.5  (raisch)

	client: HELP

	server: --DEBUG               -- enable copious output
	server: --LIST                -- list all registered URNS
	server: --SEARCH (keyword)+   -- search for a keyword
	server: --QUERY (server)+     -- query the status of a remote server
	server: --
	server: --<URN> ((ATTRIBUTE)* | ALL) -- URN is in the form:
	server: --    authority://domain/category/item:version
	server: --          authority     = 'registrar' (this service)
	server: --          domain        = 'ora' (others available)
	server: --          category/item:version  = product designator
	server: --
	server: --    ATTRIBUTE is zero or more attributes (default: INSTANCE)
	server: --      ALL returns all defined attributes
	server: --        including DESCRIPTION (full text description)
	server: --          which is otherwise inaccessible
	server: --
	server: --    Format of the INSTANCE attribute:
	server: --    ( URL               --Uniform Resource Locator
	server: --      ENCODING          --TEXT,PS,TEX,GOPHER,HTML,etc.
	server: --      COMPRESSION       --UNIX,ARC,ZIP,etc.
	server: --      SIZE              --in bytes
	server: --      ACCESS_AUTHORITY  --who grants permission to retrieve?
	server: --      BILLING_AUTHORITY --who do we pay?
	server: --      [COST]*           -- (MONETARY_SYSTEM AMOUNT)
	server: --    )                       Ex: (UK_POUNDS 15.0)
	server: --
	server: --QUIT      -- exit gracefully
	server: .

	client: QUIT

                  Request list of URNs on this server
                  -----------------------------------

	server: Registrar URN Service -- version 0.5  (raisch)

	client: LIST

	server: <registrar://ora/nutshell books/Learning GNU Emacs:2.0>
	server: <registrar://ora/magazine/Global Network Navigator:0.0>
	server: .

	client: QUIT

                      Request instances of a URN 
             (whitespace inserted to improve readability)
                       --------------------------

	server: Registrar URN Service -- version 0.5  (raisch)

	client: <registrar://ora/nutshell books/Learning GNU Emacs:2.0>

	server: INSTANCE:	(  gopher://gopher.ora.com/top_menu 
				   GOPHER 
				   NONE 
				   320
				)
	server: INSTANCE:	(  gopher://amber.ora.com/top_menu 
				   GOPHER 
				   NONE
				   320
				)
	server: INSTANCE:	(ftp://ftp.../published/oreilly/books/gnu.txt.Z 
				   TEXT 
				   UNIX 
				   16443 
				   NONE 
				   O'REILLY 
				    (US_DOLLARS 20.0) 
				    (CAN_DOLLARS 25.00) 
				    (UK_POUNDS 15.0)
				)
	server: INSTANCE:	(http://ftp.../published/oreilly/books/gnu.html 
				   HTML 
				   NONE 
				   32768 
				   NONE 
				   O'REILLY 
				    (US_DOLLARS 20.0) 
				    (CAN_DOLLARS 25.00) 
				    (UK_POUNDS 15.0)
                        )
	server: .

	client: QUIT
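     A client will usually want to turn each INSTANCE line into the fields
of a Product Instance Record.  The sketch below assumes that URLs contain no
whitespace, that cost records always have the form (MONETARY_SYSTEM
AMOUNT), and that a record carries either the first four fields only (as
the Gopher instances above do) or all six fields followed by zero or more
cost records; the InstanceRecord and parse_instance names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# One nested cost record, e.g. "(US_DOLLARS 20.0)".
COST = re.compile(r"\(\s*([A-Z_]+)\s+([0-9]+(?:\.[0-9]+)?)\s*\)")


@dataclass
class InstanceRecord:
    url: str
    content_type: str
    encoding: str
    size: int                                  # octets
    access_authority: Optional[str] = None
    billing_authority: Optional[str] = None
    costs: List[Tuple[str, float]] = field(default_factory=list)


def parse_instance(text: str) -> InstanceRecord:
    """Parse '[INSTANCE:] ( URL TYPE ENCODING SIZE [ACCESS BILLING (COST)*] )'."""
    body = text.strip()
    if body.startswith("INSTANCE:"):
        body = body[len("INSTANCE:"):].strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("INSTANCE record must be parenthesised")
    body = body[1:-1]                          # drop the outer parentheses
    costs = [(cur, float(amt)) for cur, amt in COST.findall(body)]
    fields = COST.sub(" ", body).split()       # what is left are plain fields
    if len(fields) not in (4, 6):
        raise ValueError(f"expected 4 or 6 fields, got {len(fields)}")
    url, ctype, enc, size = fields[:4]
    access, billing = (fields[4], fields[5]) if len(fields) == 6 else (None, None)
    return InstanceRecord(url, ctype, enc, int(size), access, billing, costs)
```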


Format of a URN:
---------------

       authority://domain/category/name:version_major.version_minor

     o    authority is  the  descriptor which  defines  the format  of  the
          following fields.

     o    domain is a reference to  the responsible entity which  maintains
          all members of  a particular  name space.   (NOTE:  Based on  the
          transience of hostnames and domains in the Domain Name Service on
          the Internet, this is not to  be assumed to represent a  hostname
          or domain.  We assume that the actual host or hosts which support
          a particular  domain  would  be  kept in  a  'top  level'  domain
          authority, registered  with the  proper authority  (IANA),  which
          would be  queried  and cached  to  retrieve the  proper  host  to
          contact when a request for information is made to a particular
          name space or domain of responsibility.)

     o    category is a method of defining separate sub-name spaces within
          a particular domain.

     o    name is the actual official name of the product in question, and

     o    version_major and version_minor reference a particular version of
          a unique product.  If the version is left off of the information
          request, the request is assumed to refer to the 'current' or most
          recent version of the product.

        example:

		  registrar://ora/nutshell books/Learning GNU Emacs:2.0
		  ^           ^   ^              ^                  ^ ^
		  |           |   |              |                  | |
	authority-+           |   |              |                  | |
	domain----------------+   |              |                  | |
	category------------------+              |                  | |
	name-------------------------------------+                  | |
	version_major-----------------------------------------------+ |
	version_minor-------------------------------------------------+
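     The URN layout above can be split into its fields with a single
regular expression.  The sketch assumes that the domain and category
contain no '/', that the name contains no ':', that the minor version may be
omitted, and that an optional pair of angle brackets (as in the LIST
output) surrounds the URN.  A missing version is returned as None, standing
for the 'current' version; the Urn and parse_urn names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

URN_RE = re.compile(
    r"^<?(?P<authority>[a-z]+)://(?P<domain>[^/]+)/(?P<category>[^/]+)/"
    r"(?P<name>[^:>]+)(?::(?P<major>\d+)(?:\.(?P<minor>\d+))?)?>?$"
)


class Urn(NamedTuple):
    authority: str
    domain: str
    category: str
    name: str
    version_major: Optional[int]   # None means "current version"
    version_minor: Optional[int]


def parse_urn(text: str) -> Urn:
    """Split authority://domain/category/name[:major[.minor]] into fields."""
    m = URN_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a registrar URN: {text!r}")
    major = int(m["major"]) if m["major"] is not None else None
    minor = int(m["minor"]) if m["minor"] is not None else None
    return Urn(m["authority"], m["domain"], m["category"], m["name"], major, minor)
```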


URN Record Format:
-----------------


        Write-Once Attributes

                NAME			{1}
                DOMAIN			{1}
                CATEGORY		{1}
		VERSION			{1}

		OWNER			{1}
                ADMINISTRATOR		{1}

		CREATED			{0,1}
                REGISTERED		{0,1}

                AUTHOR			{1,N}
                EDITOR			{0,N}
                PUBLISHER		{0,N}

		KEYWORDS		{1}
                SUMMARY			{0,1}
                DESCRIPTION		{0,1}

        Editable and User Defined Attributes

                LAST_ACCESS		{1}
                INSTANCE		{0,N}

                ANIMAL			{0,1}

	{1} 		= Only One
	{0,1} 		= Zero or One
	{0,N}		= Zero or More
	{1,N}		= One or More

	Example:
		NAME:           Learning GNU Emacs
		DOMAIN:         ora.com
		CATEGORY:       nutshell books
		VERSION:        2.0

		OWNER:		O'Reilly & Assoc. <ora@ora.com>
		ADMINISTRATOR:  Robert Raisch <raisch@ora.com>

		CREATED:	20 June 1993
		REGISTERED:	20 June 1993
		LAST_ACCESS:    20 June 1993

		AUTHOR:         Debra Cameron <debra@ora.com>
		AUTHOR:         Bill Rosenblatt <bill@ora.com>
		EDITOR:         Mike Loukides <mikel@ora.com>
		PUBLISHER:      O'Reilly & Assoc. <nuts@ora.com>

		ANIMAL:         Gnu

		KEYWORDS:       book tutorial editor gnu lisp
		SUMMARY:        Tutorial on the GNU Emacs Editor

		INSTANCE:       (	gopher://ora.com/top_menu 
					GOPHER 
					NONE 
					320
				)
		INSTANCE:       (	gopher://amber.ora.com/top_menu 
					GOPHER 
					NONE
					320
				)
		INSTANCE:       (	
	ftp://ftp.uu.net/published/oreilly/books/gnu.txt.Z     -- URL
					TEXT                   -- ENCODING
					UNIX                   -- COMPRESSION
					16443                  -- SIZE
					NONE                   -- ACCESS
					O'REILLY               -- BILLING
					(US_DOLLARS 20.0)      -- COST RECORD
					(CAN_DOLLARS 25.00) 
					(UK_POUNDS 15.0)
				)
		DESCRIPTION:

			[TEXT DELETED]
		.
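     Since the cardinalities in the table are fixed, a server can check a
record before accepting it.  The Python sketch below takes the list of
attribute names in a record and counts them against the table.  It treats
any attribute not listed as user defined and therefore editable; that
reading, and the names CARDINALITY, EDITABLE, check_record and can_edit,
are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

INF = float("inf")

# (minimum, maximum) occurrences, copied from the table above.
CARDINALITY: Dict[str, Tuple[int, float]] = {
    "NAME": (1, 1), "DOMAIN": (1, 1), "CATEGORY": (1, 1), "VERSION": (1, 1),
    "OWNER": (1, 1), "ADMINISTRATOR": (1, 1),
    "CREATED": (0, 1), "REGISTERED": (0, 1),
    "AUTHOR": (1, INF), "EDITOR": (0, INF), "PUBLISHER": (0, INF),
    "KEYWORDS": (1, 1), "SUMMARY": (0, 1), "DESCRIPTION": (0, 1),
    "LAST_ACCESS": (1, 1), "INSTANCE": (0, INF), "ANIMAL": (0, 1),
}

# Attributes listed under "Editable and User Defined Attributes".
EDITABLE = {"LAST_ACCESS", "INSTANCE", "ANIMAL"}


def check_record(attribute_names: List[str]) -> List[str]:
    """Return one message per attribute whose count breaks the table."""
    counts = Counter(attribute_names)
    errors = []
    for attr, (lo, hi) in CARDINALITY.items():
        n = counts[attr]
        if not lo <= n <= hi:
            errors.append(f"{attr}: {n} occurrence(s), allowed {lo}..{hi}")
    return errors


def can_edit(attr: str) -> bool:
    """Write-once attributes are frozen; unlisted ones count as user defined."""
    return attr in EDITABLE or attr not in CARDINALITY
```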


Product Instance Record Format (PIR):

        URL                             - Uniform Resource Locator
        CONTENT TYPE                    - (See Instance Type)
        CONTENT ENCODING                - (See Instance Encoding)
        SIZE                            - Size of the Instance in Octets
        ACCESS_AUTHORITY                - (See Instance Access Authority)
        BILLING_AUTHORITY               - (See Instance Billing Authority)
        (MONETARY_SYSTEM COST)          - (See Instance Cost Record)


Instance Type:

(MIME types are, of course, appropriate here.)

	ASCII				- Ascii Text
	PS				- Postscript(tm)
	TEX				- TeX
	NROFF				- Unix NROFF
	TROFF				- Unix TROFF
	EQN				- Unix EQN
	GIF				- CompuServe GIF, graphic
	TIFF				- Aldus TIFF, graphic
	JPEG				- JPEG, graphic
	GOPHER				- UMinn Gopher Menu
	WAIS				- WAIS query
	HTML				- WWW HTML document
	AIFF				- Audio Interchange File Format, sound
	AU				- Sun/NeXT audio, sound
	MPEG				- MPEG, video


Instance Encoding:

	COMPRESS			- Unix compress/uncompress
	GNU				- GNU zip (gzip)
	ARC				- SEA ARC archive
	ZIP				- PKZIP archive
	HQX				- Macintosh BinHex 4.0
	UUENCODE			- Unix uuencode


Instance Access Authority:

	None Defined.  - O'REILLY is a reserved value.


Instance Billing Authority:

	None Defined.  - O'REILLY is a reserved value.


Instance Cost Record:

	MONETARY_SYSTEM		ex: US_DOLLARS, MEX_PESOS, UK_POUNDS, CAN_DOLLARS
	AMOUNT			ex: 15.0

Comments:
--------

     The most important  issue addressed  in this  document has  to be  the
requirement of the current Internet community that individual  intellectual
properties be uniquely identifiable and that multiple instances of the same
product be identifiable as such.  Without this capability, the Internet
will continue to labor under the limitation that the user is unable to make
appropriate  retrieval  decisions,  and  will  continue  to  use  bandwidth
needlessly.  An example of this is the current assumption that two files on
the Internet  are  exactly the  same,  based on  the  implicit  information
carried in their names.  (foo.tar.Z and bar.arc *might* represent the exact
same information and the user has no method of telling.)

     While  there  is  considerable  work   being  done  to  identify   the
characteristics of "Intellectual Properties",  the author takes the  stance
that the whole concept of intellectual property is a legal construction  to
protect the rights of the author.

     Intellectual properties do not exist except as the right or license to
create products.  The owner of  an intellectual property is not making  the
property itself available by  publishing it on the  network.  The owner  or
the owner's agent  is making  products available  which are  based on  this
property.

     As such, whether or not a particular file or resource on the net is or
is not an intellectual property is not relevant to the issues presented  in
this paper.  

     Once a publisher makes one or more products available on the network,
it is the publisher's decision whether or not one product differs from
another, and any attempt to formalize this characteristic further than this
is not useful to the task at hand.

     The other issue is the fact that there are a number of characteristics
of a  particular product which are required to make the retrieval decision.
If the file is encoded in Postscript(tm) and the local system does not have
the required technology  to render that  file, any retrieval  of that  file
would be in vain.   The assumption  that all the  important details can  be
implied from the filename is a very inappropriate one, based on the simple
fact that various systems have differing  methods of naming the same  file.
A Unix server  might  represent the file as  foo.tar.Z, while a DOS  system
might conceivably name the  same file 'footar.arc', or  a VMS system  might
name the same file 'foo_tar.Z;123'.
-----------------------------------------------------------------------------