What URIs are and are not.

Tim Berners-Lee <timbl@www3.cern.ch>


Date: Wed, 3 Nov 93 18:04:28 +0100
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-id: <9311031704.AA02012@www3.cern.ch>
To: uri@bunyip.com, www-talk@nxoc01.cern.ch
Subject: What URIs are and are not.
Reply-To: timbl@nxoc01.cern.ch



Let me put down the *original* functional spec for URIs.
I fear that some people have gotten away from the original
requirement, and wanted to start designing things.

Listen good if you are a newcomer to the list or on the
IESG ;-)

There are many protocols on the net which imply a data model
which can be mapped onto some concept of "objects" and
addresses/names/identifiers/locators for those objects.

 Examples:	Protocol	Objects
		FTP		Directories
				Files
		SMTP		mail addresses
				mail messages
		NNTP		newsgroups
				articles
		HTTP		objects
		Gopher		menus
				documents
		DNS		hosts
				Mail eXchanges
		...

There will be many more future examples.

The characteristics of the objects and the properties of the
names/addresses/identifiers/locators vary and are defined by:

	a. The protocol specification
	b. The way the protocol is actually used
	c. The conventions which are used by people
	
 (Example:  a. The FTP RFC implies that a directory object may contain
 files, in defining that NLST on a directory returns a list of files.
 b. The protocol is in fact often used with only the A and I
 modes, and with the user/pass pair being "anonymous" and
 a mail address.  c. A convention is that ftp.x.x.x host names
 are not changed very often, but can change.
 Hence the properties of

 	ftp://info.cern.ch/pub/www

 are that it contains files, maybe listed by anonymous
 ftp to info.cern.ch; the files may change, but lifetimes
 will be of the order of a year for directories.)
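
 A minimal sketch of how the example identifier splits into its
 scheme-specific parts, assuming Python's standard urllib.parse as the
 parser. The variable names are illustrative only. Note that nothing in
 the string itself carries the properties discussed above (lifetime,
 contents); those come from the protocol, its use, and convention.

```python
from urllib.parse import urlsplit

# The example identifier from the message.
parts = urlsplit("ftp://info.cern.ch/pub/www")

scheme = parts.scheme    # names the protocol whose data model applies
host = parts.hostname    # host name, stable by convention only
path = parts.path        # directory within that host's FTP name space

print(scheme, host, path)
```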
 

 There is for each protocol an implicit name/address/identifier
 space for the names/addresses/identifiers in the implicit data model.

I am trying to get across the great variety of schemes.

What you can do with an address/name/identifier depends also
on who and where you are and what facilities you have.  So
it is difficult to define.  (This is why I don't feel that the
URL/URN taxonomy debate has given us much).

HOWEVER, it is still extremely useful to have the concept of
the universal set of all identifiers/names/addresses in all
schemes.
It is also useful to have a syntax for writing down the values.

One cannot deny that it is useful, because WWW *uses* it.  This
is *not* to say that the WWW installed base prevents any bugs
in the URL spec from being fixed, but it is an existence proof
of the need.

The syntax for the universal set was called, in WWW, the URI
syntax, for Universal Resource Identifier.  The WG changed
"Universal" to "Uniform", but in doing so lost the important
significance of the Universality: the fact that, if you create
a name space, whatever its properties, I can give it a name
and map its syntax into acceptable URI syntax.
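
The universality property can be illustrated with a short sketch: take
any name space, give it a scheme name, and escape its names into a
restricted character set. Everything here is an assumption made for
illustration: the "shelf" scheme and its shelf-mark names are made up,
and to_uri/from_uri are hypothetical helpers, not part of any spec.

```python
from urllib.parse import quote, unquote

def to_uri(scheme, name):
    """Map a name from some name space into URI-style syntax (illustrative)."""
    # Percent-escape everything outside a conservative safe set so the
    # result can be written down in plain ASCII.
    return f"{scheme}:{quote(name, safe='/')}"

def from_uri(uri):
    """Recover the scheme name and the original name."""
    scheme, _, rest = uri.partition(":")
    return scheme, unquote(rest)

# "shelf" is a made-up scheme for a made-up shelf-mark name space.
uri = to_uri("shelf", "Room 3/Bay 12/Vol. 7")
print(uri)
print(from_uri(uri))
```

The mapping is reversible, and it says nothing about persistence or
lookup: those remain properties of the name space being mapped.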

Note that attempts to make URIs a subset of another
name space are of course possible but pointless by
definition.

The URI working group pointed out very sensibly that a
system of more persistent names was necessary.

Unfortunately, and this was the *big mistake*, we then
set about a taxonomy of all name spaces, to divide them into
URLs (of which they had several) and URNs (of which they used
none as no lookup method existed), and worse, to extend the
taxonomy to new schemes not yet invented.

I had hoped that a distributed persistent name lookup
service would arise, but it didn't.  What did happen was
that great world-designing started and never finished.

Anyway, all existing schemes have been called URLs, and
URN is a reserved name.

Since then, there have been long discussions about, for example, whether
a news article id is a URL or a URN.  The IIIR community is trying
to retrofit a top-down design onto all existing systems. This
is foolish because

	1.	If you retrofit a design onto existing practice
		to make it clean you have to lie about existing
		practice.
	
	2.	To do a top-down design in this area won't work.
		We have to progress by a sequence of brilliant
		independent ideas.
		
	3.	If you manage to categorize all the existing schemes
		into a taxonomy you will only end up restricting the
		future schemes into your current mind set.

What SHOULD we be doing?  Valid things to define and, therefore,
argue over are:

	1.	Interpretation of the implicit data model.
		For example, my interpretation of the FTP model
		was that you browse directories, and the filenames
		are the names, and the files the addresses.
		The data type is guessed from the filename.
		This was my laying of a formal model onto the FTP
		protocol which didn't define one.
		Others take the view that one doesn't browse a
		directory, one gets an address from a mail message,
		and there is information in the filename (etc)
		to tell you which transfer type to use.
		
		Obviously both are valid mappings; we need to choose
		and maybe use both.
		
	2.	Design of new data models. This is valid for HTTP
		and for URNs.
		
	3.	The mapping of names in the model onto a concrete
		string syntax. Mainly a question of character sets,
		and settled, thank you.
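
The first interpretation in item 1, where the data type is guessed from
the filename, might be sketched as follows. This is only one possible
rule, not anything the FTP specification defines: guess_ftp_type is a
hypothetical helper, and it assumes Python's standard mimetypes table
as the source of the guess. 'A' and 'I' are the FTP ASCII and image
(binary) transfer types.

```python
import mimetypes

def guess_ftp_type(filename):
    """Return 'A' (ASCII) or 'I' (image/binary) for a filename.

    Hypothetical rule: text types go as ASCII, everything else as binary.
    """
    mime, encoding = mimetypes.guess_type(filename)
    # Compressed files and unknown types are sent as binary to be safe.
    if encoding is not None or mime is None:
        return "I"
    return "A" if mime.startswith("text/") else "I"

for name in ("README.txt", "paper.ps.gz", "logo.gif"):
    print(name, guess_ftp_type(name))
```

Under the second interpretation the same function would be applied to an
address received out of band, for instance in a mail message, rather
than to a name found by browsing a directory.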
		
The URL document talked about "requirements" on names
and addresses in different schemes. That was a mistake. It should
have talked about "characteristics" of names in different models.
We can only document these characteristics for current protocols;
we can't define them.  What we can do, though, is invent new schemes,
and in particular the fabled URN scheme.
Discussion of the relative merits of characteristics is
outside the bounds of the URL document.

In summary, the URL document

	- defines a Universal syntax for ANY past or future
	  names/addresses/identifiers

	- defines a specific mapping of name spaces
	  implicit in existing protocols into URI space.

The URIs defined for existing protocols are known as URLs
and they have the property that they map directly onto a
single protocol in each case.
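
The one-scheme-to-one-protocol property can be sketched as a simple
lookup. The table below is illustrative and incomplete; the scheme
names "news" and "mailto" and their pairing with NNTP and SMTP are
assumptions for the example, and protocol_for is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Illustrative table only: each scheme name paired with the single
# protocol it implies.
SCHEME_TO_PROTOCOL = {
    "ftp": "FTP",
    "http": "HTTP",
    "gopher": "Gopher",
    "news": "NNTP",
    "mailto": "SMTP",
}

def protocol_for(url):
    """Return the protocol implied by a URL's scheme, or raise ValueError."""
    scheme = urlsplit(url).scheme.lower()
    try:
        return SCHEME_TO_PROTOCOL[scheme]
    except KeyError:
        raise ValueError(f"no protocol known for scheme {scheme!r}") from None

print(protocol_for("ftp://info.cern.ch/pub/www"))
print(protocol_for("news:alt.hypertext"))
```

A scheme with no entry, such as a future URN scheme, fails the lookup:
it is still a valid member of the universal set, it simply is not a URL
in the sense above.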

If the URI WG wants to define something other than URIs
as defined above (and I hope in the document) then they
should first decide what to do with URIs.

Tim Berners-Lee
CERN