Re: Suggestion: URL string-search syntax

"Rob Raisch, The Internet Company" <raisch@internet.com>


Errors-To: listmaster@www0.cern.ch
Date: Tue, 31 May 1994 00:41:52 +0200
Message-id: <Pine.3.85.9405301143.A24828-0100000@hmmm>
Reply-To: raisch@internet.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Rob Raisch, The Internet Company" <raisch@internet.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Suggestion: URL string-search syntax 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Mime-Version: 1.0


Stephen, 

Indeed, the URN work answers only some of the important questions.  The
higher purpose here, I believe, is to answer this question: 

"What information do we need to make an 'appropriate' retrieval decision?"

I've been thinking about this for some time and have a short list of
questions. 

<SOAPBOX ON>

Before I present them, I'd like to go on record once again and state my
belief that this is the single most important issue facing the global
Internet short of the exhaustion of the address space. 

As an on-line publishing enabler, we have seen what happens when an 
Internet repository achieves even the least level of notoriety.  

In some cases, this has had a positive effect and has been used to limit 
socially irresponsible behavior.  See "Canter and Siegel."

But consider the current barriers which traditional publishers face when
providing useful and popular information to their customers.  

I have had to turn away business -- or have lost potential business on the
basis of cost -- because we could not support the infrastructure required
without unfairly requiring the publisher to shoulder the entire cost of
delivery.  (And the business I lost did not go elsewhere. Currently no 
one is capable of supporting it.)

I say "unfairly" because we already have support for cost effective
information distribution in the real world.  There are trucking companies
and bookstores and fulfillment houses which exist and compete with each
other to keep costs down.  The publisher can leverage these existing 
services.

On-line, we have no such extant infrastructure and without it -- or a
technical infrastructure which supports the on-line analog -- publishing
on the global Internet will remain what it is now, an unsupportable and 
ineffective hack. 

It's a "chicken and egg" problem.  Without an established technical
infrastructure, the publisher cannot participate in anything other than a
cursory fashion, and without the publisher -- and its content -- there is
little incentive to provide this infrastructure. 

This is not simply a commercial issue. Interesting content is interesting 
irrespective of its pricing model.

And I fear that, should we look, we would find a full 40% of resource 
object retrieval across the Internet to be ill considered and wasteful.

<SOAPBOX OFF>

OK, what do we need to know to make an "appropriate" retrieval decision?

First, let's assume the following:

	- a URN is -- AT THE VERY LEAST -- a reference to a collection of
	  zero or more URLs.

	- a URL uniquely identifies a single instance of a resource object

	- a resource object is some thing which can be retrieved from a
	  repository

	- a repository is a collection of resource objects which supports 
	  one or more methods of external retrieval
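
As a rough illustration of the first two assumptions, here is a minimal
sketch in Python.  The class names, fields, and the example hosts are all
hypothetical; nothing here is taken from the URN or URL drafts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class URL:
    """Identifies a single instance of a resource object."""
    repository: str  # hypothetical host name of the repository
    method: str      # retrieval method the repository supports, e.g. "http"
    path: str


@dataclass
class URN:
    """At the very least, a reference to zero or more URLs."""
    name: str
    urls: List[URL] = field(default_factory=list)


# A name with two known instances, and a name with none yet.
nifty = URN("urn:example:nifty-thing")
nifty.urls.append(URL("repo-a.example", "http", "/nifty"))
nifty.urls.append(URL("repo-b.example", "ftp", "/pub/nifty"))
unresolved = URN("urn:example:not-yet-published")
```

The retrieval decision discussed below then amounts to choosing one entry
from `urls`, or deciding that none of them is worth fetching.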

Our goal:

	- an appropriate retrieval decision must provide an optimal solution
	  in terms of the provider's resources, the consumer's use, and the 
	  use of the network infrastructure between provider and consumer.

The questions --

	(Consumer Use Questions)

	- if we retrieve this URL, can we use what we get?
	- ... do we have enough local cache to hold a copy?
	- ... is it in a form we can use (render/manipulate)?
	- ... ... is it in a language we can understand?
	- ... ... if it is non-text, can we use it without conversion?
	- ... ... ... can we convert it to a form we can use?
	- ... ... is it compressed?
	- ... ... ... can we uncompress it?

	- ... is there a fee to retrieve and use it?
	- ... ... can we afford it?
	- ... ... can we pay for it?
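
The Consumer Use Questions could be checked mechanically by the consumer's
agent.  The sketch below is only one possible shape for that check; the
record fields (`size_bytes`, `fee_cents`, and so on) are assumptions about
what metadata might be available, not an existing format.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ObjectInfo:
    """Hypothetical metadata advertised for one resource object."""
    size_bytes: int
    media_type: str
    language: str      # "" when the object has no natural language
    compression: str   # "" when the object is not compressed
    fee_cents: int     # 0 when retrieval is free


@dataclass(frozen=True)
class Consumer:
    """What the consumer's agent knows about its own capabilities."""
    free_cache_bytes: int
    renderable: FrozenSet[str]
    convertible: FrozenSet[str]
    languages: FrozenSet[str]
    decompressors: FrozenSet[str]
    budget_cents: int
    can_pay: bool


def consumer_can_use(obj: ObjectInfo, me: Consumer) -> bool:
    """Walk the Consumer Use Questions in order; any 'no' rejects the object."""
    if obj.size_bytes > me.free_cache_bytes:
        return False  # not enough local cache to hold a copy
    if obj.media_type not in me.renderable and obj.media_type not in me.convertible:
        return False  # cannot render it and cannot convert it
    if obj.language and obj.language not in me.languages:
        return False  # not in a language we understand
    if obj.compression and obj.compression not in me.decompressors:
        return False  # compressed in a way we cannot undo
    if obj.fee_cents > 0 and (obj.fee_cents > me.budget_cents or not me.can_pay):
        return False  # cannot afford it, or have no way to pay
    return True
```

Every answer here comes from local knowledge plus a small amount of
metadata, which is why these are the easiest questions to answer.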

	(Repository Use Questions)

	- is the repository active?
	- ... do we have permission to use it?
	- ... does it support a retrieval service we can use?
	- ... is it lightly loaded enough to fulfill the request acceptably?
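
The Repository Use Questions could be answered from a status record that
the repository itself reports.  This is a sketch only: the record, the
load scale, and the 0.8 threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class RepositoryStatus:
    """Hypothetical status a repository might report about itself."""
    active: bool
    permitted: bool           # does this consumer have permission to use it?
    methods: FrozenSet[str]   # retrieval services offered, e.g. {"http", "ftp"}
    load: float               # 0.0 (idle) to 1.0 (saturated)


def repository_can_serve(status: RepositoryStatus,
                         client_methods: FrozenSet[str],
                         max_load: float = 0.8) -> bool:
    """Answer the Repository Use Questions for one repository."""
    if not status.active or not status.permitted:
        return False
    if not (status.methods & client_methods):
        return False  # no retrieval service in common
    return status.load <= max_load
```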

	(Network Use Questions)

	- which repository is the closest?  
		(Where "close" is measured in terms of 
			network distance (hops),
			cost of bandwidth, 
			timeliness of response)
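
One way to make "close" concrete is to fold the three measures into a
single score and pick the minimum.  The weighted sum below is an
assumption chosen for simplicity, as are the field names and the sample
numbers; how the measurements would actually be obtained is the open
problem.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Route:
    """Hypothetical measurements for reaching one repository."""
    repository: str
    hops: int                # network distance
    cost_per_mb: float       # cost of bandwidth
    response_seconds: float  # timeliness of response


def closeness(route: Route, w_hops: float = 1.0,
              w_cost: float = 1.0, w_time: float = 1.0) -> float:
    """Lower is closer.  The weights express what this consumer cares about."""
    return (w_hops * route.hops
            + w_cost * route.cost_per_mb
            + w_time * route.response_seconds)


def closest(routes: Iterable[Route], **weights: float) -> Route:
    """Return the route with the smallest closeness score."""
    return min(routes, key=lambda r: closeness(r, **weights))


routes = [
    Route("repo-a.example", hops=12, cost_per_mb=0.0, response_seconds=4.0),
    Route("repo-b.example", hops=5, cost_per_mb=0.5, response_seconds=1.5),
]
best_overall = closest(routes)
best_if_only_cost_matters = closest(routes, w_hops=0.0, w_time=0.0)
```

With equal weights the second repository wins; a consumer who only cares
about the cost of bandwidth would choose the first.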

Who can answer these questions --

	(Easiest to Hardest)

	The consumer or her agent is most able to answer the Consumer Use 
	Questions.  The renderer "knows" what it can render, what it can 
	convert, what it can interpret.

	The repository is the only place to answer the Repository Use 
	Questions since it is the only comprehensive source of the answer.
	
	(It is possible to query the "main" repositories from some central
	service to monitor their load and accessibility.  This implies a far
	larger entrenched infrastructure than we currently support.)

	Now, the hardest...

	I believe it is only truly appropriate to answer the Network Use 
	Questions at the consumer's site.  To be able to effectively 
	retrieve, I or my agent needs to know a hell of a lot more about 
	the intervening network infrastructure than you might expect.  

Here's an example:  Assume I live in Los Angeles and know that O'Reilly 
and Associates has something really nifty to retrieve.  (Easy assumption, 
that. ;)

Now, I find that ORA has two servers, one in Cambridge MA and one in 
Sebastopol CA.  Which do I choose?

Well, (and this is obviously a stacked deck) Sebastopol would be the 
worst choice since ORA's Sebastopol office is connected to the network  
through Cambridge.

The user should never have to know any of this. 
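
An agent could work this out from a map of the network rather than from
geography.  The sketch below counts hops with a breadth-first search over
a deliberately tiny, made-up topology; the "backbone" node and the link
table are assumptions for illustration, not a description of any real
network.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical link table: the Sebastopol office is reachable only
# through Cambridge, as in the example above.
LINKS: Dict[str, List[str]] = {
    "los-angeles": ["backbone"],
    "backbone": ["los-angeles", "cambridge"],
    "cambridge": ["backbone", "sebastopol"],
    "sebastopol": ["cambridge"],
}


def hops(graph: Dict[str, List[str]], src: str, dst: str) -> Optional[int]:
    """Breadth-first search; returns the hop count, or None if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


servers = ["cambridge", "sebastopol"]
best = min(servers, key=lambda s: hops(LINKS, "los-angeles", s))
```

Here `best` comes out as "cambridge", even though Sebastopol is far
nearer to Los Angeles on a road map.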

As someone who provides mechanisms for publishers to provide content to 
consumers, I am EXTREMELY interested in exploring this problem and 
helping to provide a workable solution.

--  </rr>  Rob Raisch, The Internet Company