Re: URL decisions in Seattle, & changes

"Daniel W. Connolly" <connolly@hal.com>
Errors-To: listmaster@www0.cern.ch
Date: Thu, 31 Mar 1994 18:03:25 --100
Message-id: <9403311600.AA14222@ulua.hal.com>
Reply-To: connolly@hal.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: URL decisions in Seattle, & changes 

In message <94Mar30.213053pst.2732@golden.parc.xerox.com>, Larry Masinter writes:
>>> In future schemes, will '/' and '%2F' mean the same thing or different
>>> things? I gather that the answer is "it depends." This rules out the
>>> idea of having one algorithm for reducing a URI to canonical form.  So
>>> the question of whether
>
>Well, in fact, the 'canonical' form for any URL must necessarily be
>protocol specific.

I still disagree. It is possible to specify a canonical form for URLs
independent of scheme. The quoting scheme described by Tim and me
(and implemented in HTParse.c and tested in my test suite...) does
just this.
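One way to see that a canonical form need not depend on the scheme is a minimal Python sketch like the following. It does not reproduce HTParse.c; the `SAFE` character set and the `canonicalize` function are hypothetical choices for illustration. Escapes of characters that never need quoting are decoded, hex digits in the remaining escapes are upper-cased, and escapes of significant characters such as '/' and '?' are left in place, all without looking at the scheme.

```python
import re
import string

# Hypothetical set of characters that never need quoting in a URL.
# This is an assumption for illustration, not the set used by HTParse.c.
SAFE = set(string.ascii_letters + string.digits + "$-_@.&+!*'(),")

# Matches a single %xx escape and captures the two hex digits.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def canonicalize(url: str) -> str:
    """Return a scheme-independent canonical form of url (sketch only)."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        if ch in SAFE:
            # Escaping a safe character carries no meaning: decode it.
            return ch
        # Significant characters (e.g. '/', '?', '%') stay escaped;
        # only the hex digits are normalized to upper case.
        return "%" + match.group(1).upper()

    return _ESCAPE.sub(fix, url)


print(canonicalize("http://host/a%7e%2fb%41"))  # → http://host/a%7E%2FbA
```

Since the function never inspects the scheme, http:, gopher: and ftp: URLs all go through the same rule, and `a%2Fb` and `a/b` keep distinct canonical forms.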

> This is true for the default port (e.g., that
>http://host:80/ is the same as http://host/ but gopher: has 70 as a
>default port, etc.)

Given the definition of equality I proposed, http://host:80/ is
different from http://host/. The fact that they resolve to the same
thing is not part of the URL spec.
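The distinction between URL equality and resolution can be sketched in Python. `urls_equal` assumes both URLs are already in canonical form and simply compares strings; `same_server` and its `DEFAULT_PORTS` table are hypothetical and stand for the protocol-specific knowledge that, on this view, belongs outside the URL spec.

```python
from urllib.parse import urlsplit

# Hypothetical table of default ports: knowledge about particular
# protocols, which URL equality deliberately does not use.
DEFAULT_PORTS = {"http": 80, "gopher": 70}


def urls_equal(a: str, b: str) -> bool:
    """URL equality: compare two (already canonical) URL strings."""
    return a == b


def same_server(a: str, b: str) -> bool:
    """Scheme-specific check: fill in default ports before comparing."""
    def key(url):
        parts = urlsplit(url)
        port = parts.port or DEFAULT_PORTS.get(parts.scheme)
        return (parts.scheme, parts.hostname, port, parts.path or "/")

    return key(a) == key(b)


print(urls_equal("http://host:80/", "http://host/"))   # → False
print(same_server("http://host:80/", "http://host/"))  # → True
```

The two URLs are different URLs that happen to lead to the same place, and only the second function needs to know that http: defaults to port 80.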

> that the same host might have multiple DNS names,
>or that some FTP servers allow case insensitive file names, any number
>of actual equivalences, symbolic links, etc.

None of these things should be part of the URL spec. But things
that are used in practice today, i.e. the significance of ?, /,
and %xx, should be.
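A small Python sketch of what a common meaning for ?, / and %xx implies for a parser. The function name `split_path` is hypothetical and this does not describe any particular implementation; it only shows the order of operations that keeps '/' and '%2F' distinct: structure is found first, and escapes are decoded afterwards, inside each segment.

```python
from typing import List, Optional, Tuple
from urllib.parse import unquote


def split_path(path: str) -> Tuple[List[str], Optional[str]]:
    """Split the part of a URL after the host into segments and a query.

    '?' and '/' are treated as structure before any %xx escape is
    decoded, so an escaped '/' (%2F) ends up inside a segment instead
    of acting as a separator. The query is returned still escaped.
    """
    query = None
    if "?" in path:
        path, query = path.split("?", 1)
    segments = [unquote(seg) for seg in path.split("/")]
    return segments, query


print(split_path("/a%2Fb/c?x=1"))  # → (['', 'a/b', 'c'], 'x=1')
print(split_path("/a/b/c"))        # → (['', 'a', 'b', 'c'], None)
```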

>In the grand scheme of things, if you treat "/" and "%2F" as
>different, then at most you'll treat a few things as 'different' that
>are really the 'same', but in fact, this will be an insignificant
>amount compared to the other kinds of duplications.

In the grand scheme of things, the question is whether there's any
common structure to the "parameter package" of a URL. It sounds like
the decision is that there is not, even though this contradicts current
practice.

So the grammar for URLs is just:

	URL : IALPHA ':' CHARS
		;

with terminals:
	IALPHA =~ /[a-zA-Z][a-zA-Z0-9_-]*/;
	CHARS =~ /[^ <>]*/;
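For comparison, this grammar fits in a single Python regular expression, taking the first character of IALPHA to be an ASCII letter. The names `URL_RE` and `is_url` are illustrative only.

```python
import re

# IALPHA ':' CHARS as one pattern.
# IALPHA: a letter followed by letters, digits, '-' or '_'.
# CHARS:  any run of characters other than space, '<' and '>'.
URL_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*:[^ <>]*")


def is_url(text: str) -> bool:
    """True if the whole of text matches the grammar."""
    return URL_RE.fullmatch(text) is not None


print(is_url("http://host/a%2Fb"))  # → True
print(is_url("http://host/a b"))    # → False
```

Nothing about '/', '?' or '%xx' is checked: under this grammar everything after the colon is opaque to generic URL software.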

I'm interested to know if the most widely deployed URL implementations
(www, Mosaic, ...) are going to change to conform to this.

Dan