Re: Who can express URL syntax with BNF

"Daniel W. Connolly" <connolly@hal.com>
Errors-To: listmaster@www0.cern.ch
Date: Tue, 26 Apr 1994 19:44:42 +0200
Message-id: <9404261731.AA06733@ulua.hal.com>
Reply-To: connolly@hal.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Who can express URL syntax with BNF 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
In message <199404261347.AA23770@RA.DEPT.CS.YALE.EDU>, Stan Letovsky writes:
>>
>>$Word = '[^/=;?#]*';
>>
>>$scheme = $1 if s*^([A-Za-z0-9\.-]+):**; # @# syntax of scheme?
>>$hostport = &unescape($1) if s*^//($Word)**;
>>$fragment = &unescape($1) if s*#($Word)$**;
>>$search = &unescape($1) if s*\?($Word)$**;
>>$path = &unescape($_);
>
>Minor question:
>This looks like perl, but I can't quite parse the regexps.
>Is this some variant perl dialect or alternate regexp syntax?

Sorry... it's just perl... Practical Extraction and Reporting Line-noise.

>Major question: This reminds me of an issue I stumbled across
>recently, about the possible coexistence of #label and ?query-string
>in the same URL.

Hmmm... from "Universal Resource Identifiers: BNF"
	http://info.cern.ch/hypertext/WWW/Addressing/URL/5_URI_BNF.html

the following productions:

fragmentaddress 
   uri [ # fragmentid ] 
uri 
   scheme : path [ ? search ] 

would suggest
	http://host.com/database?search#fragment
is kosher.
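
Read literally, those two productions peel a URL apart from the right:
fragment first, then scheme, then search. A minimal sketch in Python
(not the perl quoted above), assuming the productions mean exactly what
they say; split_uri and its dictionary keys are hypothetical names, and
no unescaping is attempted:

```python
def split_uri(text):
    """Split text per the two productions quoted above:
         fragmentaddress = uri [ # fragmentid ]
         uri             = scheme : path [ ? search ]
    Illustration only; split_uri is not part of any spec or library.
    """
    # fragmentaddress: everything after the first '#' is the fragmentid.
    uri, sep, fragment = text.partition("#")
    fragment = fragment if sep else None

    # uri: the scheme runs up to the first ':'.
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError("no scheme found in %r" % text)

    # What remains is path, optionally followed by '?' and search.
    path, sep, search = rest.partition("?")
    search = search if sep else None

    return {"scheme": scheme, "path": path,
            "search": search, "fragment": fragment}


print(split_uri("http://host.com/database?search#fragment"))
```

For the URL above this gives scheme "http", path "//host.com/database",
search "search" and fragment "fragment". Written the other way round
(#fragment?search), the "?search" would end up inside the fragment, so
under this reading the order matters.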

The fact that none of the following characters:

reserved 
   = | ; | / | # | ? | : | space 

can occur in a fragmentid or search leaves no ambiguity that I can see.

(though it means that http://host.com:3000/ doesn't parse -- the colon
is no good).
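
The reserved-character check is easy to sketch. This is a rough Python
illustration, assuming the reserved set is exactly the one in the
production above and that it applies to the host part as well as to
search and fragmentid; reserved_in is a hypothetical helper name:

```python
# Reserved characters, copied from the production above:
#   = | ; | / | # | ? | : | space
RESERVED = set("=;/#?: ")


def reserved_in(component):
    """Return the reserved characters found in component, in order.
    Illustration only; reserved_in is not part of any spec or library.
    """
    return [ch for ch in component if ch in RESERVED]


# A search string or fragmentid free of reserved characters is fine.
print(reserved_in("search"))         # []
print(reserved_in("fragment"))       # []
# The port separator trips the check, as noted above.
print(reserved_in("host.com:3000"))  # [':']
```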

We really need a spec that disambiguates cases like this. That's why
I'm building a test suite:
	http://www.hal.com/%7Econnolly/url_test/

What's there is pretty old...

> I did some experiments with Mosaic 2.4 that
>suggested it did not recognize both in the same URL (ignored
>the label, I think, although it was ignoring labels in any
>script results when relative URLs were used, so I am not
>positive how it interprets this combination in all contexts.)

Ah yes... the old "see what Mosaic does" test. Hardly satisfying.

>Your regexps do not suggest any exclusion between #label
>and ?query; I can't tell if it imposes an order on them.
>Does anyone know what the official (? is there such a thing?)
>position is on the legality and syntax of combining #label
>and ?query in one URL?

Well... the URI working group completely balked at this sort of
thing...  In their game, a URL is just scheme:opaque-string, with the
syntax of the string defined on a per-scheme basis.

In response to that, the WWW team sort of "took their marbles and went
home." Tim's editing a URI standard that has all the WWW mechanisms in it.
See Tim's collected notes:

	http://info.cern.ch/hypertext/WWW/Addressing/Addressing.html

Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   (512) 834-9962 x5010
<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html