Re: Who can express URL syntax with BNF

"Daniel W. Connolly" <>
Date: Tue, 26 Apr 1994 19:44:42 +0200
From: "Daniel W. Connolly" <>
To: Multiple recipients of list <>
Subject: Re: Who can express URL syntax with BNF 
In message <199404261347.AA23770@RA.DEPT.CS.YALE.EDU>, Stan Letovsky writes:
>>$Word = '[^/=;?#]*';
>>$scheme = $1 if s*^([A-Za-z0-9\.-]+):**; # @# syntax of scheme?
>>$hostport = &unescape($1) if s*^//($Word)**;
>>$fragment = &unescape($1) if s*#($Word)$**;
>>$search = &unescape($1) if s*\?($Word)$**;
>>$path = &unescape($_);
>Minor question:
>This looks like perl, but I can't quite parse the regexps.
>Is this some variant perl dialect or alternate regexp syntax?

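Sorry... it's just perl... Practical Extraction and Reporting Line-noise.
The s*pattern*replacement* forms are ordinary substitutions that use *
as the delimiter instead of the usual /, with an empty replacement.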
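
For readers who do not read Perl, the quoted substitutions can be sketched
roughly in Python. This is a loose translation, not code from the original
message: the function name split_url, the dictionary keys, the example.org
URL, and the use of urllib.parse.unquote as a stand-in for &unescape are all
assumptions made for illustration.

```python
import re
from urllib.parse import unquote  # stand-in for the Perl &unescape (assumption)

WORD = r"[^/=;?#]*"  # same character class as $Word in the quoted Perl


def split_url(url):
    """Peel the pieces off a URL in the same order as the quoted Perl."""
    parts = {"scheme": None, "hostport": None, "fragment": None, "search": None}
    rest = url

    # s*^([A-Za-z0-9\.-]+):**  -> leading scheme, removed from the string
    m = re.match(r"([A-Za-z0-9.-]+):", rest)
    if m:
        parts["scheme"] = m.group(1)
        rest = rest[m.end():]

    # s*^//($Word)**  -> host and port after a leading //
    m = re.match("//(" + WORD + ")", rest)
    if m:
        parts["hostport"] = unquote(m.group(1))
        rest = rest[m.end():]

    # s*#($Word)$**  -> trailing fragment, stripped first
    m = re.search("#(" + WORD + ")$", rest)
    if m:
        parts["fragment"] = unquote(m.group(1))
        rest = rest[:m.start()]

    # s*\?($Word)$**  -> trailing search part, stripped second
    m = re.search(r"\?(" + WORD + ")$", rest)
    if m:
        parts["search"] = unquote(m.group(1))
        rest = rest[:m.start()]

    # whatever is left is the path
    parts["path"] = unquote(rest)
    return parts


print(split_url("http://example.org/doc.html?query#label"))
```

Because the fragment is stripped from the end before the search part, this
sketch only separates both when they appear as path?search#fragment. Given
"#label?query" instead, the fragment pattern fails (the ? is excluded from
the word class) and "#label" is left in the path.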

>Major question: This reminds me of an issue I stumbled across
>recently, about the possible coexistence of #label and ?query-string
>in the same URL.

Hmmm... from "Universal Resource Identifiers: BNF"

the following productions:

   uri [ # fragmentid ] 
   scheme : path [ ? search ] 

would suggest
is kosher.

The fact that none of the following characters:

   = | ; | / | # | ? | : | space 

can occur in a fragmentid or search leaves no ambiguity that I can see.

(though it means that doesn't parse -- the colon
is no good).
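
The reading of the two productions above can be sketched as a small Python
check. This is only an illustration under stated assumptions: the function
name split_by_productions, the EXCLUDED set (copied from the character list
above), and the example.org URLs are hypothetical and not part of the BNF
document.

```python
# Characters listed above as excluded from a fragmentid or search.
EXCLUDED = set("=;/#?: ")


def split_by_productions(uri):
    """Split per:  uri [ # fragmentid ]   and   scheme : path [ ? search ]."""
    # uri [ # fragmentid ]: everything after the first '#' is the fragmentid.
    body, hash_sep, fragment = uri.partition("#")
    fragment = fragment if hash_sep else None

    # scheme : path [ ? search ]
    scheme, colon, rest = body.partition(":")
    if not colon:
        raise ValueError("no scheme found")
    path, q_sep, search = rest.partition("?")
    search = search if q_sep else None

    # Reject a fragmentid or search that uses any excluded character.
    for name, value in (("fragmentid", fragment), ("search", search)):
        if value is not None and EXCLUDED & set(value):
            raise ValueError(f"{name} contains an excluded character: {value!r}")

    return scheme, path, search, fragment


print(split_by_productions("http://example.org/doc?query#label"))
```

Under this reading, a URL carrying both a search part and a fragment splits
in exactly one way, while a fragment such as "a:b" is rejected because of
the colon.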

We really need a spec that disambiguates cases like this. That's why
I'm building a test suite:

What's there is pretty old...

> I did some experiments with Mosaic 2.4 that
>suggested it did not recognize both in the same URL (ignored
>the label, I think, although it was ignoring labels in any
>script results when relative URLs were used, so I am not
>positive how it interprets this combination in all contexts.)

Ah yes... the old "see what Mosaic does" test. Hardly satisfying.

>Your regexps do not suggest any exclusion between #label
>and ?query; I can't tell if it imposes an order on them.
>Does anyone know what the official (? is there such a thing?)
>position is on the legality and syntax of combining #label
>and ?query in one URL?

Well... the URI working group completely balked at this sort of
thing...  In their game, a URL is just scheme:opaque-string, with the
syntax of the string defined on a per-scheme basis.

In response to that, the WWW team sort of "took their marbles and went
home." Tim's editing a URI standard that has all the WWW mechanisms in it.
See Tim's collected notes:

Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   (512) 834-9962 x5010