Re: Who can express URL syntax with BNF "Daniel W. Connolly" <email@example.com>
Date: Tue, 26 Apr 1994 19:44:42 +0200
From: "Daniel W. Connolly" <firstname.lastname@example.org>
To: Multiple recipients of list <email@example.com>
Subject: Re: Who can express URL syntax with BNF
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
In message <199404261347.AA23770@RA.DEPT.CS.YALE.EDU>, Stan Letovsky writes:
>>$Word = '[^/=;?#]*';
>>$scheme = $1 if s*^([A-Za-z0-9\.-]+):**; # @# syntax of scheme?
>>$hostport = &unescape($1) if s*^//($Word)**;
>>$fragment = &unescape($1) if s*#($Word)$**;
>>$search = &unescape($1) if s*\?($Word)$**;
>>$path = &unescape($_);
>This looks like perl, but I can't quite parse the regexps.
>Is this some variant perl dialect or alternate regexp syntax?
Sorry... it's just perl... Practical Extraction and Reporting Line-noise.
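For anyone else puzzled by the quoted lines: s*PATTERN** is an ordinary substitution that uses "*" as its delimiter and has an empty replacement, so each statement captures one piece of the URL and deletes it from the string. Below is a minimal sketch of the same steps with brace delimiters. The names parse_url and unescape, and the %XX decoding inside unescape, are assumptions made for illustration; the original unescape routine is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: decode %XX escapes (the original &unescape is not shown).
sub unescape {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical wrapper around the quoted statements, same order, same patterns.
sub parse_url {
    my ($url) = @_;
    my $Word = '[^/=;?#]*';
    my ($scheme, $hostport, $fragment, $search);

    # Each substitution captures one component and strips it from $url.
    $scheme   = $1           if $url =~ s{^([A-Za-z0-9.-]+):}{};
    $hostport = unescape($1) if $url =~ s{^//($Word)}{};
    $fragment = unescape($1) if $url =~ s{#($Word)$}{};
    $search   = unescape($1) if $url =~ s{\?($Word)$}{};

    # Whatever is left over is treated as the path.
    return (
        scheme   => $scheme,
        hostport => $hostport,
        fragment => $fragment,
        search   => $search,
        path     => unescape($url),
    );
}

my %parts = parse_url('http://host.com:3000/path/doc.html?query#label');
print join(', ', map { "$_=" . ($parts{$_} // '(none)') } sort keys %parts), "\n";
```

Because the fragment is stripped before the search part, a URL written as ...?query#label yields both components. Note that $Word excludes "=", so a search part such as ?a=b would not be matched by these patterns as written.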
>Major question: This reminds me of an issue I stumbled across
>recently, about the possible coexistence of #label and ?query-string
>in the same URL.
Hmmm... "Universal Resource Identifiers: BNF" gives
the following productions:
    uri [ # fragmentid ]
    scheme : path [ ? search ]
The fact that none of the following characters:
    = | ; | / | # | ? | : | space
can occur in a fragmentid or search leaves no ambiguity that I can see
(though it means that http://host.com:3000/ doesn't parse -- the colon
is no good).
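To make the ordering concrete, here is a rough sketch of how the two productions could be applied: peel "#fragmentid" off the whole string first, then "?search" off what remains, and check each piece against the excluded character set. The names split_uri and is_plain are hypothetical, and this only illustrates the productions as quoted in this message, not any finished spec.

```perl
use strict;
use warnings;

# Characters listed above as excluded from a fragmentid or search part.
my $reserved = qr{[=;/#?: ]};

# Hypothetical helper: true when the string contains none of those characters.
sub is_plain {
    my ($s) = @_;
    return $s !~ $reserved;
}

# Hypothetical helper: split at the first "#", then at the first "?" of the rest.
sub split_uri {
    my ($uri) = @_;
    my ($fragment, $search);
    ($uri, $fragment) = ($1, $2) if $uri =~ m{^([^#]*)#(.*)$}s;
    ($uri, $search)   = ($1, $2) if $uri =~ m{^([^?]*)\?(.*)$}s;
    return ($uri, $search, $fragment);
}

for my $candidate ('http://host/doc?query#label', 'doc#label?query') {
    my ($rest, $search, $fragment) = split_uri($candidate);
    for my $part (grep { defined } $search, $fragment) {
        printf "%s: '%s' is %s\n", $candidate, $part,
            is_plain($part) ? 'acceptable' : 'not acceptable';
    }
}
```

Under this reading, ?query#label is the only order that yields both parts: in doc#label?query the fragment would be "label?query", which contains an excluded "?". The same character set would also reject a string like host.com:3000 because of the colon.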
We really need a spec that disambiguates cases like this. That's why
I'm building a test suite:
What's there is pretty old...
> I did some experiments with Mosaic 2.4 that
>suggested it did not recognize both in the same URL (ignored
>the label, I think, although it was ignoring labels in any
>script results when relative URLs were used, so I am not
>positive how it interprets this combination in all contexts.)
Ah yes... the old "see what Mosaic does" test. Hardly satisfying.
>Your regexps do not suggest any exclusion between #label
>and ?query; I can't tell if it imposes an order on them.
>Does anyone know what the official (? is there such a thing?)
>position is on the legality and syntax of combining #label
>and ?query in one URL?
Well... the URI working group completely balked at this sort of
thing... In their game, a URL is just scheme:opaque-string, with the
syntax of the string defined on a per-scheme basis.
In response to that, the WWW team sort of "took their marbles and went
home." Tim's editing a URI standard that has all the WWW mechanisms in it.
See Tim's collected notes:
Daniel W. Connolly "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010