Re: Who can express URL syntax with BNF

"Daniel W. Connolly" <connolly@hal.com>
Errors-To: listmaster@www0.cern.ch
Date: Mon, 25 Apr 1994 18:59:32 +0200
Message-id: <9404251646.AA04305@ulua.hal.com>
Reply-To: connolly@hal.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Who can express URL syntax with BNF 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas

I made some attempts to write a yacc grammar for URLs, but it wasn't
a very valuable exercise... regular expression matching works pretty well;
e.g.:

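	$Word = '[^/=;?#]*';	# any run of characters other than / = ; ? #

	# Each s*...** below is a substitution using '*' as its delimiter, with
	# an empty replacement: the matched piece is cut out of $_ and left in $1.
	# &unescape is a helper defined elsewhere (not shown here).
	$scheme = $1 if s*^([A-Za-z0-9\.-]+):**; # @# syntax of scheme?
	$hostport = &unescape($1) if s*^//($Word)**;
	$fragment = &unescape($1) if s*#($Word)$**;
	$search = &unescape($1) if s*\?($Word)$**;
	$path = &unescape($_);	# whatever is left over is the path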
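
For comparison, here is a rough sketch of the same strip-as-you-go
approach in Python rather than Perl. The names split_url and WORD are
made up for illustration, and urllib.parse.unquote stands in for
&unescape on the assumption that &unescape decodes %XX escapes.

```python
import re
from urllib.parse import unquote

WORD = r"[^/=;?#]*"  # same character class as $Word in the Perl above


def split_url(url):
    """Strip scheme, host:port, fragment and search off a URL, in that order."""
    parts = {"scheme": None, "hostport": None, "fragment": None, "search": None}

    def strip(name, pattern, decode=True):
        # Cut the first match of `pattern` out of `url`, keeping group 1.
        nonlocal url
        m = re.search(pattern, url)
        if m:
            parts[name] = unquote(m.group(1)) if decode else m.group(1)
            url = url[:m.start()] + url[m.end():]

    strip("scheme", r"^([A-Za-z0-9.-]+):", decode=False)
    strip("hostport", r"^//(" + WORD + ")")
    strip("fragment", r"#(" + WORD + ")$")
    strip("search", r"\?(" + WORD + ")$")
    parts["path"] = unquote(url)  # whatever is left over is the path
    return parts


print(split_url("http://www.hal.com/%7Econnolly/index.html#top"))
print(split_url("/1234?Name+foo"))
```

One thing the sketch makes visible: because the Word class excludes
"=", a form-style query such as "?Name=foo" is not split off as the
search part and stays in the path.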

The URL spec has gotten really watered down by the IIIR working
group...  the WWW application needs more structure than the URL spec
gives, so there's a URI spec that Tim has written to explain the rest.
Anyway... I'm still not sure all the syntactical issues have been
completely hammered out.

In message <447*/S=hille/OU=rz/OU=informatik/PRMD=uni-hamburg/ADMD=d400/C=de/@MHS>,
Gunter writes:
>No, I did not read the www-talk archive. But the parser of my HTTP server
>for Windows will be an ugly piece of code, if it cannot decide what to do:
>
>GET /1234?Name=foo&Age=21 HTTP/1.0     seems to be a form, handled specially
>GET /1234?Name=foo HTTP/1.0            is a form or a textsearch
>GET /1234?Name+foo HTTP/1.0            is a textsearch
>GET /1234?Name%20foo HTTP/1.0          is a textsearch as well?
>GET /1234?777,888 HTTP/1.0             is a spacejump (or textsearch?)
>
>So, how to decide whether to process forms, do a textsearch or do a spacejump.

Many of these syntaxes are opaque to the URI syntax (spacejump is just
an ordinary search URL, as far as the spec goes). They're
server-specific mechanisms that some clients know how to exploit.

Anyway... the way to disambiguate is to tell the client to give you
more info in the path. For example, have the client send:

GET /1234/textsearch?Name%20foo HTTP/1.0
GET /1234/spacejump?777,888 HTTP/1.0
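
On the server side, that could look something like the sketch below.
It is only an illustration: the function name dispatch and the shape
of its return value are invented here, and the segment names
"textsearch" and "spacejump" are simply the ones from the two request
lines above.

```python
def dispatch(request_line):
    """Decide how to treat a request line such as
    'GET /1234/textsearch?Name%20foo HTTP/1.0'.

    Returns (kind, document, query).
    """
    _method, target, _version = request_line.split()
    path, _, query = target.partition("?")
    # The last path segment says what kind of query this is.
    document, _, kind = path.rpartition("/")
    if kind in ("textsearch", "spacejump"):
        return kind, document, query
    # No disambiguating segment: treat the whole path as the document.
    return "document", path, query


print(dispatch("GET /1234/textsearch?Name%20foo HTTP/1.0"))
print(dispatch("GET /1234/spacejump?777,888 HTTP/1.0"))
```

The server never has to guess from the query string itself; the
guessing is replaced by one extra path segment.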

Hmmm... actually, I think you've found a real problem: what if
somebody wants to put both types of queries in one document? e.g. what
if a node has both <ISINDEX> and <IMG ISMAP>, and the user enters
"777,888" in the search text area? The server can't tell that request
from a click at those coordinates. (Forms are not so much of a problem:
you can put ACTION="/disambiguating/path" and give the server enough
info to decide it's a form.)
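
To make the collision concrete, here is a small sketch of what a
server is reduced to when it has only the query string to go on. The
heuristics are just read off Gunter's list above; they are not from
any spec, and the name possible_meanings is made up.

```python
import re


def possible_meanings(query):
    """Return every query type the raw query string could be."""
    meanings = []
    if re.fullmatch(r"\d+,\d+", query):
        meanings.append("spacejump")    # looks like x,y coordinates
    if "=" in query:
        meanings.append("form")         # looks like name=value pairs
    if "&" not in query:
        meanings.append("textsearch")   # any single string can be a search term
    return meanings


for q in ("Name=foo&Age=21", "Name=foo", "Name+foo", "777,888"):
    print(q, possible_meanings(q))
```

Any query that comes back with more than one entry is one the server
cannot classify without help from the path or the protocol.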

It seems that this actually does need to be part of the protocol, and
not just an application-specific use of the protocol.

>HTTP protocol states that spacejump and textsearch methods are done via GET.

Where? Why are spacejump and textsearch part of the protocol at all?
I thought they were just applications of the protocol?

>Why don't we use SPACEJUMP or TEXTSEARCH to disambiguate our syntax?

Because nobody thought it was necessary... looks like they were wrong.

Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   (512) 834-9962 x5010
<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html