Re: Forms/CGI urls: '&' in HREF attributes

David Robinson (drtr1@cam.ac.uk)
Fri, 10 Feb 95 12:53:12 EST

>There's an unfortunate interaction between the x-www-urlencoded
>syntax for form data submission and SGML attribute value literal
>syntax. This came up shortly after I started running the validation
>service, and I thought we had discussed the problem, but it seems
>to be getting worse, and not better.
>...
>There are a couple ways to represent the string:
>
> http://foo.org/cgi-bin/do-something.pl?x=a&y=b
>
>as an attribute value literal:
>
> "http://foo.org/cgi-bin/do-something.pl?x=a&amp;y=b"
> "http://foo.org/cgi-bin/do-something.pl?x=a&#34;y=b"

I think you mean &#38;

>but neither of those is interpreted correctly by existing browsers.

chimera 1.63 interprets both of these correctly.
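For illustration only, here is a small check using Python's standard html module as a modern stand-in for a browser's attribute parser. It follows HTML5 rather than the SGML rules of today's browsers, but for well-formed references like these the result is the same. The named form &amp; and the numeric form &#38; both decode to the original URL, while code point 34 is the double quote, not the ampersand:

```python
import html

URL = "http://foo.org/cgi-bin/do-something.pl?x=a&y=b"

# Named and numeric references for '&' decode to the same attribute value.
named = html.unescape("http://foo.org/cgi-bin/do-something.pl?x=a&amp;y=b")
numeric = html.unescape("http://foo.org/cgi-bin/do-something.pl?x=a&#38;y=b")

# Code point 34 is '"', so this literal does not contain an ampersand at all.
wrong = html.unescape("http://foo.org/cgi-bin/do-something.pl?x=a&#34;y=b")

print(named == URL, numeric == URL)  # True True
print(wrong)  # http://foo.org/cgi-bin/do-something.pl?x=a"y=b
print(chr(38), chr(34))  # & "
```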

>In the interest of interoperability, I'd like to move toward using ';'
>rather than (or in addition to) '&' to separate form name/value pairs.
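A minimal sketch of a server-side parser that accepts ';' in addition to '&' as the pair separator. The helper name parse_form_query is hypothetical, and it assumes that a literal ';' or '&' inside a name or value arrives percent-encoded (%3B, %26), as Python's quote_plus produces:

```python
import re
from urllib.parse import unquote_plus


def parse_form_query(query):
    """Hypothetical helper: split form data on '&' or ';', then decode each pair."""
    pairs = []
    for field in re.split(r"[&;]", query):
        if not field:
            continue  # tolerate empty fields, e.g. a trailing separator
        name, _, value = field.partition("=")
        # Decode after splitting, so an encoded %26 or %3B never acts as a separator.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


print(parse_form_query("x=a&y=b"))  # [('x', 'a'), ('y', 'b')]
print(parse_form_query("x=a;y=b"))  # [('x', 'a'), ('y', 'b')]
```

Accepting both separators lets existing '&' URLs keep working while ';' URLs can be pasted into HTML with no escaping at all.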

A really painful change. I think it would cause less trouble if browsers
rejected invalid & escape sequences.
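A minimal sketch of what rejecting invalid & escape sequences could look like. It assumes an approximation of SGML reference syntax ('&' plus a name, or '#' plus digits, optionally closed by ';') and a tiny entity table; strict_unescape, ENTITIES and InvalidReference are hypothetical names, not any browser's code:

```python
import re

# Hypothetical entity table; a real HTML DTD defines many more.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

# Approximate SGML reference: '&' + name (or '#' + digits), optional ';'.
REF = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9.-]*);?")


class InvalidReference(ValueError):
    """Raised for an '&' that does not start a known reference."""


def strict_unescape(value):
    out, pos = [], 0
    while True:
        amp = value.find("&", pos)
        if amp == -1:
            out.append(value[pos:])
            return "".join(out)
        out.append(value[pos:amp])
        m = REF.match(value, amp)
        if m is None:
            raise InvalidReference(f"bare '&' at offset {amp}")
        name = m.group(1)
        if name.startswith("#"):
            out.append(chr(int(name[1:])))
        elif name in ENTITIES:
            out.append(ENTITIES[name])
        else:
            # The whole name is looked up: '&lty' means 'lty', never 'lt' + 'y'.
            raise InvalidReference(f"undefined entity '&{name}' at offset {amp}")
        pos = m.end()


print(strict_unescape("x=a&amp;y=b"))  # x=a&y=b
for bad in ["x=a&y=b", "x=a&lty=b"]:
    try:
        strict_unescape(bad)
    except InvalidReference as err:
        print(bad, "->", err)
```

An unescaped '&' then produces a visible error instead of a silently altered URL, which pushes authors toward writing &amp;.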

The problem is that it is difficult to simultaneously support unescaped and
escaped '&'. For example, chimera fails on an href of
"http://foo.org/cgi-bin/do-something.pl?x=a&lty=b"

So I cannot create a link to a searchable database with field names
beginning with lt, amp, quot, etc. that will work with all browsers.
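One way a lenient decoder goes wrong, sketched with Python's html.unescape. This is not chimera's code, whose algorithm the message does not describe. html.unescape follows the HTML5 text rules, which still accept a few legacy names such as lt, amp and quot without a closing ';' and use the longest matching prefix, so field names starting with those names are mangled:

```python
import html

# A plain field name, plus names that start with an entity name.
fields = ["y", "lty", "ampm", "quota"]
decoded = {f: html.unescape(f"x=a&{f}=b") for f in fields}

for field, value in decoded.items():
    print(field, "->", value)
# y -> x=a&y=b
# lty -> x=a<y=b
# ampm -> x=a&m=b
# quota -> x=a"a=b
```

An unknown name like y passes through untouched, which is why unescaped '&' usually seems to work. The failures only appear for unlucky field names.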

>But folks will continue to copy-and-paste these form query URLs into
>their HTML without quoting the '&' chars. So eventually, browsers
>should start using ';' in the form encoding process in the first place
(as well as supporting &amp; inside attribute values!), and then the
>issue will go away.

Quoting '&' chars in URLs pasted into HTML is no harder than quoting
&, < and > in plain text pasted into HTML. Do folks not bother with that
either?
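The quoting step is mechanical. A minimal sketch, where href_anchor is a hypothetical helper and html.escape is from Python's standard library:

```python
import html


def href_anchor(url, text):
    """Hypothetical helper: build an <A HREF> element from a pasted URL."""
    # quote=True also escapes '"' and "'", so the URL cannot end the attribute early.
    return '<A HREF="{}">{}</A>'.format(html.escape(url, quote=True),
                                        html.escape(text, quote=False))


url = "http://foo.org/cgi-bin/do-something.pl?x=a&lty=b"
anchor = href_anchor(url, "search")
print(anchor)
# <A HREF="http://foo.org/cgi-bin/do-something.pl?x=a&amp;lty=b">search</A>

# Escaping is lossless: decoding the attribute value gives back the URL.
assert html.unescape(html.escape(url)) == url
```

Once escaped, &amp;lty is unambiguous, so even a field name beginning with lt survives.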

Use of '&' to separate fields is widespread amongst CGI script writers
and browsers. Use of '&' in hrefs is less common; it is used mostly by those
who (like myself) build WWW interfaces to structured databases.

Of course, the current CGI encoding is incompatible with the URL spec in
RFC 1738, so something has to change. The encoding treats '&', '=' and '+'
as reserved characters; the spec reserves only '/', ';' and '?' in http
searchparts, and forbids any URL scheme from making '+' a reserved character.
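The mismatch can be spelled out as set arithmetic. In this sketch the two RFC 1738 sets are transcribed from memory of its sections 2.2 (characters a scheme may reserve) and 3.3 (http path and searchpart), and the form-encoding set is taken from the paragraph above:

```python
from urllib.parse import unquote, unquote_plus

# RFC 1738 sec. 2.2: the only characters any scheme may reserve.
RFC1738_RESERVABLE = set(";/?:@=&")
# RFC 1738 sec. 3.3: reserved within http <path> and <searchpart>.
HTTP_SEARCHPART_RESERVED = set("/;?")
# Characters with special meaning in x-www-form-urlencoded data.
FORM_RESERVED = set("&=+")

print(sorted(FORM_RESERVED - HTTP_SEARCHPART_RESERVED))  # ['&', '+', '=']
print(sorted(FORM_RESERVED - RFC1738_RESERVABLE))  # ['+']

# '+' means a space in form data but is an ordinary character in a generic URL.
print(unquote_plus("a+b"), "|", unquote("a+b"))  # a b | a+b
```

'&' and '=' could legitimately be reserved by a revised http scheme; '+' cannot be reserved by any scheme.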

David Robinson. (drtr1@cam.ac.uk)