Re: Forms/CGI urls: '&' in HREF attributes

wmperry@spry.com
Thu, 9 Feb 95 15:00:07 EST

Daniel W. Connolly writes:
>
> There's an unfortunate interaction between the x-www-urlencoded
> syntax for form data submission and SGML attribute value literal
> syntax. This came up shortly after I started running the validation
> service, and I thought we had discussed the problem, but it seems
> to be getting worse, and not better.
>
> An example of the problem:
>
> Given this document:
>
> ===============================================
> <!doctype html public "-//IETF//DTD HTML//EN">
> <title>testing & in HREF</title>
>
> <p>Here we go:
> <a href="http://foo.org/cgi-bin/do-something.pl?x=a&y=b">link</a>
> ===============================================
>
> Trying to validate it yields:
>
> ===============================================
> connolly@ulua ../connolly[1114] html-validate test.html
> sgmls: SGML error at test.html, line 5 at "y":
> No declaration for entity "y"; reference ignored
> ===============================================
>
> Section 7.9.3 "Attribute Value Specification" of the SGML standard
> says:
>
> An attribute value literal is interpreted as an attribute value
> by replacing references within it, ignoring Ee and RS, and replacing
> an RE or SEPCHAR with a SPACE.
>
> So the attribute value literal:
>
> "http://foo.org/cgi-bin/do-something.pl?x=a&y=b"
>
> has an error in it: &y references an undeclared entity.
>
>
> This should definitely go in as a NOTE: or something in the HTML spec,
> and perhaps it's worth mentioning in the URL spec (though that's
> stretching it).
>
>
> There are a couple ways to represent the string:
>
> http://foo.org/cgi-bin/do-something.pl?x=a&y=b
>
> as an attribute value literal:
>
> "http://foo.org/cgi-bin/do-something.pl?x=a&amp;y=b"
> "http://foo.org/cgi-bin/do-something.pl?x=a&#34;y=b"
>
> but neither of those is interpreted correctly by existing browsers.

Well, not to be nitpicky, but it is supported by several (emacs-w3, AIR
Mosaic, Secure NCSA Mosaic, Internetworks, and I think OmniWeb). I don't
have access to any Macs right this second, or I'd check a few Mac browsers
as well.

SpyGlass Enhanced Mosaic (as of 1.2), Netscape 1.0N, and WinWeb (latest off
of ftp.einet.net) definitely don't support it.
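
For anyone generating these links from a program rather than by hand, here
is a minimal sketch in Python of the quoting step discussed above. The
helper name href_attribute is made up for illustration; html.escape and
html.unescape are from the Python standard library. It is only meant to
show the '&' to '&amp;' substitution, not a complete HTML writer.

```python
from html import escape, unescape

def href_attribute(url):
    """Return an HREF attribute whose value literal is safe for an SGML parser.

    escape() rewrites '&' as '&amp;' (and also quotes '<', '>' and the
    quote characters), so '&y' is no longer read as an entity reference.
    """
    return 'HREF="' + escape(url, quote=True) + '"'

url = "http://foo.org/cgi-bin/do-something.pl?x=a&y=b"
print(href_attribute(url))

# A conforming parser turns either quoted form back into the original URL.
print(unescape("http://foo.org/cgi-bin/do-something.pl?x=a&amp;y=b") == url)
print(unescape("http://foo.org/cgi-bin/do-something.pl?x=a&#38;y=b") == url)
```

A browser that replaces references inside attribute values, as the SGML
text quoted above requires, ends up requesting the URL with a plain '&'.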

> In the interest of interoperability, I'd like to move toward using ';'
> rather than (or in addition to) '&' to separate form name/value pairs.
>
> That way, the URL for this query can be:
>
> http://foo.org/cgi-bin/do-something.pl?x=a;y=b
>
> You can put this in an HTML document by writing:
>
> HREF="http://foo.org/cgi-bin/do-something.pl?x=a;y=b"
>
> A quick check through the Mosaic 2.4 source code shows that a ';'
> character in an input field _will_ be %xx-ified, so this doesn't
> introduce any ambiguity.
>
> The way to start the transition is to enhance cgi scripts to support
> separating form values by ';' as well as '&'. Then folks that want to
> validate their HTML can change '&' to ';' in their HREF
> attributes.
>
> But folks will continue to copy-and-paste these form query URLs into
> their HTML without quoting the '&' chars. So eventually, browsers
> should start using ';' in the form encoding process in the first place
> (as well as supporting &#38; inside attribute values!), and then the
> issue will go away.
>
> There's something of a chicken-and-egg problem here: who will support
> the first browser to use ';' rather than '&' to encode form stuff?
> That won't happen until the vast majority of CGI scripts have been
> enhanced to support it. And that might not happen until folks
> that want to validate their HTML start complaining. But it's a really
> cheap fix on the CGI side, no?

Should be just as fast on the CGI-side to check for ; as &.
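
As a rough illustration, here is a sketch in Python of a CGI-side parser
that accepts either separator. The function name parse_form_data is
hypothetical, and this is not taken from any existing CGI library; it
assumes the query string arrives already extracted (e.g. from the
QUERY_STRING environment variable) and that a literal ';' or '&' inside a
field has been %xx-encoded by the browser, as described above.

```python
import re
from urllib.parse import unquote_plus

def parse_form_data(query):
    """Split a form query string on '&' or ';' into (name, value) pairs."""
    pairs = []
    for field in re.split(r"[&;]", query):
        if not field:
            continue  # skip empty fields, e.g. from a trailing separator
        name, _, value = field.partition("=")
        # Decode %xx escapes and '+' only after splitting, so an encoded
        # ';' (%3B) or '&' (%26) inside a value is never taken as a separator.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

print(parse_form_data("x=a&y=b"))
print(parse_form_data("x=a;y=b"))
print(parse_form_data("x=a%3Bb;y=c"))
```

The first two calls give the same pairs, and in the third the encoded
';' stays inside the value of x.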

-Bill P.