Re: Form input and output (was: <pre>)

Brian Behlendorf (brian@wired.com)
Sat, 8 Oct 1994 07:12:44 +0100

On Fri, 7 Oct 1994, Daniel LaLiberte wrote:
> Another related issue is how a newline is sent by clients. It is not
> a big deal for (Unix) servers to always do the following (in Perl):
>
> # Must be done in this order.
> $articleContent =~ s/\015\012/\012/g; # Convert PC's CR/LF into just LF.
> $articleContent =~ s/\015/\012/g; # Convert Mac's CR into LF.
>
> But it is one extra annoying thing that always has to be done, so why
> not just have clients do it? Servers might have to convert back to
> their preferred standard, but there should be one intermediary standard.
>
> Another related issue is how large VALUE strings and input fields can
> be. Servers have to know how large of a VALUE string they can send,
> and clients have to know how large of an input string they can send.
> Similarly, how large can URLs be?

I'd like to share with you all how I spent my day today. It was spent
dealing with the fact that the above is far far from standard on even the
most common browsers, in particular NCSA's. I basically bent so far over
backwards to support bugs in a particular browser that I could tie my
shoes if I wanted.

I'm building what many of you have built as well: a conferencing system
using CGI scripts and forms. A user enters a bunch of text and submits it,
gets back a "how-this-would-look" preview to double-check that it looks
nice and decent when rendered in a browser, and then approves it. The
document information is kept in hidden form fields on that intermediate
page.

The only translation I knew I had to do to safely store any arbitrary
text inside an attribute field in an INPUT tag was " and > - so for the
interim I translated them to their SGML entity equivalents of &quot; and
&gt;. lee@sq.com confirmed that according to the SGML DTD, anything
except another quote could be in that field. The only reason I was
escaping > was for lenient browsers that allow that to signify the end of
the INPUT tag.
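
For concreteness, here is a minimal sketch of that translation, written in
Python rather than the Perl my scripts use. The names escape_attribute and
hidden_field are made up for illustration. It escapes only the two
characters mentioned above, so a literal & in the text is passed through
untouched; escaping & as well would be a further refinement that I am only
assuming here, not something described above.

```python
def escape_attribute(text):
    """Escape the two characters discussed above for a quoted attribute value.

    Assumption: only the double quote and > are translated, mirroring the
    approach in the text. A literal & is left alone.
    """
    return text.replace('"', "&quot;").replace(">", "&gt;")


def hidden_field(name, value):
    """Build a hidden INPUT tag carrying `value` (hypothetical helper)."""
    return '<INPUT TYPE="hidden" NAME="%s" VALUE="%s">' % (
        escape_attribute(name),
        escape_attribute(value),
    )


if __name__ == "__main__":
    print(hidden_field("content", "<P>Nice day, isn't it?"))
```

Run on the sample text, this should produce the INPUT tag I expected to be
safe everywhere.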

So, this was what I expected to work fine everywhere:

<INPUT TYPE="hidden" NAME="content" VALUE="<P&gt;Nice day, isn't it?">

This worked like a charm on X Mosaic. On MacMosaic, however, it didn't:
the browser didn't escape the & character to %26 when submitting the form,
so my POST data parser took it for another field delimiter. One option I
tried was to do the hex translation for it and put %26 in the VALUE
myself, but that didn't work either, as it properly translated my %
character to %25! I finally gave up and decided to use the # sign instead
of the & sign for pseudo-SGML-entities, and de-convert it on the other
side before writing to a file. So, finally, this worked:

<INPUT TYPE="hidden" NAME="content" VALUE="<P#gt;Nice day, isn't it?">

Well, almost. Now it turns out it croaks on the ' character too,
interpreting that as the end of the field, giving me

Content = <P#gt;Nice day, isn

Aaarg! So, I get to have fun escaping that, too. Are there any more
characters MacMosaic doesn't do the right thing with?
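
To make the failure and the workaround concrete, here is a rough sketch in
Python (not the Perl my scripts use). Everything named here is hypothetical:
parse_form_body is a deliberately naive POST body parser, and #apos; and
#num; are pseudo-entities I am assuming for the ' character and for a
literal # so that decoding stays unambiguous. Only #gt; appears in the
example above.

```python
import re
from urllib.parse import unquote_plus

# Assumed pseudo-entity table: "#" stands in for "&" as the entity marker.
_ENCODE = {"#": "#num;", '"': "#quot;", ">": "#gt;", "'": "#apos;"}
_DECODE = {entity[1:-1]: char for char, entity in _ENCODE.items()}


def encode_hidden(text):
    """Translate troublesome characters to #-style pseudo-entities."""
    return text.translate(str.maketrans(_ENCODE))


def decode_hidden(value):
    """Reverse encode_hidden in a single pass over the string."""
    return re.sub(r"#(num|quot|gt|apos);",
                  lambda m: _DECODE[m.group(1)], value)


def parse_form_body(body):
    """Naive form-data parser: split on & and then on the first =."""
    fields = {}
    for pair in body.split("&"):
        name, _, value = pair.partition("=")
        fields[unquote_plus(name)] = unquote_plus(value)
    return fields


if __name__ == "__main__":
    # A body with the & left unescaped is cut into two bogus fields.
    print(parse_form_body("content=<P&gt;Nice+day"))
    # The #-encoded value survives the round trip.
    stored = encode_hidden("<P>Nice day, isn't it?")
    print(stored)
    print(decode_hidden(stored))
```

The first call shows the content field cut off at the unescaped &, which is
the behavior I was seeing. The round trip at the end is what the script has
to do before writing to a file.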

Of course, to get this information I had to deal with a broken view-source
option and with crashes after approximately every 8 accesses, each of which
caused the machine to reboot (on a sufficiently stable Quadra), and I had to
experiment many times just to find where this bug was happening. I won't
even get into the CR/LF situation. Now I get to repeat this process with the
other browsers we want to list as acceptable - as it stands now there isn't
a single free browser for MS Windows that does both forms and user
authentication without serious problems. And yes, I have been working with
the most recent versions and even tried previous versions.

I have been saying this privately but have held back from saying it here
for fear of offense, but it *has* to be said. The state of WWW software,
commercial and free, is pathetic. This isn't any one person's fault, isn't
any one institution's fault - but the fact that I had to go way way way out
of my way to support bugs in browsers that could take seconds to fix is
absolutely pathetic. I couldn't care less about additional features in
browsers if the underlying systems are broken so badly. Bugs I can
understand, especially in alpha software. Persistent bugs I can forgive -
I'm sure I'm not the only one who has detected this as a bug and notified
them, and some bugs take a while to fix. But the total lack of serious
advancement in this
area since Spring of this year is pretty much embarrassing, and a total
turn-off for the major content providers who want to get into the game but
are scared off by the flakiness of it all. The browsers work 90% of the
time, but it's that 10% that debilitates us. Free software doesn't have to
be flaky - look at software put out by GNU. I'm playing the game here from a
different vantage point than most of you, and I think you should know that
the WWW is in pretty good danger of balkanization and fragmentation, if not
dissolution, if we don't get our collective hiney in gear, and soon.

Flame me if you want for these points, but I can pretty much anticipate your
responses - "no one's paying us to make free software" "it's a lot harder
than you think" "all the good guys got usurped by industry", etc. I wouldn't
be saying this if I felt those points negated my own. For NCSA's sake I'll
say their line is definitely impressive considering the majority of the work
is done by undergrads and grads. And Kim, I personally apologize for making
a big point about a small bug in an otherwise pretty good program; it is only
one instance of a bigger problem. And yes, "why don't I try programming a
better browser myself" is a flame I expect, to which I counter that I'm
busting my butt from the other end of the spectrum, building content and
applications.
In fact, I'd love to fix code, if only source code were made available!

I guess the final conclusion is

W3O WHERE ARE YOU?

We *need* our own standards-setting body, one with links and references to
IETF but ultimately one that can be a lot more responsive and flexible and
not have to wait half a year for a standard to be implemented. A part of that
*must* be a stable of programmers dedicated to pulling this together. I KNOW
you will find institutions willing to help fund this - universities, research
institutes, AND publishers. The success of Mosaic has pretty much proved
what a great business giving away software can be.

I guess we'll find out in two weeks in Chicago. In the meantime check out
W3O's pages at CERN.

Heretically yours,

Brian

p.s. - what happened to w3o.org??