Re: HTML draft - clarification of quoted string processing

marca@ncsa.uiuc.edu (Marc Andreessen)
Date: Sun, 5 Dec 93 02:31:47 -0800
From: marca@ncsa.uiuc.edu (Marc Andreessen)
Message-id: <9312051031.AA16590@wintermute.ncsa.uiuc.edu>
To: Dave_Raggett <dsr@hplb.hpl.hp.com>
Cc: cheung@eplrx7.es.dupont.com, www-talk@nxoc01.cern.ch
Subject: Re: HTML draft - clarification of quoted string processing
In-reply-to: <9312031754.AA27754@manuel.hpl.hp.com>
References: <9312031754.AA27754@manuel.hpl.hp.com>

Dave_Raggett writes:
> Bryan Cheung writes:
> 
> > I may just be blind, but I don't see a place in the HTML spec which 
> > describes what is supposed to happen to special characters inside quoted
> > strings. Consider an HTML statement such as:
> 
> > <form method=post action="/htbin-post/banner hello > foobar">

Heavily illegal.  The special characters should be encoded, as usual,
as %xx.  (This is because ACTION specifies a URL.)
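
For illustration, a minimal Python sketch of the %xx encoding applied to the
ACTION value quoted above. The helper name encode_action_path is hypothetical,
and keeping "/" unescaped is an assumption about how the path should be
treated, not something the spec discussion here settles.

```python
from urllib.parse import quote

def encode_action_path(path: str) -> str:
    """Percent-encode characters that are unsafe in a URL path.

    Assumption: '/' is kept as a path separator; everything else outside
    the unreserved set (letters, digits, '_', '.', '-', '~') becomes %xx.
    """
    return quote(path, safe="/")

action = encode_action_path("/htbin-post/banner hello > foobar")
print(f'<form method=post action="{action}">')
# prints: <form method=post action="/htbin-post/banner%20hello%20%3E%20foobar">
```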

> >  ...form goes here
> > </form>
> 
> > My question relates to how the i/o redirection character (or any special
> > character) is to be treated when used within quotes inside of a standard
> > HTML directive. Should special characters be completely protected when
> > quoted inside of a directive?? Does it make sense to specify that escapes
> > such as &gt; be used within quoted strings? Where should this go in the
> > spec? (I looked for it, and can't find it - please point me there if I
> > missed it.)
> 
> Page 331 of Goldfarb's SGML Handbook says that parsers derive the attribute
> value from the attribute value literal (the stuff between the quote marks)
> by replacing any entity references or character references within the literal
> and then normalising by replacing any contiguous whitespace by a single
> space character. Note you can use " or ' as quote marks for attribute value
> literals.
> 
> Thanks for pointing out this topic - it is rather obscure and clearly needs
> to be included in the HTML+ spec. I can guarantee that most browsers are
> currently doing the wrong thing for attributes!
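
The two steps Dave describes (replace references, then collapse whitespace)
can be roughly sketched in Python as follows. This is an approximation only:
html.unescape uses the HTML5 entity table rather than a DTD's declared
entities, and the function name attribute_value is hypothetical.

```python
import html
import re

def attribute_value(literal: str) -> str:
    """Approximate the derivation of an attribute value from its literal."""
    # Step 1: replace entity and character references (&gt;, &#62;, ...).
    replaced = html.unescape(literal)
    # Step 2: replace each run of contiguous whitespace with one space.
    return re.sub(r"\s+", " ", replaced)

print(attribute_value("banner   hello &gt;\n foobar"))
# prints: banner hello > foobar
```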

Bleah.  Let's be more restrictive.  An encoding method exists for URLs
anyway; other attributes/values can be limited to reasonable characters.
This keeps things simple.
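
As a sketch of what the restrictive approach might look like in practice,
assuming a purely hypothetical set of "reasonable" characters (no such set is
defined anywhere in this thread):

```python
import re

# Hypothetical allowed set: letters, digits, space, and . _ / : % -
# This is an assumption for illustration, not a proposal from the thread.
REASONABLE = re.compile(r"[A-Za-z0-9 ._/:%-]*")

def is_reasonable(value: str) -> bool:
    """Return True if every character of value is in the allowed set."""
    return REASONABLE.fullmatch(value) is not None

print(is_reasonable("/htbin-post/banner%20hello"))  # prints: True
print(is_reasonable("banner hello > foobar"))       # prints: False
```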

Marc