HTML 2.0 LAST CALL: <TEXTAREA> content

Ka-Ping Yee (kryee@calum.csclub.uwaterloo.ca)
Sat, 3 Jun 95 23:03:44 EDT

The HTML 2.0 specification document currently contains only the following
statements about the content of the TEXTAREA element:

> The content of the TEXTAREA element is the field's initial value.
>
> Typically, the ROWS and COLS attributes determine the visible
> dimension of the field in characters. The field is typically rendered
> in a fixed-width font. HTML user agents should allow text to extend
> beyond these limits by scrolling as needed.

I believe this is not quite complete, since it doesn't indicate how the
content is to be treated (i.e. as preformatted PCDATA).

Correct me if I'm wrong, but I think <TEXTAREA> content leaves tabs literal
rather than expanding them to spaces the way the section on <PRE> describes:

> * The horizontal tab character (encoded in `US-ASCII' and
> `ISO-8859-1' as decimal 9) must be interpreted as the smallest
> positive nonzero number of spaces which will leave the number of
> characters so far on the line as a multiple of 8.
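
To make the quoted <PRE> rule concrete, here is a minimal Python sketch of
that tab expansion. The function name expand_tabs and the tab_width parameter
are made up for this illustration; the spec text itself only fixes the
multiple at 8.

```python
def expand_tabs(line, tab_width=8):
    """Expand tabs the way the quoted PRE rule describes (illustrative helper)."""
    out = []
    col = 0  # number of characters so far on the line
    for ch in line:
        if ch == "\t":
            # Smallest positive number of spaces that brings the column
            # count to a multiple of tab_width (always between 1 and tab_width).
            n = tab_width - (col % tab_width)
            out.append(" " * n)
            col += n
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

For a single line this should agree with Python's built-in
str.expandtabs(8). Under the reading above, <TEXTAREA> content would skip
this step and keep the tab character as data.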

If this is true, then I suggest replacing the first sentence describing
<TEXTAREA> with something like this:

The content of the TEXTAREA element is the field's initial value.
Within this content, spaces, line breaks, and tabs are to be taken
literally as part of the field data, while entity references and
tags within this content are parsed and interpreted.
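
A rough Python sketch of what the proposed wording would mean for a user
agent computing the initial value: whitespace passes through untouched and
only entity references are resolved. The function name textarea_initial_value
is hypothetical, and html.unescape is used as a stand-in; it recognizes a
larger set of named references than HTML 2.0 defines, and the sketch does not
model tags in the content at all.

```python
from html import unescape

def textarea_initial_value(content):
    """Illustrative only: derive the field's initial value from raw content."""
    # Spaces, tabs, and line breaks are kept literally as field data;
    # entity references such as &lt; and &amp; are replaced by their characters.
    return unescape(content)
```

For example, the content "a\t&lt;b&gt;\n  c &amp; d" would yield a value that
still contains the tab, the line break, and the two leading spaces, with the
three references turned into "<", ">", and "&".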

Perhaps it would also be prudent to add a note such as:

Although tags within the content of a TEXTAREA element are allowed,
use of markup here is discouraged as it has no defined meaning.

Or does it? I really don't understand how tags are supposed to behave
within <TEXTAREA> content, but they are indeed parsed, since the DTD says
the content is PCDATA (with TEXTAREA, INPUT, and I think SELECT excluded).

For consistency, I also suggest changing the opening paragraph of section 5,
"Characters, Words, and Paragraphs",

> An HTML user agent should present the body of an HTML document
> as a collection of typeset paragraphs and preformatted text. Except
> for the PRE element, each block structuring element is regarded
> as a paragraph by taking the data characters in its content and the
> content of its descendant elements, concatenating them, and splitting
> the result into words, separated by space, tab, or record end
> characters (and perhaps hyphen characters). The sequence of words is
> typeset as a paragraph by breaking it into lines.

by replacing "Except for the PRE element" with "Except for the PRE and
TEXTAREA elements".
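
The effect of that one-word change can be sketched in Python as follows. The
names PREFORMATTED and block_words are invented for this example, the
optional hyphen splitting is left out, and treating both CR and LF as
"record end" is an assumption of the sketch.

```python
import re

# Elements whose content is kept literally. PRE is in the current text;
# TEXTAREA is the proposed addition.
PREFORMATTED = {"PRE", "TEXTAREA"}

def block_words(element_name, text):
    """Illustrative only: split a block's character data into words."""
    if element_name.upper() in PREFORMATTED:
        # No word splitting: the text is taken as-is.
        return [text]
    # Words are separated by space, tab, or record end characters.
    return [w for w in re.split(r"[ \t\r\n]+", text) if w]
```

With this, block_words("P", "one  two\tthree\nfour") gives four words, while
block_words("TEXTAREA", "a  b\tc") returns the text unchanged as a single item.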

It might even be informative to add a note like:

Some historical implementations incorrectly treat <TEXTAREA> as an
element on its own rather than a block-structuring element and use
the value of a VALUE attribute as initial data for the entry field.

(I believe Mosaic used to do this, and on certain platforms it still
does?) Thanks for reading.

Ping (Ka-Ping Yee): 2B Computer Engineering, University of Waterloo, Canada