Re: Is this use of BASE kosher?

Paul Grosso (pbg@arbortext.com)
Tue, 1 Aug 95 09:59:50 EDT

> From: "Daniel W. Connolly" <connolly@beach.w3.org>
>
> So ' ' could be considered unsafe given the behaviour of Netscape, but
> as far as SGML is concerned, it's safe. Hmmm... actually, it's not,
> since in SGML attribute value literals, tabs, newlines, and multiple
> spaces are collapsed to single spaces to form the attribute value.

Assuming you're talking about attributes whose declared value is CDATA,
as appears to be the case with, say, %URI, your comment about multiple
spaces being collapsed is not quite correct as far as ISO 8879 goes.

From clause 7.9.3, "An attribute value other than character data (CDATA)
is tokenized by replacing a sequence of SPACE characters with a single
SPACE character and ignoring leading or trailing SPACE characters."

According to Goldfarb's Handbook (page 331), the only interpretation
done for CDATA attributes "is accomplished by replacing any entity
references or character references within the literal and then
normalizing the resulting attribute value by throwing out any
entity ends and record starts and replacing any record end or
separator characters with a space. This work is all done before
the parser considers what the declared value of the attribute is.
At this point, if the declared value of the attribute is character
data (CDATA), derivation of the attribute value is complete."

A test with nsgmls (the SP parser) shows that multiple spaces in CDATA
attribute values are preserved.
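
To make the two steps concrete, here is a rough sketch in Python of the
behaviour described above. It is only an illustration, not how any parser
actually implements it. The function names are made up, the character
assignments (RS = line feed, RE = carriage return, TAB as the separator
character) are assumed from the reference concrete syntax, and entity and
character reference replacement is left out. Entity ends are not modelled
either, since they are not real characters in a string.

```python
import re

# Assumed character assignments (reference concrete syntax).
RS = "\n"   # record start: ignored in an attribute value literal
RE = "\r"   # record end: replaced by a space
TAB = "\t"  # separator character: replaced by a space


def normalize_literal(literal: str) -> str:
    """Step done for every attribute value literal, whatever its declared value."""
    value = literal.replace(RS, "")
    return value.replace(RE, " ").replace(TAB, " ")


def tokenize(value: str) -> str:
    """Extra step of clause 7.9.3, applied only when the declared value is not CDATA."""
    return re.sub(r" +", " ", value).strip(" ")


def attribute_value(literal: str, declared_value: str = "CDATA") -> str:
    value = normalize_literal(literal)
    if declared_value == "CDATA":
        return value  # derivation is complete; runs of spaces survive
    return tokenize(value)
```

With this sketch, attribute_value("a  b") keeps both spaces, while
attribute_value("  a  b ", "NMTOKENS") gives "a b".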

paul

Paul Grosso
VP Research, ArborText, Inc.
and
Chief Technical Officer, SGML Open

Email: paul@arbortext.com