Re: Interpretation of RE

Keith M. Corbett (kmc@specialform.com)
Mon, 10 Apr 95 16:15:49 EDT

At 01:28 PM 4/10/95 EDT, Arthur van Hoff wrote:

>I'm trying to find out the correct interpretation of RE (newline) and
>white space in html documents. According to the html spec you are
>supposed to ignore the first and the last RE within the content of an
>element. ...

This must be based on clause 7.6.1 of the SGML standard, which states "If an
RS in content is not interpreted as markup, it is ignored."

>If that is true, what is the correct interpretation of RE
>inside a PRE content? For example:

When I parse your example with nsgmls, the initial and trailing newlines
within the B element are "swallowed".
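
As a rough illustration of that "swallowing", here is a minimal Python sketch. It is a deliberate simplification and not the full rule set of clause 7.6.1: it assumes each record end (RE) shows up as a single "\n" in the element's content string, and it ignores the conditions involving subelements, markup declarations and processing instructions. The name strip_boundary_re is hypothetical and not part of nsgmls or any other parser.

```python
RE = "\n"  # assumption: a record end appears as one line feed in the content


def strip_boundary_re(content: str) -> str:
    """Drop one RE at the very start and one at the very end of element content.

    Simplified sketch: real SGML record-boundary handling also depends on
    neighbouring data, subelements and markup, which this ignores.
    """
    if content.startswith(RE):
        content = content[len(RE):]
    if content.endswith(RE):
        content = content[:-len(RE)]
    return content


if __name__ == "__main__":
    # Content of a B element written with a newline after the start-tag
    # and another before the end-tag.
    print(repr(strip_boundary_re("\nbold text\n")))
```

Only one RE is dropped at each end, so any further blank lines inside the element would still reach the application as data under this sketch.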

>an HTML3 compliant parser. Could someone point me to more information
>on the interpretation of white space in html documents?

For a little light reading there's always The SGML Handbook. (:)

Exoterica has published an interesting paper on their interpretation of the
SGML standard with respect to record boundary handling. (For info send mail
to info@exoterica.com.)

-kmc