>I'm trying to find out the correct interpretation of RE (newline) and
>white space in html documents. According to the html spec you are
>supposed to ignore the first and the last RE within the content of an
>element. ...
This must be based on clause 7.6.1 of the SGML standard, which states "If an
RS in content is not interpreted as markup, it is ignored."
>If that is true, what is the correct interpretation of RE
>inside a PRE content? For example:
When I parse your example with nsgmls, the initial and trailing newlines
within the B element are "swallowed".
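To make the "first and last RE are ignored" rule concrete, here is a minimal sketch of a simplified model. It is not the full set of ISO 8879 record-boundary rules. It assumes that the element's content is plain character data with no markup or subelements, that '\n' stands for RE, and that record starts (RS) have already been discarded. The function name is hypothetical.

```python
def apply_record_boundary_rules(content: str) -> str:
    """Simplified model of SGML record-boundary handling for one element.

    Hypothetical helper. Assumes plain character data only (no markup or
    subelements), '\n' standing for RE, and RS already discarded.
    """
    # Drop the first RE if nothing (no data, no subelement) precedes it.
    if content.startswith("\n"):
        content = content[1:]
    # Drop the last RE if nothing (no data, no subelement) follows it.
    if content.endswith("\n"):
        content = content[:-1]
    return content


# Content of <B>, newline, "bold text", newline, </B>:
print(repr(apply_record_boundary_rules("\nbold text\n")))
# Content of a PRE element; REs between the lines are kept:
print(repr(apply_record_boundary_rules("\n  line 1\n  line 2\n")))
```

With `<B>` on one line, the text on the next, and `</B>` on the line after that, the model keeps only "bold text". This mirrors the nsgmls behaviour described above. In the PRE case, only the newlines adjacent to the tags disappear, and the interior line breaks survive.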
>an HTML3 compliant parser. Could someone point me to more information
>on the interpretation of white space in html documents?
For a little light reading there's always The SGML Handbook. (:)
Exoterica has published an interesting paper on their interpretation of the
SGML standard with respect to record boundary handling. (For info send mail
to info@exoterica.com.)
-kmc