> From: kmc@specialform.com (Keith M. Corbett)
>
> >I'm trying to find out the correct interpretation of RE (newline) and
> >white space in html documents. According to the html spec you are
> >supposed to ignore the first and the last RE within the content of an
> >element. ...
>
> This must be based on clause 7.6.1 of the SGML standard, which states "If an
> RS in content is not interpreted as markup, it is ignored."
That is where I got my information from. This rule is ignored by most
browsers inside a PRE element though. Should it be?
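
As a rough illustration of the rule under discussion, here is a minimal
Python sketch of "ignore the first and the last RE within the content of an
element". It is a simplification, not a conforming SGML parser: the real
record-boundary rules have more cases than this. The function name
strip_boundary_newlines is hypothetical, and treating RE as "\n" or "\r\n"
is an assumption.

```python
def strip_boundary_newlines(content: str) -> str:
    """Drop one newline right after the start tag and one right before
    the end tag. Newlines elsewhere in the content are left alone.

    Assumption: a record end (RE) appears as "\n" or "\r\n" in the text.
    """
    # Leading newline: the one immediately following the start tag.
    if content.startswith("\r\n"):
        content = content[2:]
    elif content.startswith("\n"):
        content = content[1:]

    # Trailing newline: the one immediately preceding the end tag.
    if content.endswith("\r\n"):
        content = content[:-2]
    elif content.endswith("\n"):
        content = content[:-1]

    return content


# Hypothetical element content, as it might appear between <PRE> and </PRE>.
pre_content = "\nline one\nline two\n"
print(repr(strip_boundary_newlines(pre_content)))
```

A browser that skips this step inside PRE would instead render an extra blank
line at the top and bottom of the preformatted block.
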
> >If that is true, what is the correct interpretation of RE
> >inside a PRE content? For example:
>
> When I parse your example with nsgmls, the initial and trailing newlines
> within the B element are "swallowed".
What is nsgmls?
> >an HTML3 compliant parser. Could someone point me to more information
> >on the interpretation of white space in html documents?
>
> For a little light reading there's always The SGML Handbook. (:)
Yawn... I've tried. Anyway, nobody seems to take the SGML spec seriously.
This makes HTML a very poor choice as a document interchange format :^(
> Exoterica has published an interesting paper on their interpretation of the
> SGML standard with respect to record boundary handling. (For info send mail
> to info@exoterica.com.)
I've sent them mail. Thanks.
Have fun,
Arthur van Hoff (avh@eng.sun.com)
http://java.sun.com/people/avh/
Sun Microsystems Inc, M/S UPAL02-301,
100 Hamilton Avenue, Palo Alto CA 94301, USA
Tel: +1 415 473 7242, Fax: +1 415 473 7104