Re: Interpretation of RE

Arthur van Hoff (Arthur.Vanhoff@Eng.Sun.COM)
Mon, 10 Apr 95 19:44:26 EDT

Hi Keith,

> From: (Keith M. Corbett)
> >I'm trying to find out the correct interpretation of RE (newline) and
> >white space in html documents. According to the html spec you are
> >supposed to ignore the first and the last RE within the content of an
> >element. ...
> This must be based on clause 7.6.1 of the SGML standard, which states "If an
> RS in content is not interpreted as markup, it is ignored."

That is where I got my information from. This rule is ignored by most
browsers inside a PRE element though. Should it be?
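
To make the question concrete, here is a minimal sketch of the rule as paraphrased above: drop the first and the last RE within an element's content. It assumes an RE is a single "\n" and ignores the finer conditions in the SGML standard (for instance, what precedes or follows the RE). The function name strip_boundary_re is hypothetical and not taken from any parser.

```python
def strip_boundary_re(content: str, record_end: str = "\n") -> str:
    """Drop one leading and one trailing record end from element content.

    Simplified illustration only: a real SGML parser applies further
    conditions before ignoring an RE.
    """
    # Ignore the first RE if the content starts with one.
    if content.startswith(record_end):
        content = content[len(record_end):]
    # Ignore the last RE if the remaining content ends with one.
    if content.endswith(record_end):
        content = content[:-len(record_end)]
    return content


if __name__ == "__main__":
    # repr() makes the surviving newlines visible.
    print(repr(strip_boundary_re("\nbold text\n")))
```

Only one RE is removed at each end, so blank lines deeper inside the content survive. A browser that keeps every newline inside PRE would skip this step for that element.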

> >If that is true, what is the correct interpretation of RE
> >inside a PRE content? For example:
> When I parse your example with nsgmls, the initial and trailing newlines
> within the B element are "swallowed".

What is nsgmls?

> >an HTML3 compliant parser. Could someone point me to more information
> >on the interpretation of white space in html documents?
> For a little light reading there's always The SGML Handbook. (:)

Yawn... I've tried. Anyway, nobody seems to take the SGML spec seriously.
This makes HTML a very poor choice as a document interchange format :^(

> Exoterica has published an interesting paper on their interpretation of the
> SGML standard with respect to record boundary handling. (For info send mail
> to

I've sent them mail. Thanks.

Have fun,

Arthur van Hoff (
Sun Microsystems Inc, M/S UPAL02-301,
100 Hamilton Avenue, Palo Alto CA 94301, USA
Tel: +1 415 473 7242, Fax: +1 415 473 7104