Re: The <PRE> tag

Dan Connolly <connolly@pixel.convex.com>

Mail folder: WWW Talk 1992 Archives
In-reply-to: Tim Berners-Lee: "Re: The <PRE> tag"

Message-id: <9211241603.AA07624@pixel.convex.com>
To: timbl@nxoc01.cern.ch
Cc: Rik Harris <rik@daneel.rdt.monash.edu.au>, www-talk@nxoc01.cern.ch
Subject: Re: The <PRE> tag 
Summary: Yeah, what Tim said.
In-reply-to: Your message of "Tue, 24 Nov 92 12:10:56 +0100."
             <9211241110.AA03386@www3.cern.ch> 
Date: Tue, 24 Nov 92 10:03:21 CST
From: Dan Connolly <connolly@pixel.convex.com>


>>  Date: Tue, 24 Nov 92 21:54:37 -1000
>>  From: Rik Harris <rik@daneel.rdt.monash.edu.au>
>
>>  I think the <PRE> tag is a great idea, too.  The problem with not
>>  having newlines significant is that it makes it difficult to do
>>  indenting, etc.  One of the reasons the <PRE> tag is nice is that you
>>  can take text (eg, manual entries) and not worry about formatting:
>
>I was suggesting that you should format the above like
>
>  OPTIONS<p>
><p>
>    -b   this option performs the blah command.  And if this line is<p>
>         reasonably long, I can demonstrate what I'm talking about.<p>
><p>  
>
>    -f   this option performs the foo command.  Another annoying prob-<p>
>         lem is hyphenation.<p>
>
>That is, you explicitly put in the line end, but all white space is
>significant on the line. It means that lines like
>
>	See also csh, cc, blah, fred and junk.
>
>which would have to be a SINGLE LINE
>See also  <a name=csh href=csh.html>csh</a>, <a name=cc href=cc.html>cc</a>, <a  
>name=blah2 href=http://sdf.adf.uasdf.edu/fred/doc/junk/blah.html>blah</a>, <a 
>name=fred href=fred.html>fred</a> and <a name=junk href=junk.html>junk</a>.
>
>could instead come out as, for example,
>
>See also  
>
> <a name=csh href=csh.html>csh</a>, 
>
> <a name=cc href=cc.html>cc</a>, 
>
> <a name=blah2 href=http://sdf.adf.uasdf.edu/fred/doc/junk/blah.html>blah</a>,
> <a name=fred href=fred.html>fred</a> and <a name=junk href=junk.html>junk</a>.
><p>
>
>which is mailable.  If you look at the NJIT manual pages HTML, there is a  
>mixture of significant line feeds and explicit <p> elements for blank lines:
>
>  OPTIONS
><p>
>    -b   this option performs the blah command.  And if this line is
>         reasonably long, I can demonstrate what I'm talking about.
><p>  
>
>    -f   this option performs the foo command.  Another annoying prob-
>         lem is hyphenation.
><p>
>
>I propose we settle for one or the other.  I wonder whether there is
>anything in SGML to suggest which.

In fact, there is. Well, not actually in SGML, but in the "application
conventions" that I have used to map SGML onto WWW.

All elements in HTML have either mixed content, RCDATA, or CDATA.
Mixed content is a mixture of <tags>, &entity; references,
and #PCDATA. RCDATA is just &entities; and data. CDATA is just data.

[SGML actually has a couple other content modes: ANY and
element content, but I didn't use those.]

CDATA is only used for the TITLE. RCDATA is used for XMP and LISTING.
(Entity references _are_ recognized in RCDATA sections, so you
can include the _full_ end tag like this: &lt;/XMP>. But the
string </ followed by a letter _ends_ the section, whether the
letter starts the XMP tag or not.)
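
As a rough sketch of that termination rule (Python; read_rcdata and
the three-entry entity table are hypothetical names chosen for
illustration, not code from any browser):

```python
import re

# Assumption: "</" followed by any ASCII letter closes the section.
_END = re.compile(r"</[A-Za-z]")
# Assumption: a minimal entity set, just enough for the example.
_ENTITIES = {"lt": "<", "gt": ">", "amp": "&"}

def read_rcdata(text, start=0):
    """Return (data, end): the decoded section data and the index of the
    terminating "</" (or len(text) if the section is never closed)."""
    m = _END.search(text, start)
    end = m.start() if m else len(text)
    raw = text[start:end]
    # Entity references are recognized in RCDATA; tags are not.
    data = re.sub(r"&(lt|gt|amp);", lambda e: _ENTITIES[e.group(1)], raw)
    return data, end

# The escaped end tag survives as data; the real "</" ends the section.
print(read_rcdata("a &lt;/XMP> b</XMP>"))
# Any "</" plus a letter ends it, even if the tag is not XMP.
print(read_rcdata("x</B>y</XMP>"))
```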

The convention is that in PCDATA sections, newlines serve only
to delimit words, whereas in RCDATA, newlines are significant.
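
One way to picture the convention (a Python sketch; the function name
units and the mode strings are made up for this example):

```python
def units(text, mode):
    """Split character data according to the newline convention.

    PCDATA: newlines are just word delimiters, so the result is words.
    RCDATA: newlines are significant, so the result is lines, with the
            spacing inside each line left untouched.
    """
    if mode == "PCDATA":
        return text.split()        # any run of whitespace separates words
    if mode == "RCDATA":
        return text.split("\n")    # each newline starts a new line
    raise ValueError("unknown mode: " + mode)

sample = "-b   this option\n     performs blah"
print(units(sample, "PCDATA"))
print(units(sample, "RCDATA"))
```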

We can't use RCDATA for the PRE or FIXED tag, cuz the <a> tag
won't be recognized in RCDATA. So I'd suggest you ignore
newlines inside the PRE element, and use <p> to delimit lines.
And since we're not using the exact semantics of PRE, I like
the idea of using the name FIXED instead.  In SGML:

<!ELEMENT FIXED - - (#PCDATA|A|P)*>
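
A minimal sketch of how a reader might lay out FIXED content under
that rule (Python; fixed_lines is a hypothetical helper, and dropping
newlines outright rather than turning them into spaces is an assumed
reading of "ignore"):

```python
import re

def fixed_lines(content):
    """Split the content of a FIXED element into display lines."""
    # Newlines in the source are ignored (assumption: dropped entirely).
    flat = content.replace("\r", "").replace("\n", "")
    # Each <p> ends a line; all other white space on the line is kept,
    # and <a ...> tags are passed through untouched.
    return re.split(r"<[Pp]>", flat)

text = "See also \n<a href=csh.html>csh</a>,\n <a href=cc.html>cc</a>.<p>"
for line in fixed_lines(text):
    print(repr(line))
```

The source can then be wrapped at any column for mailing without
changing how the lines come out.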

The fact that the MidasWWW browser can support the semantics
of PRE is due to its non-standard parsing, where it treats
illegal tags as data, rather than ignoring them. SGML says
they're not data, whatever they are, and the HTML doc in
the web says to ignore them.
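
The two behaviours can be contrasted with a small sketch (Python; the
KNOWN set, the tag pattern, and handle_unknown_tags are all
illustrative assumptions, not MidasWWW code):

```python
import re

# Assumption: a tiny set of "legal" element names for the example.
KNOWN = {"A", "P", "FIXED"}
_TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")

def handle_unknown_tags(text, known=KNOWN, as_data=False):
    """Ignore tags whose name is not in `known` (the documented
    behaviour), or keep them as literal text if as_data is true
    (the non-standard behaviour)."""
    def repl(m):
        if m.group(1).upper() in known:
            return m.group(0)              # legal tag: leave it alone
        if as_data:
            # Escape "<" so the tag reads as character data.
            return m.group(0).replace("<", "&lt;")
        return ""                          # illegal tag: ignore it
    return _TAG.sub(repl, text)

doc = "a<PRE>b</PRE><p>c"
print(handle_unknown_tags(doc))
print(handle_unknown_tags(doc, as_data=True))
```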

I'm integrating my low-level SGML reading routines into
MidasWWW now, and with the author's consent, the non-standard
behaviour will soon go away. [The MidasWWW 1.0 browser doesn't
do &lt; or &amp; either -- that too will change.]

I've got it running, but there are a couple integration bugs I haven't
yet tracked down.

I've also got something of a validation suite for HTML, so that
implementors can easily see if they've gotten it right. And the
suite goes from easy to hard, so they can see how much of it
they got right, and if they don't want to fix it, they can at least
document how much it's broken.

Dan