HTML spec. questions

"James (Eric) Tilton" <jtilton@jupiter.willamette.edu>
Date: Sat, 8 Jan 1994 16:12:25 -0800 (PST)
From: "James (Eric) Tilton" <jtilton@jupiter.willamette.edu>
Subject: HTML spec. questions
To: www-talk@www0.cern.ch
Message-id: <Pine.3.88.9401081523.A6844-0100000@jupiter>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Content-Length: 2573

Following my comments on comp.infosystems.www about device-independent 
HTML and the like, a number of people have suggested that what's needed 
is a tool that checks HTML code for inconsistencies and bad 
practices.  To that end, I'm starting work on a "lint for the web" sort 
of program.

With that in mind, I've been reading through the specification for HTML 
at CERN, and have some questions:

  * The comment is made in
    http://info.cern.ch/hypertext/WWW/MarkUp/Text.html that "neither spaces
    nor tabs should be used to make SGML source layout more attractive to
    read".  This is understandable in the case of tabs, since their
    behaviour is undefined.  But why shouldn't somebody use spaces to make
    their HTML source more readable?  I thought the specification called
    for spaces to be collapsed into a single space?  Or do we not get to
    make that assumption?  I'd like to be able to format my HTML like:

<ul>
<li> this is my unordered list.  I realize that this first entry is awfully
     long, and I'd like to have spaces to indent it in the source in order
     to make it readable to me, as an author.
  <ul>
  <li> and if I nest lists, I'd like to be able to indent, so I don't
       lose track of things.
  </ul>
</ul>

    Is this sort of thing acceptable?  Shouldn't it be?

  * It's not explicitly declared in the specification whether a <HR>
    implies a paragraph break.  I'm assuming it does, but I'd like
    confirmation :).

  * On that note, I'm also under the impression that if an element implies
    a paragraph break (such as the ADDRESS element), then a <P> should be
    placed neither immediately before nor immediately after it.  Is this
    correct?

  * Does PRE imply a paragraph break?
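
To make the first question concrete, here is a minimal sketch in Python
(the language is chosen only for illustration).  It ASSUMES that runs of
spaces, tabs and newlines outside PRE collapse into a single space,
which is exactly the point the question asks to have confirmed.  The
name collapse_whitespace is hypothetical.

```python
import re


def collapse_whitespace(text):
    """Collapse each run of spaces, tabs and newlines into one space.

    Assumption (not confirmed by the specification text quoted above):
    this is how a browser treats whitespace outside of PRE.
    """
    return re.sub(r"[ \t\r\n]+", " ", text).strip()


indented = "<ul>\n<li> one\n     two\n  <ul>\n  <li> three\n  </ul>\n</ul>"
flat = "<ul> <li> one two <ul> <li> three </ul> </ul>"

# Under the assumption, both layouts reduce to the same text.
print(collapse_whitespace(indented) == collapse_whitespace(flat))
```

If that assumption holds, indenting the source with spaces would change
nothing a reader sees.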

I'm not sure yet what the scope of this program will be.  At a minimum, 
it will point out incorrect uses of <P> and other practices that the 
specification recommends against: that is, markup that will be parsed 
successfully but isn't really device independent.  I'm not sure whether 
it should also check whether the HTML is just plain illegal -- is this 
necessary or even desired functionality?  (And do I want to go to the 
extra effort?  :) )
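
As a rough illustration of the minimum scope, here is a sketch of one
such check in Python: flagging a <P> placed directly before or after an
element taken to imply a paragraph break.  It is not a real parser, only
a pattern match.  The names find_redundant_p and IMPLIES_BREAK are
hypothetical, and the list holds only ADDRESS because whether HR and PRE
belong there is still an open question above.

```python
import re

# Assumption: elements that imply a paragraph break.  Only ADDRESS is
# named in the message; HR and PRE are unconfirmed, so they are left out.
IMPLIES_BREAK = ("ADDRESS",)


def find_redundant_p(html, elements=IMPLIES_BREAK):
    """Return sorted (line_number, message) pairs for each <P> that
    directly touches an element assumed to imply a paragraph break."""
    names = "|".join(re.escape(name) for name in elements)
    # <P>, optional whitespace, then the element's opening tag.
    before = re.compile(r"<P>\s*<(?:%s)\b[^>]*>" % names, re.IGNORECASE)
    # The element's closing tag, optional whitespace, then <P>.
    after = re.compile(r"</(?:%s)\s*>\s*(<P>)" % names, re.IGNORECASE)

    warnings = []
    for match in before.finditer(html):
        line = html.count("\n", 0, match.start()) + 1
        warnings.append((line, "<P> immediately before block element"))
    for match in after.finditer(html):
        # Report the line of the <P> itself, not of the closing tag.
        line = html.count("\n", 0, match.start(1)) + 1
        warnings.append((line, "<P> immediately after block element"))
    return sorted(warnings)


sample = "<P>\n<ADDRESS>someone@example.org</ADDRESS>\n<p>More text"
for line, message in find_redundant_p(sample):
    print("line %d: %s" % (line, message))
```

Checking for outright illegal HTML would need a real SGML parser rather
than patterns like these, which is the extra effort in question.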

Any comments appreciated!

						-et

/ (James) Eric Tilton, Student AND Student Liaison, WITS               \
\ Class of '95 - CS/Hist  -- Internet - jtilton@willamette.edu         /
<a href="http://www.willamette.edu/~jtilton/">ObHyPlan!</a>, chock fulla
<a href="http://www.willamette.edu/~jtilton/whatsnew.html">Fun Stuff!</a>