HTML Feature Test Entities, P as a container vs. separator

"Daniel W. Connolly" <connolly@hal.com>

Errors-To: listmaster@www0.cern.ch
Date: Mon, 11 Apr 1994 21:16:02 --100
Message-id: <9404111906.AA27361@ulua.hal.com>
Reply-To: connolly@hal.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: HTML Feature Test Entities, P as a container vs. separator
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 4995


On the Fate of P:

I gather that the general opinion is that HTML document
structure should look like:

<HTML>
 <HEAD>
  <TITLE>t</TITLE>
  ...
 </HEAD>
 <BODY>
  <H1>head</H1>
  <P>
   p with
   <em>emphasis</em>
   in it
  </P>
  <UL>
   <LI>item 1</LI>
   <LI>item 2</LI>
  </UL>
 </BODY>
</HTML>

Unfortunately, the common way that this is coded is:

<TITLE>t</TITLE>
<H1>head</H1>
p with <em>emphasis</em> in it
<ul>
<li>item 1
<li>item 2
</ul>

The unfortunate part is that there's no DTD (well, none that I can find)
that will enable a conforming SGML parser to infer that structure from
that document. However, if folks are willing to put <P> tags at the
_beginning_ of every paragraph, it can be done.

My current solution is:
	(1) Docs lacking <P> start tags are supported in a
	backwards-compatible mode of the DTD, ala:

<!DOCTYPE HTML [
	<!ENTITY % HTML.pSeparator "INCLUDE">
	<!ENTITY % html PUBLIC "-//connolly hal.com//DTD WWW HTML 1.8//EN">
	%html;
]>
<title>backwards compatible mode</title>
<H1>header</H1>
para 1
<p>
para 2

In this mode, the text of the paragraphs is content of the BODY element,
and the P elements are empty, ala:

<HTML>
 <HEAD>
  <TITLE>back..</TITLE>
  ...
 </HEAD>
 <BODY>
  <H1>head</H1>
  para1
  <P>
  para2
 </BODY>
</HTML>


	(2) In the standard usage of the DTD, paragraphs are containers
	and require explicit start tags, ala:

<!DOCTYPE HTML PUBLIC "-//connolly hal.com//DTD WWW HTML 1.8//EN">
<title>backwards compatible mode</title>
<H1>header</H1>
<p>para 1
<p>para 2

The parser infers:

<HTML>
 <HEAD>
  <TITLE>back..</TITLE>
  ...
 </HEAD>
 <BODY>
  <H1>head</H1>
  <P>para1</P>
  <P>para2</P>
 </BODY>
</HTML>
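
As a rough illustration of the inference in case (2), here is a toy
Python sketch. It is not an SGML parser: the BLOCK_TAGS set and the
infer_paragraphs function are hypothetical names made up for this
example, and a real parser would take the same information from the
content models and omitted-tag rules in the DTD. The point is only
that an explicit <P> start tag gives the parser enough to place the
missing </P>.

```python
import re

# Assumption: the tags that implicitly end an open paragraph.
# A real SGML parser derives this from the DTD, not from a fixed set.
BLOCK_TAGS = {"p", "h1", "ul"}

# A token is either a simple tag without attributes or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")


def infer_paragraphs(body):
    """Return body with the omitted </P> end tags made explicit."""
    out = []
    in_p = False
    for closing, name, text in TOKEN.findall(body):
        if not name:
            text = text.strip()
            if text:
                out.append(text)
            continue
        tag = name.lower()
        if tag in BLOCK_TAGS and in_p:
            # A block-level tag ends the paragraph that is still open.
            out.append("</P>")
            in_p = False
        if tag == "p":
            if not closing:
                out.append("<P>")
                in_p = True
            continue  # an explicit </P> was already handled above
        out.append("<" + closing + name.upper() + ">")
    if in_p:
        out.append("</P>")  # end of input closes the last paragraph
    return "".join(out)


print(infer_paragraphs("<H1>header</H1>\n<p>para 1\n<p>para 2\n"))
```

Feed it text with no <P> start tag at all and it has nothing to
anchor a paragraph on, which is the problem with the common coding
style described above.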

Here are the current feature test macros:

<![ %HTML.Minimal [
        <!ENTITY % HTML.linkRelationships "IGNORE">
        <!ENTITY % HTML.linkMethods "IGNORE">
        <!ENTITY % HTML.linkRedundantInfo "IGNORE">
        <!ENTITY % HTML.forms "IGNORE">
        <!-- @@ nested lists -->
        <!-- @@ phrases -->
        ]]>
        
<![ %HTML.Obsolete [
        <!ENTITY % HTML.PLAINTEXT "INCLUDE">
        <!ENTITY % HTML.titleCDATA "INCLUDE">
        <!ENTITY % HTML.litCDATA "INCLUDE">
        <!ENTITY % HTML.NEXTID "INCLUDE">
        <!ENTITY % HTML.font-phrase "INCLUDE">
        <!ENTITY % HTML.anchorNameCDATA "INCLUDE">
        <!ENTITY % HTML.pSeparator "INCLUDE">
        ]]>

<!ENTITY % HTML.pSeparator "IGNORE"
        -- use P element as paragraph separator, rather than a container.
        This means not all paragraphs need to start with a <P> tag.
        -->

<!ENTITY % HTML.linkRelationships "INCLUDE"
        -- Adding markup to links to show the relationship between
        ends of a link
        see http://info.cern.ch/hypertext/WWW/MarkUp/Relationships.html
        -->

<!ENTITY % HTML.linkMethods "INCLUDE"
        -- Adding markup to links to show the methods supported
        by the referent object
        see http://info.cern.ch/hypertext/WWW/MarkUp/Elements/A.html
        -->

<!ENTITY % HTML.linkRedundantInfo "INCLUDE"
        -- Adding markup to links to give redundant information
        like URN, content type, title...
        -->

<!ENTITY % HTML.anchorNameCDATA "IGNORE"
        -- Anchor names should be distinct. SGML parser can validate
        this if the NAME attribute of the A element is declared as ID.
        But that restricts the syntax of an anchor name to an SGML name,
        i.e. a letter followed by letters, numbers, periods and dashes,
        up to NAMELEN (34) characters long.
        -->

<!ENTITY % HTML.PLAINTEXT "IGNORE"
        -- Support for the <PLAINTEXT> tag as a sign of the
        end of the HTML data stream and the beginning of a stream
        of text/plain data
        -->
<!ENTITY % HTML.titleCDATA "IGNORE"
        -- Is the TITLE element #PCDATA, RCDATA, or CDATA content?
        On Mosaic, it's #PCDATA, but in the linemode browser,
        it's more like CDATA, but not quite.
        -->

<!ENTITY % HTML.NEXTID "IGNORE"
        -- Used by the NeXT implementation to keep track of the
        next anchor id to use
        -->

<!ENTITY % HTML.font-phrase "IGNORE"
        -- allow B, I, TT, U outside PRE,
        CITE, VAR, etc. inside PRE
        -->

<!ENTITY % HTML.litCDATA "IGNORE"
        -- treat XMP, LISTING as CDATA, as per linemodeWWW
        -->

<!ENTITY % HTML.forms "INCLUDE"
        -- Support for forms as per
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
        -->
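
The order of the declarations above matters: in SGML the first
declaration of an entity is the binding one, and the internal subset
of the DOCTYPE is read before the DTD it pulls in. The following toy
Python model is only a sketch of that rule. The resolve function and
the (guard, name, value) triples are hypothetical, and only two of
the macros are modelled. The excerpt does not show where
HTML.Minimal and HTML.Obsolete get their defaults, so the sketch
treats an undeclared guard as IGNORE, which is an assumption.

```python
def resolve(declarations):
    """Return the effective value of each parameter entity.

    declarations is an ordered list of (guard, name, value) triples.
    guard is None for a plain declaration, or the name of the entity
    that controls the marked section the declaration sits in.
    """
    table = {}
    for guard, name, value in declarations:
        # Assumption: an undeclared guard behaves like IGNORE.
        if guard is not None and table.get(guard, "IGNORE") != "INCLUDE":
            continue  # the marked section is skipped
        table.setdefault(name, value)  # first declaration is binding
    return table


# Two of the macros, in the same order as in the DTD excerpt.
DTD = [
    ("HTML.Minimal", "HTML.forms", "IGNORE"),
    ("HTML.Obsolete", "HTML.pSeparator", "INCLUDE"),
    (None, "HTML.pSeparator", "IGNORE"),
    (None, "HTML.forms", "INCLUDE"),
]

default = resolve(DTD)
# Declarations from the document's internal subset come first.
obsolete = resolve([(None, "HTML.Obsolete", "INCLUDE")] + DTD)
separator = resolve([(None, "HTML.pSeparator", "INCLUDE")] + DTD)

print(default["HTML.pSeparator"], obsolete["HTML.pSeparator"])
```

This is why the marked sections have to come before the default
declarations: a later declaration of the same entity has no effect.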

If you're interested, see
	http://www.hal.com/%7Econnolly/drafts/html-design.html
for background etc., and
	http://www.hal.com/%7Econnolly/html-test/html.dtd
	http://www.hal.com/%7Econnolly/html-test/html.decl
	http://www.hal.com/%7Econnolly/html-test/ISOlat1.sgml
for the DTD itself.

Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   (512) 834-9962 x5010
<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html