Re: HTML+ Comments

Steve Heaney <Steve.Heaney@delft.sgp.slb.com>
Date: Tue, 20 Jul 1993 18:48:27 +0200
From: Steve Heaney <Steve.Heaney@delft.sgp.slb.com>
Message-id: <199307201648.AA25294@mordred.delft.sgp.slb.com>
To: www-talk@nxoc01.cern.ch
Subject: Re: HTML+ Comments

All,

Firstly, many thanks to Dave Raggett (and others) for taking on what must be
an unenviable task.  Inevitably HTML+ is expected to be everything to all
people, and resolving conflicting requirements to everyone's satisfaction is
never easy.

I have a few comments to add to those that have gone before about the HTML+
DTD, some of them from my own experience of writing DTDs.  I also had a browse
of the OSF DTD while writing this.  The HTML+ version I looked at was dated
24 June 1993.

(The examples may be a bit wobbly - I haven't actually parsed them.)

1  Given the simplicity of the current HTML DTD, is it necessary to "ensure 
   that most existing documents conform to HTML+"?  The task of mapping from 
   HTML to HTML+ should be pretty much the same whether or not HTML is a 
   subset of HTML+.

2  The paragraph tag as a container.  Great.  The trouble with having it as
   a separator is that it becomes a formatting tag (stick an empty line in
   here) rather than semantic markup (the text between the start and end
   tags constitutes a paragraph).  It initially caused me some confusion,
   given that different formatters treated it differently.

   The same comment (as noted by Klaus Harbo) applies to other EMPTY elements
   which logically should have content.
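
   A rough sketch of the two alternatives, as unparsed as the other
   examples here.  The %text; entity is an assumed name standing in for
   whatever inline content the DTD allows, and only one of the two
   declarations could appear in a real DTD:

      <!-- separator: no content, so it can only mark a break -->
      <!ELEMENT p - O EMPTY >

      <!-- container: the paragraph is whatever sits between the tags -->
      <!ELEMENT p - O (%text;)* >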

3  Semantic markup using attributes - hmmmm!

   I appreciate the logic behind this decision, but I can't help thinking
   it's a bit of a kluge.  It _does_ mean that anybody can invent their own
   "tag" and have a conformant document, but it does nothing to ensure that
   it will be rendered by a given viewer.  There will still be the need to
   agree on a common set of elements (now transformed to attributes) and we
   are pretty much back to where we were before.

   Please, let's have the current list of <emph> types as elements in their
   own right.

   In the same vein, could I suggest that <p> and <quote> are elements 
   "without style" and a separate element <note> carries a style attribute
   taking one of margin, caution, error (and maybe reviewer).
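
   For the <note> element that might come out something like this (a
   sketch only, not parsed; %text; is an assumed name for the inline
   content model):

      <!ELEMENT note - - (%text;)* >
      <!ATTLIST note
                style (margin | caution | error | reviewer) #IMPLIED >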

4  Given 1 above, how about introducing sections as containers rather than 
   H[1-6] being paragraph type elements - i.e.

      <!ELEMENT section - - ( title?, (%main;)* ) >

   Nested sections then imply the level without the need for explicit tags.
   Would this be more difficult for clients to parse?
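
   In a document this would look roughly as follows.  It assumes that
   section is itself allowed inside a section, whether through %main; or
   by naming it in the content model.  The outer section is at level 1
   and the inner one at level 2, so a client would only have to count
   how deeply it is nested:

      <section>
        <title>...</title>
        <p>...</p>
        <section>
          <title>...</title>
          <p>...</p>
        </section>
      </section>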

5  Formatting hints.  Given the choice between a client supporting a wider
   range of semantic markup and being able to tweak the format of individual
   elements, I know where I would put my money :-).

   If they are to be included however, it would make some sense to bung
   them all into parameter entities to be included in the attribute list of
   any element that may require special formatting.

      <!ENTITY % inline-format
      "font   CDATA           #IMPLIED
       size   CDATA           #IMPLIED
       weight (bold | italic) #IMPLIED" >

      <!ENTITY % para-format
      "justify (left | centre | right)  #IMPLIED" >

   Superscript and subscript should be elements in their own right.
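
   The two entities above could then be pulled into attribute lists by
   reference, roughly like this (a sketch; the choice of elements is
   only for illustration):

      <!ATTLIST emph  %inline-format; >
      <!ATTLIST p     %para-format; >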

6  I cannot recall the reasons why line break and hard space were requested.
   Given that they are needed (I have a hard time with line break):

   - there is a character entity in the "ISO 8879-1986//ENTITIES Numeric and 
     Special Graphic//.." representing "no break (required) space".  (This 
     entity set is the one defining lt, gt and amp).  This would be more 
     appropriate than the <sp> element.
   - line break should be added as a processing instruction <?line-brk>, 
     which is exactly what it is.
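
   In a document instance the two would look something like this
   (assuming the entity set is declared in the DTD; nbsp is the name it
   gives the no-break space):

      Schlumberger&nbsp;Geco-Prakla<?line-brk>
      Postbus&nbsp;148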

7  QUOTE is permitted in the DD element without being declared anywhere.
   Why doesn't sgmls complain?

8  Tables.  Dave asks if complex data should be allowed in table fields.  
   In principle I see no reason why not, but there are other things I'd 
   rather see implemented before supporting this.

9  The mailto URL.  (I know that it is not part of the DTD).  Maybe it 
   could be included as an element <mail> rather than a URL.  I think it 
   would make more sense - it sits awkwardly as a URL.

     <!ELEMENT mail - - CDATA >
     <!ATTLIST mail
               address   CDATA   #REQUIRED >

   where address contains the fully-qualified Internet address.  Other 
   header fields could be added as attributes or elements.
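
   In a document that might read as follows (a sketch; the content is
   taken here to be the text shown to the reader, which is an
   assumption):

     <mail address="heaney@delft.sgp.slb.com">Steve Heaney</mail>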

10 Embedded data.  Would it be possible to use the SGML NOTATION construct?
   That way, any SGML-conforming renderer would be able to process it,
   given the capability.  E.g.

   <!NOTATION PS       SYSTEM>
   <!NOTATION PDF      SYSTEM>
   <!ELEMENT  EMBED    - - CDATA>
   <!ATTLIST  EMBED
              id       ID        #IMPLIED 
              notation NOTATION (PS|PDF) #IMPLIED >

   (I don't know if the mime types would be valid as notation types - I 
   can't find any info on whether NOTATION takes a defined set of values).
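
   An instance would then name the notation on the element, something
   like this (a sketch; the id value and the placeholder content are
   made up for illustration):

   <embed id="fig1" notation="PS">
   ...PostScript data...
   </embed>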

11 Tables, figures, examples etc. should have a display container.  Or am I     
   missing the point and this is the purpose of <panel> or <fig>?

   I was thinking of something like:

   <!ELEMENT DISPLAY - - ( title?, (fig | eqn | example | tbl)*, caption? ) >
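
   In use that would give something like this (a sketch; the contents
   are left as placeholders):

   <display>
     <title>...</title>
     <tbl>...</tbl>
     <caption>...</caption>
   </display>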

12 Comments, marked sections etc.
   Given the (relative) complexity of this DTD it is likely that many people 
   (myself included) will resort to using an SGML editor if given half a 
   chance.  It is important therefore to support or tolerate as much of the 
   standard as is reasonably achievable.  This should include processing 
   instructions, comments, marked sections etc.

In case you skim-read: most of the comments above revolve around ensuring
that HTML+ conforms to the spirit of SGML:

- markup should wherever possible describe content not format,
- attributes qualify an element, not define its type or content,
- wherever possible physical form should be derived from the markup,
- use what SGML provides,

and some are just niggly points which go to show that I'm a pedantic bugger.

Right, now I'm off to hide for a few days :-)

Steve.

------------------------------------------------------------------------
Steven Heaney

Schlumberger Geco-Prakla
Postbus 148
2600 AC Delft
The Netherlands

Internet: heaney@delft.sgp.slb.com
------------------------------------------------------------------------