Re: HTML+ Comments

Dave_Raggett <dsr@hplb.hpl.hp.com>

Mail folder: WWW Talk Jul-Oct 1993

From: Dave_Raggett <dsr@hplb.hpl.hp.com>
Message-id: <9307211140.AA16750@manuel.hpl.hp.com>
Subject: Re: HTML+ Comments
To: Steve.Heaney@delft.sgp.slb.com
Date: Wed, 21 Jul 93 12:40:06 BST
Cc: www-talk@nxoc01.cern.ch
Mailer: Elm [revision: 66.36.1.1]
Status: RO

Many thanks, Steve, for your lengthy comments. Have you seen the draft RFC
for HTML+? This is available at:

        http://info.cern.ch/hypertext/WWW/MarkUp/htmlplus.ps

> 1  Given the simplicity of the current HTML DTD, is it necessary to "ensure 
>    that most existing documents conform to HTML+"?  The task of mapping from 
>   HTML to HTML+ should be pretty much the same whether or not HTML is a 
>    subset of HTML+.

I don't think it's necessary. HTML+ browsers could easily include support for
the HTML tags that aren't present in HTML+ itself. This would make browsers
robust in the absence of the DOCTYPE element, even for documents which mix
the two formats.

> 2  The paragraph tag as a container.  Great.

I have specified it as a container, even though it doesn't (and mustn't) have
an end tag. The scope of <P> ends at the first element that is not part of
the %text; entity definition, e.g. headers, lists, tables, figures, ...
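To illustrate how a parser can close an end-tag-less <P> as described above, here is a minimal Python sketch. It assumes the markup has already been tokenised into (kind, name) pairs, and TEXT_LEVEL is a hypothetical stand-in for the %text; entity, not the list in the HTML+ draft:

```python
# Hypothetical stand-in for the %text; parameter entity (NOT the draft's list).
TEXT_LEVEL = {"em", "b", "i", "a", "img", "br"}

def close_paragraphs(tokens):
    """Insert an implied ("end", "p") wherever an open <P> must end.

    tokens: list of ("start", name), ("end", name) or ("text", string).
    """
    out = []
    in_p = False
    for kind, name in tokens:
        if in_p and kind != "text" and name != "p" and name not in TEXT_LEVEL:
            out.append(("end", "p"))  # a non-%text; element ends the paragraph
            in_p = False
        elif in_p and kind == "start" and name == "p":
            out.append(("end", "p"))  # a new <P> also ends the previous one
            in_p = False
        out.append((kind, name))
        if kind != "text" and name == "p":
            in_p = kind == "start"    # an explicit </P>, if present, just closes it
    if in_p:
        out.append(("end", "p"))      # end of input closes any open paragraph
    return out
```

Text-level elements such as EM pass through without closing the paragraph, while a header, list or table start closes it first.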

>   The same comment (as noted by Klaus Harbo) applies to other EMPTY elements 
>   which logically should have content.

Virtually all elements in HTML+ have content.

> 3  Semantic markup using attributes - hmmmm!

>    I appreciate the logic behind this decision, but I can't help thinking 
>    it's a bit of a kluge.  It _does_ mean that anybody can invent their own 
>    "tag" and have a conformant document, but does nothing to ensure that it 
>    will be rendered by a given viewer.  There will still be the need to 
>    agree on a common set of elements (now transformed to attributes) and we 
>    are pretty much back to where we were before.

You are right in saying that "anybody can invent their own 'tag' and have a
conformant document", but wrong in assuming that viewers can't render it
effectively. There is always a valid default for rendering emphasis and
paragraphs etc. Authors can also include rendering hints which are
deliberately independent of the role for this very purpose.

>   Please, let's have the current list of <emph> types as elements in their 
>   own right.

Doing this would lead us back to the pressure for continual extension as
people find the current set too restrictive for their needs. The role
mechanism is designed to avoid this trap.
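A minimal Python sketch of the fallback behaviour described above: a viewer looks up known roles, but an unknown or newly invented role still gets a sensible default. The role names, style names and the rule that an author's hint takes precedence are all hypothetical assumptions for illustration, not taken from the HTML+ draft:

```python
# Hypothetical registry of commonly accepted roles -> rendering style.
REGISTERED_ROLES = {"defn": "bold", "var": "italic", "code": "monospace"}
DEFAULT_EMPHASIS = "italic"  # assumed default rendering for emphasis

def style_for(role=None, hint=None):
    """Choose a rendering style for an emphasis element.

    ASSUMPTION: an explicit rendering hint overrides the role lookup.
    """
    if hint is not None:
        return hint                   # author's hint, independent of the role
    if role in REGISTERED_ROLES:
        return REGISTERED_ROLES[role]
    return DEFAULT_EMPHASIS           # unknown or novel roles still render
```

For example, style_for("defn") gives "bold", while style_for("my-new-role") falls back to "italic", so a document using a role the viewer has never seen still renders.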

> 4  Given 1 above, how about introducing sections as containers rather than 
>    H[1-6] being paragraph type elements - i.e.

>      <!element section -- ( title?, (%main;)* ) >

The GROUP tag was designed to cover this need.

>   Nested sections then imply the level without the need for explicit tags.
>   Would this be more difficult for clients to parse?

Deriving the level automatically is considered too expensive, particularly
when a document spans a number of nodes (separately retrievable chunks,
i.e. files).
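To make the trade-off concrete, here is a minimal Python sketch of deriving heading levels from nested <section> containers as Steve proposes. It assumes a pre-tokenised stream where ("title", text) marks a section title; start_depth is a hypothetical parameter for the nesting depth of a chunk within the whole document:

```python
def heading_levels(tokens, start_depth=0):
    """Return (title, level) pairs implied by nesting of <section> containers.

    Within one file this is a simple depth count, but a chunk fetched on
    its own only gets correct levels if its start_depth is known, which
    requires information from the enclosing chunks.
    """
    depth = start_depth
    levels = []
    for kind, value in tokens:
        if kind == "start" and value == "section":
            depth += 1
        elif kind == "end" and value == "section":
            depth -= 1
        elif kind == "title":
            levels.append((value, depth))
    return levels
```

The counting itself is cheap; the cost lies in obtaining start_depth when a node is retrieved in isolation.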

> 5  Formatting hints.  Given the option between a client supporting a wider 
>    range of semantic markup and being able to tweak the format of
>    individual elements I know where I would put my money :-).

But what about semantic markup needs yet to be identified? The HTML+ role
attribute is intended to allow the format to be kept simple while allowing
novel roles as yet unforeseen. Commonly accepted roles should be registered
to encourage standard usage.

>   If they are to be included however, it would make some sense to bung 
>   them all into one entity to be included as an attribute for any element 
>   that may require special formatting.

This was considered but didn't seem worthwhile.

>   Superscript and subscript should be elements in their own right.

Why make a special case for these?

> 6  I cannot recall the reasons why line break and hard space were requested. 

Line breaks often have semantic significance, e.g. in poetry or in quoting
parts of old manuscripts where references are made to particular line numbers.

Hard spaces are needed since there is no way for browsers to deduce which
spaces mustn't be broken.

>   - there is a character entity in the "ISO 8879-1986//ENTITIES Numeric and 
>     Special Graphic//.." representing "no break (required) space".  (This 
>     entity set is the one defining lt, gt and amp).  This would be more 
>     appropriate than the <sp> element.

That sounds good to me. As you can see in the RFC, I have used character
entities for em and en dashes. Do you know the appropriate entity names for
all of these?
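As a sketch of how a browser might honour a hard space when filling lines, the Python below expands the ISOnum "no break (required) space" entity &nbsp; to U+00A0 (the no-break space character in Latin-1 and Unicode) and then breaks only at ordinary spaces. The greedy filling strategy and the fill() name are illustrative assumptions:

```python
NBSP = "\u00a0"  # character that &nbsp; stands for in Latin-1/Unicode

def fill(text, width):
    """Greedy line filling that never breaks at a no-break space.

    str.split(" ") splits on U+0020 only, so a U+00A0 stays inside
    its "word" and the two sides are kept on the same line.
    """
    words = text.replace("&nbsp;", NBSP).split(" ")
    lines, current = [], ""
    for word in words:
        if not word:
            continue  # collapse runs of ordinary spaces
        if current and len(current) + 1 + len(word) > width:
            lines.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        lines.append(current)
    return lines
```

With fill("see Fig.&nbsp;3 for details", 20), "Fig." and "3" always end up on the same line.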

>   - line break should be added as a processing instruction <?line-brk>, 
>     which is exactly what it is.

See previous explanation.

> 7  QUOTE is permitted in the DD element without it being declared anywhere. 
>    Why doesn't sgmls complain?

My mistake! Thanks for spotting this. (I wish sgmls had complained.)

> 8  Tables.  Dave asks if complex data should be allowed in table fields.  
>    In principle I see no reason why not, but there are other things I'd 
>    rather see implemented before supporting this.

I have recently extended tables to also allow <P>, headers and lists.

> 9  The mailto URL.  (I know that it is not part of the DTD).  Maybe it 
>    could be included as an element <mail> rather than a URL.  I think it 
>    would make more sense - it sits awkwardly as a URL.

The mailto URL is gaining favour and I see no good reason not to support it.
The FORM tag does indeed allow you to include RFC 822 headers, see the MH tag.

> 10 Embedded data.  Would it be possible to use the SGML NOTATION construct. 
>    In this way, any SGML conforming renderer would be able to process it 
>    given the capability.

>   (I don't know if the mime types would be valid as notation types - I 
>   can't find any info on whether NOTATION takes a defined set of values).

Mime content types can have attributes themselves. How would this fit with
the NOTATION mechanism?
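To make the question concrete: a MIME content type is a type/subtype pair plus optional parameters such as charset. The minimal Python sketch below separates the two parts using the standard email package; any scheme mapping content types onto NOTATION would need to represent both. The split_content_type() name is a hypothetical helper:

```python
from email.message import Message

def split_content_type(value):
    """Split a MIME Content-Type value into (type/subtype, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() lists the type itself first, then each parameter.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params
```

For example, "text/plain; charset=us-ascii" splits into "text/plain" and {"charset": "us-ascii"}; the type/subtype alone could serve as a notation name, but the parameter would still need somewhere to go.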

> 11 Tables, figures, examples etc. should have a display container.  Or am I 
>    missing the point and this is the purpose of <panel> or <fig>?

FIG includes support for captions etc. PANEL behaves in much the same way as
you described.

> 12 Comments, marked sections etc.
>    Given the (relative) complexity of this DTD it is likely that many people
>    (myself included) will resort to using an SGML editor if given half a 
>    chance.  It is important therefore to support or tolerate as much of the 
>    standard as is reasonably achievable.  This should include processing 
>    instructions, comments, marked sections etc.

I expect most people will use public domain WYSIWYG editors designed for HTML+
itself. A halfway house uses menus or a toolbar to insert the markup, which
otherwise remains visible; the Emacs editor for HTML works this way.

Browsers should already support <!-- comments -->.

> - markup should wherever possible describe content not format,

Agreed.

> - attributes qualify an element, not define its type or content,

Generally true, but look at the ISO DTDs. They show the results of
sticking to this policy: there are huge numbers of elements, whereas
HTML+ is designed as a lightweight, flexible format.

> - wherever possible physical form should be derived from the markup,

Yes, but don't forget user preferences.

Best wishes,

Dave Raggett