Your views on changes to HTML+

Dave_Raggett <dsr@hplb.hpl.hp.com>
Message-id: <9309101608.AA29522@manuel.hpl.hp.com>
To: www-talk@nxoc01.cern.ch
Date: Fri, 10 Sep 93 17:08:40 BST
Mailer: Elm [revision: 66.36.1.1]

I would really value your input on possible changes to HTML+ following the
www workshop and subsequent discussions on www-talk. I hope to produce a
second draft within a couple of weeks for comments. When we are happy with
the revised spec, I will then put it forward to the IETF for consideration as
an internet draft.

HTML+ is positioned as a simple format especially suited to the needs of the
World Wide Web, with a wider range of features than the earlier HTML format,
e.g. tables, forms, figures, equations, and improved support for dividing
large works into a number of smaller nodes.


Supporting large works, e.g. books
----------------------------------

The existing draft is too concise and needs worked-out examples of how
books and other large works can be broken up into a number of separate
nodes. The features for implicit and explicit navigation links need
further explanation. They offer a way of making available vast quantities
of works whose copyright has now expired, and which would be too large
to serve as individual nodes. The general idea is to allow toolbars to show
navigation buttons that depend on the node in question. This would work well
even for scanned documents, with each page as a separate node.

A node could be designated as the top level, with the order of links in that
node implying the prev/next sequence for reading its contents. Alternatively,
a more general hierarchy can be defined, analogous to a printed book, with the
table of contents (TOC) as just another node. In this case, the prev/next
sequence is defined independently of the TOC. The GROUP tag is used to define
hierarchical structures independently of node boundaries. (I would also like
to rename GROUP to SECT to make its role more obvious.)
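As a rough illustration of the first option, here is a minimal Python sketch
that derives the prev/next sequence from the order of links in a designated
top-level node. The helper names (LinkCollector, reading_sequence) and the
assumption that each node is linked only once (repeats are ignored) are
illustrative, not part of the draft.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect the HREF of each <A> element in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            # Assumption: each node is linked once; later repeats are ignored.
            if name == "href" and value and value not in self.hrefs:
                self.hrefs.append(value)


def reading_sequence(top_node: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each linked node to its (prev, next) neighbours (hypothetical helper)."""
    collector = LinkCollector()
    collector.feed(top_node)
    collector.close()
    order = collector.hrefs
    sequence = {}
    for i, node in enumerate(order):
        prev_node = order[i - 1] if i > 0 else None
        next_node = order[i + 1] if i + 1 < len(order) else None
        sequence[node] = (prev_node, next_node)
    return sequence


top = """<UL>
<LI><A HREF="ch1.html">Chapter 1</A>
<LI><A HREF="ch2.html">Chapter 2</A>
<LI><A HREF="ch3.html">Chapter 3</A>
</UL>"""
print(reading_sequence(top)["ch2.html"])  # ('ch1.html', 'ch3.html')
```

A browser could then enable or grey out its Prev/Next toolbar buttons
according to whether either neighbour is None.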


Tables
------

Minor changes only: <tbl> is to be replaced by <table>, and <tt> by <tblcap>.
<tb> is to be dropped. Fears about the performance cost of the required
pre-pass seem to be unwarranted, since the information it gathers can be
cached and reused for resizing and scrolling.
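To make the caching argument concrete, here is a minimal Python sketch: a
pre-pass measures the natural width of each column once, and later layouts for
a new window width (e.g. after a resize) reuse the cached result instead of
rescanning the table. Measuring widths in characters and shrinking columns
proportionally are simplifying assumptions for illustration; the function
names are hypothetical.

```python
from functools import lru_cache
from typing import Tuple

# A table as rows of cell strings; tuples so the pre-pass result can be cached.
Table = Tuple[Tuple[str, ...], ...]


@lru_cache(maxsize=None)
def natural_widths(table: Table) -> Tuple[int, ...]:
    """Pre-pass: widest cell in each column, in characters (rectangular rows assumed)."""
    return tuple(max(len(row[c]) for row in table) for c in range(len(table[0])))


def column_widths(table: Table, window_width: int) -> Tuple[int, ...]:
    """Lay out the table for a given window width using the cached pre-pass."""
    natural = natural_widths(table)  # computed once, then served from the cache
    total = sum(natural)
    if total <= window_width:
        return natural
    # Too wide: shrink each column in proportion to its natural width.
    return tuple(max(1, w * window_width // total) for w in natural)


table = (("Name", "Qty"), ("apple", "12"), ("kiwi", "3"))
print(column_widths(table, 80))  # (5, 3)
print(column_widths(table, 4))   # (2, 1)
```

Only the first call scans the cells; the second layout is computed from the
cached widths alone.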


Figures
-------

Due to lack of interest and to ensure simplicity, the <FIGT> element for
overlays is to be dropped. It could be reinstated in future versions of HTML+.
Some people at w^5 suggested that <IMG> be used to define the image content
for figures. I am against this, as the IMG align attribute (top, middle,
bottom) is quite different from that of FIG (left, center, right). Keeping
alignment on FIG also avoids having to add alignment attributes to the EMBED tag.


Lists
-----

As currently drafted, HTML+ doesn't allow DL lists within UL and OL lists.
Some people have complained that this is unnecessarily restrictive.

O'Reilly & Associates have suggested we give authors greater control over
lists, perhaps inspired by the DocBook DTD. Thus for UL lists we could
support mark=bullet, dash, box, check or a URL/URN for an icon. A small set
of standard symbols would be useful. For ordered lists, we may want to
allow authors to select the style: Arabic (1, 2, 3), UpperAlpha (A, B, C),
LowerRoman (i, ii, iii), etc. Alternatively, authors should be allowed
to specify the sequence identifier as an attribute, e.g. <LI ITEM="c)">.
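As an illustration of the numbering styles mentioned above, here is a minimal
Python sketch of how a browser might generate ordered-list labels. The
function names, the lower-case style keywords, the rule that an explicit ITEM
value takes precedence, and continuing UpperAlpha past Z as AA, AB, ... are
all assumptions for illustration.

```python
from typing import List, Optional

ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
         (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
         (5, "v"), (4, "iv"), (1, "i")]


def item_label(n: int, style: str = "arabic") -> str:
    """Label for the n-th item (n >= 1) in the given numbering style."""
    if style == "arabic":
        return str(n)
    if style == "upperalpha":
        # Bijective base 26: A..Z, then AA, AB, ...
        label = ""
        while n > 0:
            n, rem = divmod(n - 1, 26)
            label = chr(ord("A") + rem) + label
        return label
    if style == "lowerroman":
        # Greedy subtraction using the standard Roman numeral pairs.
        label = ""
        for value, numeral in ROMAN:
            while n >= value:
                label += numeral
                n -= value
        return label
    raise ValueError(f"unknown list style: {style}")


def list_labels(items: List[Optional[str]], style: str = "arabic") -> List[str]:
    """Use an explicit ITEM value when given, else the generated label."""
    return [explicit if explicit is not None else item_label(i, style)
            for i, explicit in enumerate(items, start=1)]


print(list_labels([None, None, None], "lowerroman"))  # ['i', 'ii', 'iii']
print(list_labels([None, None, "c)"], "upperalpha"))  # ['A', 'B', 'c)']
```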

For unordered lists, a new attribute wrap=horiz or wrap=vert would allow
multicolumn lists to be wrapped appropriately. In the second case, with N
items in the list, the current window size determines the number of columns C,
and each column then holds N/C items, rounded up (as in the standard Unix ls
command). This mechanism requires a simple pre-pass to establish the column
width and total number of items. This information can be cached as for tables.
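A minimal Python sketch of the wrap=vert and wrap=horiz layouts, assuming item
widths measured in characters and a fixed two-character gutter between columns
(both assumptions for illustration; layout_columns is a hypothetical name).
The pre-pass finds the widest item, which fixes the column width; the window
width then gives C, and each column holds ceil(N/C) items.

```python
import math
from typing import List


def layout_columns(items: List[str], window_width: int,
                   vertical: bool = True, gutter: int = 2) -> List[List[str]]:
    """Arrange list items into the rows of a multicolumn grid."""
    if not items:
        return []
    # Pre-pass: the widest item fixes the column width (cacheable, as for tables).
    col_width = max(len(item) for item in items) + gutter
    columns = max(1, window_width // col_width)
    rows = math.ceil(len(items) / columns)
    grid: List[List[str]] = [[] for _ in range(rows)]
    for i, item in enumerate(items):
        if vertical:
            grid[i % rows].append(item)      # wrap=vert: fill down each column, like ls
        else:
            grid[i // columns].append(item)  # wrap=horiz: fill across each row
    return grid


items = ["a", "b", "c", "d", "e"]
print(layout_columns(items, 9))                  # [['a', 'c', 'e'], ['b', 'd']]
print(layout_columns(items, 9, vertical=False))  # [['a', 'b', 'c'], ['d', 'e']]
```

Resizing the window only changes C and the row count; the cached column width
from the pre-pass can be reused unchanged.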


Normal text
-----------

The line break tag <BR> is to be dropped in favour of an <L>...</L> element
which, as in the TEI DTD, can take an ID. This gives greater flexibility
for identifying lines, e.g. in old texts.

Perhaps the DTD should require authors to use a <P> element rather than
allowing naked %text; content.


Emphasis
--------

At the www workshop, some people complained that the EM element was
overloaded and should be split up. Perhaps we should move to something closer
to HTML, with a small set of presentation-specific tags plus a wider set of
logical tags, e.g.

    Presentation specific tags:

        <I>, <B>, <U>, <S> for italic, bold, underline and strike-thru

        These could be nested and optionally take a role attribute.

    Bibliographic tags:

        <author>, <cite>, <isbn>, ...

    Computer Documentation tags:

        <cmd>, <opt>, <kbd>, <var>, <dfn>, ...

    Miscellaneous tags:

        <sub>, <sup>, <footnote>, <margin>

The logical tags could be an open-ended set with their rendering controlled
via associated style sheets. This approach means that browsers can simply
ignore such tags and render their contents as normal text, unless otherwise
directed. In general, unknown tags should be ignored and their contents parsed
as if the tag were not present. The same applies to unknown attributes.
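The rule that unknown tags and attributes are ignored while their contents are
still parsed can be sketched with Python's standard html.parser module. The
KNOWN table is a hypothetical subset of what one browser might understand;
everything outside it is dropped from the output while the text inside it is
kept.

```python
import html
from html.parser import HTMLParser
from typing import Dict, List, Set

# Hypothetical set of tags (and their attributes) this browser understands.
KNOWN: Dict[str, Set[str]] = {"p": set(), "b": set(), "i": set(), "a": {"href", "name"}}


class IgnoreUnknown(HTMLParser):
    """Re-emit known markup; drop unknown tags/attributes but keep their content."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN:
            return  # unknown tag: skip it, its content is still parsed
        kept = "".join(f' {name}="{html.escape(value)}"'
                       for name, value in attrs
                       if name in KNOWN[tag] and value is not None)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in KNOWN:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))


def filter_markup(markup: str) -> str:
    parser = IgnoreUnknown()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(filter_markup('Type <cmd>ls -l</cmd> to list <b role="x">all</b> files'))
# Type ls -l to list <b>all</b> files
```

A style sheet could later give some of these dropped logical tags a rendering
without any change to the parser.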


Changes to documents
--------------------

In addition to the change bar elements, there are two new elements for legal
documents: <added> and <removed>, rendered as italic and strike-thru text
respectively (along with optional colour cues).

Many thanks,

Dave Raggett