Comments on HTML+ discussion document (long)

Bert Bos <>
From: Bert Bos <>
Subject: Comments on HTML+ discussion document (long)
To: (* WWW discussion list )
Date: Mon, 1 Nov 1993 15:54:05 +0100 (MET)
X-Mailer: ELM [version 2.4 PL13]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 12202     
I've taken the weekend to delve into the new HTML+ specification. Here
are my comments.

The HTML+ draft is a good example of a balance between a vision of the
future and a realistic, implementable plan. Comparing it to HyTime --
to which it has become more similar with this generation -- makes it
clear that HyTime wants to cover everything that may become possible
in the future, while HTML+ goes no further than the technology of next
year (which is already impressive enough!).

When designing this year's HTML, one should nevertheless keep an open
eye for the things that may be added in the years after. Keeping the
door open for (future) HyTime compatibility seems a healthy approach
to me. A few of the comments below refer to HyTime in this manner.

3. Headers

ad "nestable sections"

    Keeping explicit, context-independent header-levels will make the
    browser simpler. But we can express the structure of a document
    with sections as well, by assuming an (omittable) element
    enclosing every header and the subsequent text:

	<!ELEMENT SECTION1 O O (H1, %bodytext ?, SECTION2*)>
	<!ELEMENT SECTION2 O O (H2, %bodytext ?, SECTION3*)>

    This will make it illegal to skip a level, which is essential if
    some browser or printer driver wants to number the headings. It
    also allows a link to be made to an entire SECTION, instead of to
    the header only.
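
    To illustrate why skipped levels matter for numbering, here is a
    minimal sketch in C of how a browser or printer driver might number
    headings. The names Numbering, numbering_init and numbering_next,
    and the limit of six levels, are my own assumptions for the
    example; nothing like them appears in the draft.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LEVEL 6   /* H1..H6; an assumption for this sketch */

/* Running counters, one per header level (hypothetical helper type). */
typedef struct {
    int count[MAX_LEVEL + 1];   /* count[1] is for H1, etc.; count[0] unused */
    int current;                /* level of the last header seen, 0 at start */
} Numbering;

void numbering_init(Numbering *n)
{
    int i;

    for (i = 0; i <= MAX_LEVEL; i++)
        n->count[i] = 0;
    n->current = 0;
}

/*
 * Register a header of the given level and write its number (e.g. "2.1")
 * into buf. Returns 0 on success. Returns -1 if the header skips a level
 * (an H3 directly below an H1, say), in which case nothing changes, or if
 * buf is too small (the counters have already advanced in that case).
 */
int numbering_next(Numbering *n, int level, char *buf, size_t size)
{
    int i;
    size_t used = 0;

    assert(level >= 1 && level <= MAX_LEVEL);
    assert(size > 0);

    if (level > n->current + 1)
        return -1;                      /* skipped a level */

    n->count[level]++;
    for (i = level + 1; i <= MAX_LEVEL; i++)
        n->count[i] = 0;                /* deeper levels start over */
    n->current = level;

    buf[0] = '\0';
    for (i = 1; i <= level; i++) {
        int w = snprintf(buf + used, size - used,
                         i == 1 ? "%d" : ".%d", n->count[i]);
        if (w < 0 || (size_t) w >= size - used)
            return -1;                  /* buffer too small */
        used += (size_t) w;
    }
    return 0;
}
```

    With the SECTION model a validating parser rejects the skipped
    level before the numbering code ever sees it, so the -1 case for a
    skipped level would never occur in a valid document.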

4. Paragraphs

ad "HTML+ formally doesn't require you to wrap text up as paragraphs"

    This may be true conceptually, but to the software this is
    less easy. It would be better to require all text to be wrapped up
    as something. In other words, if untagged text follows a header, a
    P tag is assumed:

	<!ELEMENT P O O (L | %text)+>
	<!ENTITY % bodytext "(%block | %lists | %paras)">
	<!ELEMENT SECTION1 O O (H1, %bodytext ?, SECTION2*)>

    When PCDATA is encountered after an H1, H2, etc., a P tag is
    automatically inserted. NB. to make this acceptable to a
    validating SGML parser, some trickery with SHORTREFs is needed, in
    order to skip unwanted blank space, but that can be done (I tried)
    and it doesn't affect the browser.

5.2. Hypertext links

ad "HREF"

    Why not take one further step and make HTML+ HyTime compliant? It
    involves adding one more element to the DTD and a number of
    attributes that will not show up, since they all get default
    values:

	<!ATTLIST A
	   -- Anchor attributes --
	   id ID #IMPLIED
	   ... etc.
	   -- Extra for HyTime --
	   ref ID #REQUIRED			-- link to NOTLOC element --
	   -- Required for HyTime --
	   HyTime NAME #FIXED "clink"
	   HyNames CDATA #FIXED "target linkends">
	<!ELEMENT notloc - - CDATA>
	<!ATTLIST notloc
	    id IDREF #REQUIRED			-- link to A element --
	    notation NOTATION #FIXED "WWW">

    (To make this complete, there should also be a declaration
    <!NOTATION WWW... somewhere in the DTD.) It doesn't matter where
    in the document the NOTLOC element is inserted: it could be inside
    the A element, or at the top or end of the document, as long as
    there is a NOTLOC for every A. With any authoring tool (e.g., the
    html-mode for Emacs), generating the NOTLOC and the ID that binds
    NOTLOC and A together should be automatic.

    In fact HTML+ already works with this indirection partially, in
    the LINK idref attribute.

ad "TYPE", "SIZE", and "METHODS"

    It is already noted in the text, but it should also be stressed in
    the user interface of any browser that uses these attributes:
    don't trust these attributes!

5.6. Logical emphasis

ad "Q"

    The browser should insert quotation marks, such as `to be' or "to
    be", or whatever style of quoting is preferred.

ad "CITE"

    The browser should display this as (Festinger...) or [Festinger],
    or whatever style is preferred. CITE is meant for use in running
    text, not in a bibliography.


    A browser might use small caps instead of the full-size caps.

5.7. Extending the set of logical roles

ad "isn't meant to apply retroactively"

    [Great idea, this RENDER element!] The best place for RENDER
    elements is therefore at the top of the document. It is an empty
    element; there is no </RENDER>.

    The comma-separated list of styles is probably better changed to a
    blank-separated list, as is customary in SGML, I believe.

	    tag CDATA #REQUIRED		-- Why was this #IMPLIED? --
	    style CDATA #IMPLIED

5.9. Images

ad "text flowing around the image"

    While this may look nice for an image at the start of a
    paragraph, it isn't so nice for images anywhere else. It is also
    difficult to implement. Better not require this. Instead, require
    that an image *never* overlaps with text.

ad "IMAGE"

    The footnote that recommends the IMAGE element over IMG should be
    promoted to a normal sentence. (And why not drop the ALT
    attribute of IMG altogether?)


    This is a nice feature, that can make displays much more
    attractive, but it will always be dependent on the format of the
    image. For XPM, no such attribute is needed; for 256-color images
    it can be an RGB or HSV value in X format; for true-color images
    it has to be a color range or approximate color.

ad "multipart/mixed"

    How can the browser recognize which part of the multipart message
    corresponds to a given URL? (But maybe this paragraph should be
    moved to the HTTP definition anyway.)

5.11. Conditional text

    The normal SGML method would be to use `marked sections':

	... text before the marked section...
	<![ %online [ ... text that only appears when on-line... ]]>
	... more general text...
	<![ %printer [ ... text that only appears on the printer... ]]>

    %online and %printer are entities, that have the values:

	<!ENTITY % online "INCLUDE">
	<!ENTITY % printer "IGNORE">

    for the browser, and the other way round for the printer.

6.1. Longer quotations

ad footnote 1 "quote by name"

    This is certainly useful. It allows one to automatically show the
    latest version of something, without having to change the document
    itself (cf. Windows DDE). It works for IMAGEs, so why not for
    text? But it should not be a function of the QUOTE element. Better
    to define TXT and TEXT (analogous to IMG and IMAGE).

    Maybe we want to quote not a complete document, but only a certain
    element, identified with an ID attribute. This might yield a P or
    a TABLE, or L, etc.

6.4. Notes and admonishments

ad "ROLE attribute"

    In the absence of a SRC attribute (and I strongly recommend that
    writers omit it for all but the exceptional types of notes),
    the ROLE attribute should determine the rendering of the note and
    the note icon. The value of the ROLE attribute should therefore
    not be printed (it is not a word, but a type). The following list
    of predefined ROLEs should be recognized by browsers:

	note	- (no icon)
	warning	- exclamation mark, or triangular traffic sign
	error	- stop traffic sign
	info	- circled "i"
	tip	- index finger pointing up

7.3. Plain lists

    Plain lists are sufficiently different from bulleted lists to
    warrant an element of their own. I would suggest dropping the
    PLAIN attribute and using only DIR instead.

9. Tables

    The TB element has been omitted from the description. Also, it is
    used but not defined in the DTD.

11. Literal text

ad "TAB"

    Instead of the width of the capital M, use the "em". When the font
    has no em defined, the width of the M or something similar could
    be used instead.


    Official versions of HTML+ should be mentioned in the SGML
    declaration, but the attributes of the HTMLPLUS tag could be used
    to notify the browser of extra requirements or hints, that do not
    affect the DTD. FORMS=off is such a requirement: a browser must
    comply. An example of a hint could be LANG=NL, telling the browser
    to apply Dutch formatting conventions as much as possible. (It
    becomes the default for all other LANG attributes.)


ad "the search field always visible"

    This is mentioned in 2.2, but it might be stressed here again.
    This is what makes ISINDEX different from INPUT. A good example of
    the use of ISINDEX is as a sort of command line. Maybe ISINDEX
    should therefore be called something different, like ISINPUT or

14.7. Links

ad "UseIndex"

    The UseIndex attribute implies that there is an index and gives
    its URL, but does it also mean that the current document is
    "searchable"? Maybe the browser should show a different prompt
    from the one used for ISINDEX.

15. Large documents

ad "implicit links"

    The table of contents concept is an instance of a more abstract
    concept, that of independent links. What is described here is
    essentially a method for adding hyperlinks to documents that don't
    have them. So why not make it more general? Example:

	<ILINK from="http://mach.ine/doc1#id1"
	       to="http://mach.ine/doc2#id2" -- hyperlink between elements -->
	<ILINK from="http://mach.ine/doc1"
	       role="next" -- hyperlink between documents -->

    (Or better yet, use the indirection of HyTime.)

ad "WWW-link"

    This is similar in concept to the REL=subdocument idea, but it
    works completely differently. It should be in a numbered section
    of its own.

Appendix I

    HTMLPLUS could be defined as just


    So many elements have the three attributes ID, LANG and INDEX,
    that it might be clearer to put them in an entity.

ad "OL"

    Why isn't a list defined as:

	<!ELEMENT (OL|UL) - - (LI*)>
	<!ELEMENT LI - O (%block|%lists|%paras)*>

    i.e., a list consists of nothing but list items, but a list item may
    contain more than just text?

ad "A"

    The INDEX attribute is missing? Why is SIZES specified as NAMES,
    when only numbers are allowed?

ad "character entities"

    The list of character entities should be referred to by name:

	<!ENTITY % Latin1 PUBLIC ...>

    This allows an SGML application to substitute a different file,
    e.g., one that maps entities to LaTeX macros.

Appendix III

    There should have been some documentation on how this code is

ad "for (i = 0; pgon[i][X]..."

    Typical of a C programmer! First use an index and only then check
    if it is valid to do so. No Pascal programmer would do it like
    this. Better to write:

	pgon[MAXVERTS-1][X] = -1;    /* Ensure termination */
	for (numverts = 0; pgon[numverts][X] != -1; numverts++) ;

    Not only is it safer, it is also slightly faster.
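
    For completeness, the two lines above can be wrapped into a small
    self-contained function. This is only a sketch: the function name
    count_vertices, the value of MAXVERTS and the declaration of pgon
    as an array of double pairs are my assumptions, not taken from the
    draft.

```c
#include <assert.h>

#define MAXVERTS 100   /* assumed capacity; the draft's value may differ */
#define X 0
#define Y 1

/*
 * Count the vertices in pgon, whose list ends at the first entry with
 * X == -1. The last slot is overwritten with the sentinel first, so the
 * loop cannot run past the array even if the caller forgot the terminator.
 */
int count_vertices(double pgon[MAXVERTS][2])
{
    int numverts;

    assert(pgon != 0);
    pgon[MAXVERTS - 1][X] = -1;     /* ensure termination */
    for (numverts = 0; pgon[numverts][X] != -1; numverts++)
        ;                           /* empty body: the loop test does the work */
    return numverts;
}
```

    The price is that the last slot is reserved for the sentinel, so
    at most MAXVERTS-1 real vertices fit.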

ad "p = (double*) pgon + 1"

    Please don't use this style in example code! Replace this by p = 0
    and replace every use of p by pgon[p][Y], etc.
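
    As an illustration of the indexed style, here is a small sketch
    that walks the vertex list using pgon[p][Y] only. The function
    y_extent and its task (finding the lowest and highest Y) are
    hypothetical; it is not the routine from the draft, just the same
    kind of loop written without pointer casts.

```c
#include <assert.h>

#define X 0
#define Y 1

/*
 * Find the lowest and highest Y among the first numverts vertices,
 * using plain array indexing instead of a (double *) walking the array.
 */
void y_extent(double pgon[][2], int numverts, double *ymin, double *ymax)
{
    int p;

    assert(numverts > 0);
    *ymin = *ymax = pgon[0][Y];
    for (p = 1; p < numverts; p++) {
        if (pgon[p][Y] < *ymin)
            *ymin = pgon[p][Y];
        if (pgon[p][Y] > *ymax)
            *ymax = pgon[p][Y];
    }
}
```

    Every access names both the vertex and the coordinate, so a reader
    does not have to know how the array is laid out in memory.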

Miscellaneous comments

At the moment, there is only one annotation server for the whole of
WWW. Clearly, this is not a long-term solution. The load should be
distributed. I can see two solutions:

1) an algorithm in the browser computes the annotation server to
contact given a URL (a hashing function or the `nearest domain').

2) every document specifies its own annotation server, in a LINK
element:

    <LINK role=Annotations href="">
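
The hashing variant of solution 1 could look roughly like the sketch
below. The function names hash_url and pick_server are hypothetical,
and the particular hash (a simple multiply-and-add over the bytes of
the URL) is an arbitrary choice of mine; any function would do, as long
as every browser uses the same one and the same list of servers.

```c
#include <assert.h>
#include <stddef.h>

/* Hash a URL string; multiply-and-add, chosen arbitrarily for this sketch. */
unsigned long hash_url(const char *url)
{
    unsigned long h = 5381;

    assert(url != NULL);
    while (*url != '\0')
        h = h * 33 + (unsigned char) *url++;
    return h;
}

/*
 * Pick one of nservers annotation servers for the given URL.
 * Returns an index in the range 0 .. nservers-1.
 */
size_t pick_server(const char *url, size_t nservers)
{
    assert(nservers > 0);
    return (size_t) (hash_url(url) % nservers);
}
```

The returned index would select an entry from a list of annotation
servers known to the browser. How that list is distributed is a
separate problem that this sketch does not address.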

                    / _   Bert Bos <>  |
           ()       |/ \  Alfa-informatica,           |
            \       |\_/  Rijksuniversiteit Groningen |
             \_____/|     Postbus 716                 |
                    |     9700 AS GRONINGEN           |
                    |     Nederland                   |