More comments on the HTML 3.0 draft

Bert Bos (bert@let.rug.nl)
Mon, 24 Apr 95 11:22:11 EDT

The train to Darmstadt offered me an opportunity to study the HTML 3.0
draft at leisure. I'll have to start with a big compliment to Dave
Raggett, both for the amount of work he has put into it and for the
result.

Nevertheless, here are some comments, followed by some comments on the
latest TABLE proposal that Dave sent to the list this weekend.

p.9 "Character sets", last par:

Unicode *encodings* (UTF-8, etc.) have no more relation to
HTML than, say, uuencode or zip. This par. can be left out.

p.11 "Attributes", last par but one:

Regarding the misuse of ">" to end a quoted attr. value: the
HTML standard is not the place to warn against broken
browsers.

Paragraphs like this occur several times throughout the draft.

p.16 "The HTML element":

The ROLE attr. should be removed. A doc. doesn't have a role
on its own, it can only have a role in relation to another
doc. Therefore only the source anchor of a link can specify
the role of the target doc.

p.17 "ISINDEX":

"The document can be queried..." What does this mean?

"If added by hand", this phrase should be removed. The
sentence is true without it.

p.18 "LINK":

I still don't see why we need both REL and REV. They have
exactly the same meaning.

p.22 "NEXTID":

"I want to get rid if NEXTID". Why don't you?

p.28 "NOWRAP":

In non-wrapping text, you may not only want a forced line
break (<BR>), but also an allowed line break. While we're
waiting for Unicode, I propose we define an entity &sbsp;
(cf. Netscape's proposed <WBR> tag).

This remark also applies to p.30, 2nd par, and several other
places in the draft.

p.50 "The IMG (Image) Element":

Why is IMG not intended for embedding HTML? Are there any
other restrictions?

p.51 "WIDTH" & "HEIGHT":

The size is only "suggested", whereas in <FIG> (p.73) it is
the size into which the image will be forced. Why the
inconsistency?

p.78 "Tables", 5th par, 8th par, figure:

Conflict between the cell counting rules:

The 4th rule says: "If the column count for the table is
greater than the number of cells for a given row (after
including cells for spanned rows),..." The part between
parentheses conflicts with the 7th rule, that says that cells
can overlap. If you apply rule 4 to the example, the last row
will count four cells instead of three (and therefore has only
one empty cell).

The example exhibits another inconsistency:

Cell 6 is pushed to the right by cell 1, while cell 7 isn't
pushed to the right by cell 6. This suggests that the
possibility of invalid tables (overlapping cells) can be
completely removed by reformulating rule 7 in terms of this
"pushing to the right" effect.
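
To make the "pushing to the right" idea concrete, here is a minimal
Python sketch. It is only my reading of the rule, not the draft's
algorithm: the function name place_cells and the (name, rowspan,
colspan) tuples are made up for illustration, and the sample table
below is hypothetical rather than the figure from p.78. Each cell is
moved right until the slots it needs are free, so no two cells can ever
overlap.

```python
def place_cells(rows):
    """Assign a grid position to every cell.

    rows is a list of rows; each row is a list of
    (name, rowspan, colspan) tuples (a hypothetical representation).
    Returns a dict mapping name -> (row, column).
    """
    occupied = set()    # (row, col) slots already claimed by some cell
    positions = {}
    for r, row in enumerate(rows):
        col = 0
        for name, rowspan, colspan in row:
            # Push the cell to the right until its whole footprint is free.
            while any((r + dr, col + dc) in occupied
                      for dr in range(rowspan)
                      for dc in range(colspan)):
                col += 1
            positions[name] = (r, col)
            # Claim every slot the cell covers, including spanned rows.
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, col + dc))
            col += colspan
    return positions


# Hypothetical table: cell "1" spans two rows, so "4" is pushed right.
example = [
    [("1", 2, 1), ("2", 1, 1), ("3", 1, 1)],
    [("4", 1, 1), ("5", 1, 1)],
]
positions = place_cells(example)
print(positions["4"], positions["5"])
```

With a rule of this kind every sequence of cells yields a valid table,
which is the point of reformulating rule 7.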

More comments about tables are at the end of this message.

p.111 "Horizontal rules":

Is the CLASS attr. a *space* separated list or a *period*
separated list? The text switches from one to the other.

p.113 "Preformatted text":

"The <P> tag should be avoided", but the DTD says that is not
even allowed. Which is true?

Same for lists, FIG and TABLES.

p.115 "WIDTH":

"Can't we get rid of this..." Yes, I think we can.

p.118 "Footnotes":

Footnotes have one indirection too many: first you have to
click on a word to jump to the place where the footnote is
stored, then you have to click again to open the footnote.

I like the TEI style of footnotes much better: notes are
placed in the text at the position where they belong. The word
or phrase to which a note belongs is implicit, but it may be
made explicit with a link from the note back to the phrase
(instead of the other way round.)

A browser renders the note as a button at the place of the
footnote, the user only has to click once.

p.144 "Carriage return":

CR, LF and CRLF are treated the same in almost all contexts,
but not in <PRE>. Do we really need the capability to
overwrite a line multiple times, like old line printers used
to do? I think not, and it would be nice if CR, LF and CRLF
meant the same in all contexts.
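
Treating the three the same is cheap for a parser. A minimal Python
sketch, assuming the input is already available as a string (the name
normalize_newlines is made up for illustration):

```python
import re


def normalize_newlines(text):
    """Map CRLF, lone CR and lone LF to a single LF."""
    # CRLF is listed first so that the pair counts as one line break,
    # not as a CR followed by an LF.
    return re.sub(r"\r\n|\r|\n", "\n", text)


sample = "one\r\ntwo\rthree\nfour"
print(normalize_newlines(sample).split("\n"))
```

After this step <PRE> sees exactly the same line breaks as every other
element.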

p.150 "Numerical Character References":

Shouldn't &#95; be a "low line" instead of a "horizontal bar"?
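
The name of code point 95 is easy to check. The Python sketch below
looks it up in the Unicode character database that ships with Python;
that is a cross-check against Unicode, not against the table in the
draft itself.

```python
import html
import unicodedata

# Decode the numeric character reference for code point 95.
char = html.unescape("&#95;")
name = unicodedata.name(char)
print(repr(char), name)          # the underscore, named LOW LINE

# For comparison, the character Unicode calls HORIZONTAL BAR.
bar = "\u2015"
print(ord(bar), unicodedata.name(bar))
```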

p.167 in DTD, "Headers...":

There is a subtle difference between <H1 SRC="..."> and
<H1><IMG SRC="...">, in terms of what a style sheet can do
with them. Should a note to explain this be added?

The SRC attr has parentheses that should be removed. Same for
SRC attr. in UL (p.169) and LI (p.170).

p.169 in DTD, "style sheets control numbering style":

The CONTINUE attr of <OL> is difficult to implement, since
previous lists have already disappeared from the parser stack
when this attr. is encountered. But I guess that's not
sufficient argument to drop the attr.
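
What the parser has to do is remember something outside its stack. A
minimal Python sketch of that bookkeeping, assuming CONTINUE means
"carry on from the list that was closed most recently"; the event
tuples and the name number_items are made up for illustration, and how
nested lists should interact with CONTINUE is left open here:

```python
def number_items(events):
    """Number the items of ordered lists.

    events is a sequence of ("ol", continue_flag), ("li", text)
    and ("/ol",) tuples (a hypothetical event stream).
    Returns a list of (number, text) pairs.
    """
    stack = []          # one counter per currently open list
    last_closed = 0     # final counter of the most recently closed list
    numbered = []
    for event in events:
        kind = event[0]
        if kind == "ol":
            # With CONTINUE, start from the remembered counter.
            stack.append(last_closed if event[1] else 0)
        elif kind == "li":
            stack[-1] += 1
            numbered.append((stack[-1], event[1]))
        elif kind == "/ol":
            # The list leaves the stack, so save its counter first.
            last_closed = stack.pop()
    return numbered


events = [
    ("ol", False), ("li", "a"), ("li", "b"), ("/ol",),
    ("ol", True), ("li", "c"), ("/ol",),
]
print(number_items(events))
```

The extra variable is the awkward part: it is state that survives the
element it belongs to.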

p.170 in DTD, "BODY":

BODY lacks the %url.link attr.

p.171 in DTD, "BODYTEXT":

Why does this element exist at all? It can't even be omitted!

Same for FIGTEXT (p.177).

p.173 in DTD, "SELECT":

The inclusion exception is superfluous.

Same for TEXTAREA.

p.179 in DTD, "The inclusion of math...":

The awkward double use of B, SUP and SUB (awkward, because
they have completely different content models in a math
environment than in normal text) can be avoided by introducing
new elements: MATHB, MATHSUP, and MATHSUB.

p.183 "NOTATION w3c-style":

The public identifier is not well-formed. According to the
SGML decl. all public identifiers should be "formal".

Dave Raggett proposed a new syntax for the TABLE element, which more
closely resembles CALS tables. I've never seen a real CALS table, but
judging from the fragments that have passed this list I think this
resemblance is rather an argument *against* the new proposal.

Some people seem bent on introducing into HTML all the errors of the
CALS table model. What's next? Do we get NAMEST back as well? I've
stated my views on this before, but here are some additional notes on
the new proposal.

BORDER attr:

Why should the border width be given at all? The style sheet
can do that. Instead

BORDER NAMES "" -- any comb. of left, right, top, bottom --

seems much more useful. The BORDER applies to every TR, TH and
TD as well (it's "inherited"), but TR, TH and TD can also have
BORDER attrs. of their own.
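
A minimal Python sketch of the inheritance I have in mind. The
function names are made up for illustration, and None stands for an
element on which BORDER was not given at all (an assumption about how
the attribute would be declared on TR, TH and TD):

```python
SIDES = {"left", "right", "top", "bottom"}


def parse_border(value):
    """Turn a BORDER value such as "left top" into a set of sides."""
    names = set(value.lower().split())
    unknown = names - SIDES
    if unknown:
        raise ValueError("unknown border side(s): %s" % sorted(unknown))
    return names


def effective_border(table=None, row=None, cell=None):
    """The innermost element that specifies BORDER wins."""
    for value in (cell, row, table):
        if value is not None:
            return parse_border(value)
    return set()    # nothing specified anywhere: no borders


print(sorted(effective_border(table="left right")))
print(sorted(effective_border(table="left right", cell="top")))
print(sorted(effective_border(table="left right", row="")))
```

Note that an explicit empty value on a row or cell switches the
inherited borders off, which a border width could not express as
simply.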

WIDTH attr:

A default unit of pixels seems the wrong choice. See Jon
Bosak's message "Widths in tables"
<9504231936.AA05292@aristotle.sjf.novell.com.SJF.Novell.COM>
earlier on this list.

CELLSPACING, CELLPADDING attr:

This is a matter for the style sheet. Besides, why do we need
them both?

TBODY:

This element is redundant.

TSPEC:

The TSPEC element contains information about the way the
elements in another branch of the SGML tree should be
formatted. To summarize my earlier objections: there is no
relation between the TSPEC and the elements to which it
refers (the parser will already have thrown away the TSPECs
when it comes to the table cells), and TSPECs are unnecessarily
verbose.

The TSPECs introduce another nasty "feature", similar to the
NAMEST attr. of the former COLSPEC element: the possibility to
explicitly attach a TSPEC to one or more columns and rows
introduces countless opportunities for mistakes and
ambiguities. It makes it harder to parse the table and it does
nothing whatsoever to make it easier for the writer.

(For example: does TSPEC apply to TFOOT and THEAD as well?
What to do with <TSPEC WIDTH="4em"><TSPEC COL=1 WIDTH="5em">:
will the width of column 1 be 4 or 5 em?)
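
The ambiguity is easy to demonstrate. In the Python sketch below the
two TSPECs are modelled as dictionaries, and a TSPEC without COL is
assumed to apply to every column; both the representation and the two
resolution rules are made up for illustration, since the proposal does
not say which rule applies.

```python
def matching_widths(tspecs, col):
    """Widths of all TSPECs that apply to the given column."""
    return [t["width"] for t in tspecs if t.get("col") in (None, col)]


def first_wins(tspecs, col):
    """Resolution rule A: the first applicable TSPEC decides."""
    widths = matching_widths(tspecs, col)
    return widths[0] if widths else None


def last_wins(tspecs, col):
    """Resolution rule B: the last applicable TSPEC decides."""
    widths = matching_widths(tspecs, col)
    return widths[-1] if widths else None


# <TSPEC WIDTH="4em"><TSPEC COL=1 WIDTH="5em">
tspecs = [{"width": "4em"}, {"col": 1, "width": "5em"}]

print(first_wins(tspecs, 1), last_wins(tspecs, 1))
print(first_wins(tspecs, 2), last_wins(tspecs, 2))
```

Two equally plausible rules give two different widths for column 1, so
two browsers can disagree on the same table.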

TD, TH:

Cells can override WIDTH, HEIGHT, ALIGN, VALIGN, NOWRAP, CHAR,
but why can't they override BORDER as well? (or CELLSPACING,
CELLPADDING?)

-- 
                          Bert Bos                      Alfa-informatica
                 <bert@let.rug.nl>           Rijksuniversiteit Groningen
    <http://www.let.rug.nl/~bert/>     Postbus 716, NL-9700 AS GRONINGEN