HTML tables - trying to reach consensus

Bert Bos (bert@let.rug.nl)
Wed, 3 May 95 08:52:34 EDT

I assume we still want to standardize on a table model for HTML
2.1. However, it seems to be difficult to reach consensus. I've tried
to make a list of the issues, arguments, and proposed solutions, so
that maybe we can just check off the points that we agree on. That,
combined with a compromise here and there, should give us enough for
the first version of the table DTD and, if we do it well, leave room
for compatible enhancements in the future.

I suggest we (i.e., everybody who has an opinion on these matters)
take a few days to read and cogitate, and maybe ask for explanations
of points that are unclear. After that, let's hear what each of us
thinks essential, acceptable, or unacceptable. After a few rounds of
give and take, who knows: a usable model for HTML 2.1 might emerge.

Please read carefully!

* A. GOALS

First the goals. Although many people would rank them slightly
differently, there shouldn't be much discussion about them; we
simply have to take them all into account.

1. Primarily presentation oriented (i.e., rows and columns, rather
than abstract dimensions and coordinates along those dimensions).

2. Easy to understand (by a human).

3. Compact (in order to facilitate typing by hand).

Ways to do this include: making attribute names and values
short, choosing useful defaults, and allowing abbreviations for
repeated parts.

4. Easy to parse (by a Web user agent).

This could mean many things, for example: essential information
must be available before it is needed; information on the stack
is easier to find than info in a tree; attributes with the same
function have the same name (and conversely, different functions
have different names); there are no exceptions, i.e., what applies
in one context applies in others as well.

5. Easy to format (by a style-sheet-based real-time formatter).

For example: one-pass formatting is easiest; the formatter
shouldn't need to look into cells, since the cell's overall size
should be sufficient; memory use (for counters, buffers, etc.)
should be limited; etc.

6. Easy to implement in a syntax-directed editor.

Not only syntax-directed, but also WYSIWYG would be nice. See
also the next item.

7. Easy to generate from other sources: spreadsheet, database,
wordprocessor, other SGML table formats (CALS), LaTeX tables,
Troff (tbl) tables, etc.

8. Provisions for adding enough information to translate the set of
cells to an abstract set of points in n-dimensional space (the
AXIS/AXES attributes).

9. Error resistant, i.e., a syntax that makes it hard to create
incorrect tables, that reduces (or eliminates!) the chance of
perceived ambiguities or invalid tables.

10. Compact (in order to reduce file size).

11. Enough hooks for graphic artists to create beautiful designs,
using their favourite style sheet language.

For example: 3D borders, GIF images as borders, precise cell
spacing, background colours, overall blackness control, corner
pieces to join vertical and horizontal lines, vertical and
diagonal headers, diagonal lines, etc.

12. Minimum requirements that mark the line between what's needed to
make the contents readable and what makes them nice to look
at. "Readable" means that the reader is unlikely to misinterpret
the data.

The absolute minimum is that a parser handles the TABLE,
CAPTION, TR, TD and TH elements, and ignores all attributes. If
that's not enough for readability, which attributes are needed?

* B. FEATURES

In principle, if the markup includes enough hooks (in the form of
CLASS attributes or otherwise), any layout can be specified with
style sheets. However, it may be desirable to include some formatting
in the markup, either for ease of use, or because some of it is
arguably part of the semantics of the table (see item A12 above).

Here is a list of the features that have been mentioned as belonging
in the markup. Note that this is just a list of features; how they
appear in the syntax is not of concern here.

The idea is to check off subsets of these: one for HTML 2.1, another
with what we expect to add later, and the rest that we can leave to
style sheets or other applications. Note that an empty set is also a
set!

** 1. Column widths

a. Relative column widths (i.e., relative to other columns, not to
anything outside the table).

b. Table width relative to page width (as a percentage of the space
between the current margins, or something similar).

c. Absolute table width, in real-world units:
I. cm; II. mm; III. dm; IV. inch; V. point; VI. pica.

d. Fixed table width, in units that are derived from something
(virtually) inside the table. The units will be something like
the body size or em of a designated default font.

e. Device-dependent width in pixels (but what is a pixel if the
output is a laser printer?)

f. Automatic column and table width, using some algorithm that
finds a compromise between the columns' minimum and maximum
widths and the current page width.

g. A mixture of the above, in the sense that some columns are given
an absolute or fixed width, while the rest is sized
automatically.

h. Allow fractional numbers (1.5) and/or scientific notation (2.3E-2)
in any of the above.
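To make item 1a (relative widths) and the "mixture" item concrete, here is a minimal sketch, in Python purely for illustration, of one way a formatter might combine them: columns with a fixed width get exactly that width, and the space that is left over is shared among the other columns in proportion to their relative weights. The function name and the ("fixed", w) / ("rel", weight) representation are hypothetical; they are not a syntax proposal.

```python
def resolve_widths(specs, available):
    """Resolve a mix of fixed and relative column widths.

    specs: one ("fixed", width) or ("rel", weight) tuple per column.
    available: total width to fill, in the same units as the fixed widths.
    Returns the column widths as floats.
    """
    fixed_total = sum(v for kind, v in specs if kind == "fixed")
    rel_total = sum(v for kind, v in specs if kind == "rel")
    # Space left for the relatively sized columns (never negative).
    remaining = max(available - fixed_total, 0)
    widths = []
    for kind, v in specs:
        if kind == "fixed":
            widths.append(float(v))
        elif rel_total > 0:
            # Share the remaining space in proportion to the weight.
            widths.append(remaining * v / rel_total)
        else:
            widths.append(0.0)
    return widths

# Two fixed 50-unit columns; the other two split the rest 1:3.
print(resolve_widths([("fixed", 50), ("rel", 1), ("fixed", 50), ("rel", 3)], 500))
# [50.0, 100.0, 50.0, 300.0]
```

What should happen when the fixed columns alone are wider than the available space is left open here; this sketch simply gives the relative columns zero width.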

** 2. Vertical alignment of a cell within a row

a. top

b. middle

c. bottom

d. baseline: the baseline of the first line in the cell is aligned
with the baselines of the first lines of all other cells in the
same row. Note that this depends on other cells in the row and
is therefore ambiguous if some cells in the row have a different
alignment.

e. some offset from the top (percentage or otherwise).
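A sketch of how a formatter might compute the vertical offset of each cell's content for items a to d, assuming the row height is already known. To sidestep the ambiguity noted in (d), it computes the shared baseline from the baseline-aligned cells only; that is one possible rule, not a decision. The function name and dict fields are hypothetical.

```python
def valign_offsets(cells, row_height):
    """Vertical offset of each cell's content from the top of its row.

    Each cell is a dict with "valign" ("top", "middle", "bottom" or
    "baseline"), "height" (height of its content) and "ascent" (distance
    from the top of its content to the baseline of its first line).
    """
    # One shared baseline, taken from the baseline-aligned cells only.
    row_baseline = max((c["ascent"] for c in cells if c["valign"] == "baseline"),
                       default=0)
    offsets = []
    for c in cells:
        if c["valign"] == "top":
            offsets.append(0)
        elif c["valign"] == "middle":
            offsets.append((row_height - c["height"]) / 2)
        elif c["valign"] == "bottom":
            offsets.append(row_height - c["height"])
        else:  # "baseline": shift down until the first baselines coincide
            offsets.append(row_baseline - c["ascent"])
    return offsets

row = [
    {"valign": "top", "height": 10, "ascent": 8},
    {"valign": "middle", "height": 10, "ascent": 8},
    {"valign": "bottom", "height": 10, "ascent": 8},
    {"valign": "baseline", "height": 12, "ascent": 12},
    {"valign": "baseline", "height": 10, "ascent": 8},
]
print(valign_offsets(row, 30))  # [0, 10.0, 20, 0, 4]
```

Note that the baseline case needs the ascent of every baseline-aligned cell in the row before any of them can be placed, which is exactly the dependency on other cells mentioned in (d).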

** 3. Horizontal alignment of a cell within a column

a. We assume that this really means the alignment of the text within
the cell. Note that this can be no more than a *default* alignment:
it is still possible to put a left-aligned paragraph inside a
right-aligned cell; the paragraph as a whole will then be right
aligned inside the cell, but each of its lines will be left aligned
inside the paragraph.

b. left

c. center

d. right

e. decimal, on the decimal separator of the current language. Note
(1) that this means that the formatter needs to know more about
a cell than just the width; and (2) that this assumes the cell
contains text, not other elements.

f. align on an arbitrary character

g. (e) or (f) combined with an offset from the left

h. some offset from the left

i. flag to (dis)allow line breaking inside the cell (see the note
in (a) above).
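Items (e) and (f) can be sketched as follows, assuming plain-text cells and a monospaced rendering in which one character is one unit of width (a real formatter would measure glyphs instead). Passing "," rather than "." covers languages that write a decimal comma. The sketch also illustrates note (1) of item (e): the formatter has to look at the text in the cell, not just at its width. The function name is hypothetical.

```python
def align_on_char(cells, char="."):
    """Pad text cells so that the first occurrence of `char` lines up.

    A cell without `char` is treated as if the character followed its
    last character, so "120" lines up like the integer part of "3.14".
    """
    def split(text):
        i = text.find(char)
        return (text, "") if i < 0 else (text[:i], text[i:])

    parts = [split(t) for t in cells]
    left = max(len(a) for a, _ in parts)
    right = max(len(b) for _, b in parts)
    # Right-justify the part before the character, left-justify the rest.
    return [a.rjust(left) + b.ljust(right) for a, b in parts]

for line in align_on_char(["3.14", "120", "0.5", "42.125"]):
    print("|" + line + "|")
# |  3.14 |
# |120    |
# |  0.5  |
# | 42.125|
```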

** 4. Horizontal rules

a. just an on/off switch (on = a rule at the top and after every
row; off = no rules at all)

b. typed rules (none, single, thick, double, 3D raised, 3D sunken,
stippled, etc.)

c. thickness of rule in pixels

d. thickness in absolute units (point, mm, etc.)

e. thickness of rule in fixed units (font size)

f. parameters for other aspects of a rule (gap between double rule,
width of 3D bevel, etc.)

g. rules tied to special positions: top of table, bottom of table,
between header and body, between body and footer, between all
other rows.

h. rules below individual rows (either by numbering the rows, or by
inserting a rule, in the form of an <HR> element or an attribute,
at the appropriate place).

** 5. Inter-row spacing

a. compact/loose flag (analogous to the COMPACT attribute on lists)

b. absolute spacing (real world units, such as cm, mm, inch, point)

c. fixed spacing (relative to some default font)

d. (a), (b) or (c) tied to special positions (top, bottom, between
head and body, between body and foot, everywhere else)

e. (a), (b) or (c) tied to individual rows (either by row number or
with an attribute of the row)

f. Mark rows as possible places for a page break (for output on
paper).

** 6. Column and row spanning, nested tables

a. span columns

b. span rows

c. allow nested tables

d. non-rectangular cells

** 7. Overlapping cells (only possible if 6b or 6d is checked)

a. formulate the rules in such a way that cells cannot overlap (the
simplest is to say that cells automatically move to the right
until they fit).

b. allow overlapping cells and declare them to be meaningful.

c. allow overlapping cells in the syntax, but declare them
(semantically) invalid.
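Rule (a) can be made precise with a small placement algorithm. The sketch below assumes one possible reading of "move to the right until they fit": each cell starts after the previous cell in its row and skips any grid slot already covered by a ROWSPAN or COLSPAN cell placed earlier, so cells can never overlap. The (name, rowspan, colspan) representation is hypothetical. Applied to the first item of the Scientific American example further down, it puts the summary cell in column 3 (index 2), as the comment in that example says.

```python
def place_cells(rows):
    """Give every cell a (row, column) position, 0-based.

    rows: one list per TR, each a list of (name, rowspan, colspan).
    A cell starts right after the previous cell in its row and moves
    right until none of the slots it spans is taken (rule a).
    """
    taken = set()      # (row, col) slots covered by cells placed so far
    positions = {}
    for r, row in enumerate(rows):
        c = 0
        for name, rowspan, colspan in row:
            span = [(dr, dc) for dr in range(rowspan) for dc in range(colspan)]
            while any((r + dr, c + dc) in taken for dr, dc in span):
                c += 1
            positions[name] = (r, c)
            taken.update((r + dr, c + dc) for dr, dc in span)
            c += colspan
    return positions

toc = [
    [("48", 2, 1), ("mind.gif", 2, 1), ("title", 1, 1)],
    [("summary", 1, 1)],
]
print(place_cells(toc))
# {'48': (0, 0), 'mind.gif': (0, 1), 'title': (0, 2), 'summary': (1, 2)}
```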

** 8. Vertical rules

a. just an on/off switch

b. typed rules (none, single, thick, double, 3D, stippled, etc.)

c. thickness in pixels

d. thickness in absolute units (point, mm, etc.)

e. thickness of rule in fixed units (font size)

f. parameters for other aspects of a rule (gap between double rule,
width of 3D bevel, etc.)

g. rules tied to special positions: left of table, right of table,
between columns.

h. rules after individual columns.

** 9. Inter-column spacing

a. compact/loose flag (analogous to the COMPACT attribute on lists)

b. absolute spacing (real world units, such as cm, mm, inch, point)

c. fixed spacing (relative to some default font)

d. (a), (b) or (c) tied to individual columns

e. inter-column spacing takes priority over column width (see
section 1 above; this is not always possible)

f. a conflict between inter-column spacing and column width renders
the table invalid (and the result undefined).
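The difference between (e) and (f) can be sketched as follows, under the hypothetical assumptions that spacing is only placed between columns (not at the table edges) and that, under (e), all columns are shrunk by the same factor. Both failure cases simply raise an error here; the function name, the policy names and the messages are made up for illustration.

```python
def fit_spacing(col_widths, spacing, table_width, policy="spacing-first"):
    """Reconcile inter-column spacing with column widths.

    Spacing is only placed between columns (n - 1 gaps).
    policy "spacing-first" (option e): keep the spacing and shrink all
    columns by the same factor; "invalid" (option f): raise an error.
    """
    gaps = spacing * (len(col_widths) - 1)
    if sum(col_widths) + gaps <= table_width:
        return list(col_widths)          # no conflict
    if policy == "invalid":
        raise ValueError("column widths and spacing do not fit the table width")
    room = table_width - gaps
    if room <= 0:
        # The case where (e) is "not always possible": spacing alone is too wide.
        raise ValueError("spacing alone exceeds the table width")
    scale = room / sum(col_widths)
    return [w * scale for w in col_widths]

print(fit_spacing([100, 100], 10, 300))  # [100, 100]
print(fit_spacing([100, 100], 10, 110))  # [50.0, 50.0]
```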

** 10. Abstract model

a. attributes to allow translation from the 2D layout to an
abstract n-dimensional representation (and from there to, e.g.,
braille or speech). This means that cells can have one of two
roles: they either mark a point on an axis or attach a value to
an n-tuple (a point in an n-dimensional space).

b. allow a cell to fulfill both roles at the same time.

c. add rules to make a default translation possible (that is, if no
explicit dimensions are given, allow the table to be interpreted
as a set of values attached to points in 2 dimensions).

d. rules for the syntax of these attributes. An example of such a
syntax could be: the AXIS attribute is of the form
DIMENSION=VALUE and the AXES attribute is of the form "DIM1=VAL1
DIM2=VAL2...".
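A minimal sketch of (c) and (d), assuming the AXES syntax given in (d) and a default rule that is only one possibility: the first row and the first column hold the axis labels, and every other cell is a value attached to the pair (column label, row label). The same parser accepts a single AXIS value of the form DIMENSION=VALUE. Values containing spaces cannot be expressed, since (d) uses spaces as separators. The function names are hypothetical.

```python
def parse_axes(value):
    """Parse "DIM1=VAL1 DIM2=VAL2 ..." (or a single "DIM=VAL") into a dict."""
    point = {}
    for pair in value.split():
        dim, sep, val = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {pair!r}")
        point[dim] = val
    return point

def default_points(rows):
    """Default 2D translation: the first row and first column are axis
    labels; every other cell is a value at (column label, row label)."""
    col_labels = rows[0][1:]
    points = {}
    for row in rows[1:]:
        row_label, values = row[0], row[1:]
        for col_label, value in zip(col_labels, values):
            points[(col_label, row_label)] = value
    return points

print(parse_axes("year=1995 month=May"))  # {'year': '1995', 'month': 'May'}
print(default_points([["", "1994", "1995"], ["Q1", "10", "12"], ["Q2", "11", "14"]]))
# {('1994', 'Q1'): '10', ('1995', 'Q1'): '12', ('1994', 'Q2'): '11', ('1995', 'Q2'): '14'}
```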

* C. DEFAULTS

For each of the above features, we may want to specify a default. How
many defaults are needed depends on the syntax, of course. We may have
to do this section again once the syntax is fixed.

a. TD cells are left aligned

b. TD cells are left aligned if the language at the start of the
table is written left to right.

c. TH cells are centered

d. TH cells are left aligned

e. TH cells are left aligned if the language at the start of the
table is written left to right.

f. all columns have the same width

g. a table is as wide as the current line length

h. no borders

i. cells are aligned at the top

j. cells are centered vertically

k. if units are needed, the default is point

* D. SYNTAX

Some parts of the syntax are not controversial or not important enough
to fight over. Among those are the names of elements and attributes
(but see A4). Also in this group is the fact that a TABLE contains
only a single table, and the fact that a TFOOT (if present) comes at
the end.

** 0. Basic structure

a. No table headers and footers: <!ELEMENT TABLE - - (CAPTION?, TR+)>

"*" instead of "+" is also acceptable

b. With headers and footers:
<!ELEMENT TABLE - - (CAPTION?, THEAD?, TBODY, TFOOT?)>
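SGML content models are essentially regular expressions over element names (the & connector aside), so the two candidates can be compared by checking a sequence of child elements against each. The sketch below does that with Python's re module; it ignores tag omission and everything else a real SGML parser does, and the names MODEL_A, MODEL_B and matches are hypothetical.

```python
import re

# The two candidate content models for TABLE as regular expressions over
# child element names, each name followed by a space as a separator.
MODEL_A = re.compile(r"(CAPTION )?(TR )+")                 # (CAPTION?, TR+)
MODEL_B = re.compile(r"(CAPTION )?(THEAD )?TBODY (TFOOT )?")  # (CAPTION?, THEAD?, TBODY, TFOOT?)

def matches(model, children):
    """True if the list of child element names fits the content model."""
    return model.fullmatch("".join(name + " " for name in children)) is not None

print(matches(MODEL_A, ["CAPTION", "TR", "TR"]))     # True
print(matches(MODEL_B, ["THEAD", "TBODY", "TFOOT"]))  # True
print(matches(MODEL_B, ["TFOOT", "TBODY"]))           # False
```

Replacing "+" by "*" in MODEL_A would also accept a TABLE with a CAPTION and no rows, or with no content at all.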

** 1. Column widths

a. A #required COLWIDTH attribute on TABLE, containing
numbers, as many as there will be columns.

b. An #implied COLWIDTH attribute on TABLE, containing numbers, not
necessarily as many as there are columns.

c. A UNITS attribute on TABLE, to qualify the numbers in COLWIDTH

d. An #implied COLWIDTH attribute that contains both numbers and
units. (Note that this may introduce ambiguities: what happens
if units and dimensionless widths are used together?)

I. ditto, with abbreviations, e.g., "2 2 & 2 1" = repeat the
pair "2 1" as often as needed (cf. TeX); or "2 2 15*1" =
repeat the "1" fifteen times.

e. COLWIDTH and/or UNITS attributes not on the TABLE, but on empty
TSPEC elements inserted between CAPTION and THEAD, one TSPEC for
every column.
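A sketch of the abbreviations in (d)(I), assuming that "N*W" repeats the width W N times and that everything after "&" is repeated until there are enough columns, as described above. When there are more values than columns the sketch truncates; whether that should be an error instead is not decided here. The function name is hypothetical.

```python
def expand_colwidth(spec, ncols):
    """Expand a COLWIDTH value such as "2 2 & 2 1" or "2 2 15*1".

    "N*W" repeats W N times; everything after "&" is repeated as often
    as needed to reach ncols columns. Returns at most ncols widths
    (fewer if the value is too short and has no "&" part).
    """
    head, amp, tail = spec.partition("&")

    def tokens(text):
        out = []
        for tok in text.split():
            count, star, width = tok.partition("*")
            # "15*1" becomes fifteen 1.0s; a plain number becomes one value.
            out.extend([float(width)] * int(count) if star else [float(tok)])
        return out

    widths = tokens(head)
    repeat = tokens(tail) if amp else []
    while repeat and len(widths) < ncols:
        widths.extend(repeat)
    return widths[:ncols]

print(expand_colwidth("2 2 & 2 1", 7))  # [2.0, 2.0, 2.0, 1.0, 2.0, 1.0, 2.0]
print(len(expand_colwidth("2 2 15*1", 17)))  # 17
```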

** 2. Vertical alignment

a. A single VALIGN attribute on the TABLE.

b. (with the structure of 0b) A VALIGN attribute on each of THEAD,
TBODY and TFOOT.

c. A VALIGN on each TR.

d. For each of a-c: allow abbreviations (as for column widths, see
above)

e. A VALIGN on each TD and TH.

f. A combination of a, b, c and e, with later ones having higher
priority.

g. VALIGNs on TSPEC elements, that refer to a cell by virtue of
having the same CLASS attribute.

h. VALIGNs on TSPEC elements, that also have an attribute to refer
to a row by number, or to several rows, by means of a list or a
range of numbers.
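Option (f) amounts to ordinary inheritance: walk from the cell outwards and take the first VALIGN found. A sketch, with elements represented as plain attribute dicts (a hypothetical representation) and "TOP" as the fallback, following default C(i), which is only one of the two alternatives listed there. The function name is hypothetical.

```python
def inherited(attr, chain, default):
    """Value of `attr` for a cell, given the attribute dicts of TABLE,
    the row group (THEAD/TBODY/TFOOT), TR and the cell, outermost first.
    The innermost element that sets the attribute wins (option f)."""
    for attrs in reversed(chain):
        if attr in attrs:
            return attrs[attr]
    return default

table, tbody, td = {"VALIGN": "MIDDLE"}, {}, {}
print(inherited("VALIGN", [table, tbody, {"VALIGN": "BOTTOM"}, td], "TOP"))  # BOTTOM
print(inherited("VALIGN", [table, tbody, {}, td], "TOP"))                    # MIDDLE
print(inherited("VALIGN", [{}, {}, {}, {}], "TOP"))                          # TOP
```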

** 3. Horizontal alignment

a. an HALIGN attribute on TABLE, with a (one-letter) alignment
specification for each column (defaults apply if there are too
few specifications; what happens if there are too many?)

b. an HALIGN attribute on each of THEAD, TBODY and TFOOT.

c. an HALIGN on each TR.

d. an HALIGN on each TD or TH.

e. a combination of a-d, with later ones having higher priority
(Note: if a later HALIGN has fewer columns than an earlier one,
the remaining columns get the default alignment, not the
alignment specified by the earlier HALIGN).

f. HALIGNs on TSPEC elements, that apply to cells with the same
CLASS as the TSPEC.

g. HALIGNs on TSPECs, with as many TSPECs as there are columns.

h. Ditto, but with TSPECs referring to columns by number, list of
numbers and/or number range.

i. a CHAR attribute on TSPECs, to indicate which character to align
on.

j. a CHAR attribute on each TD and TH, to indicate which character
to align on.

k. both (i) and (j), with later ones having higher priority.
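A sketch of options (a) and (e) together, including the note in (e): the innermost HALIGN that is present replaces the earlier ones entirely, and columns it does not cover fall back to the default rather than to an earlier HALIGN. The letters L, C, R and D are hypothetical, as is the choice to ignore surplus letters (the question "what happens if there are too many?" remains open). The default is "left", following C(a). Per-cell HALIGN (d) is not modelled.

```python
# Hypothetical one-letter alignment codes.
LETTERS = {"L": "left", "C": "center", "R": "right", "D": "decimal"}

def column_aligns(specs, ncols, default="left"):
    """Per-column horizontal alignment under options (a) and (e).

    specs: HALIGN values of TABLE, row group and TR (outermost first),
    each a string of one-letter codes, or None if absent. Only the
    innermost one present counts; uncovered columns get `default`.
    Letters beyond ncols are ignored; an unknown letter raises KeyError.
    """
    present = [s for s in specs if s is not None]
    if not present:
        return [default] * ncols
    letters = present[-1].replace(" ", "").upper()[:ncols]
    aligns = [LETTERS[ch] for ch in letters]
    return aligns + [default] * (ncols - len(aligns))

print(column_aligns(["RLL", None, "C"], 3))  # ['center', 'left', 'left']
print(column_aligns(["RLLC"], 3))            # ['right', 'left', 'left']
```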

** 4. Horizontal rules

a. a single, boolean HRULES (or BORDER) attribute on TABLE.

b. an attribute that specifies the rules between rows in some
syntax (e.g., "top bottom" for rules only above and below the
table, single-letter values are possible as well).

c. (a) or (b), but with keyword values, such as "single", "double",
"thick".

d. (a) or (b), but with numerical values
I. dimensionless, but higher numbers mean thicker lines
II. with units

e. a boolean attribute HRULE on each TR, to insert a rule below the
row (and an attribute on TABLE to insert a rule above the first
row).

f. an HRULE attribute on each TR, but with keyword values.

g. ditto, but with a numerical value
I. dimensionless, but higher numbers mean thicker lines
II. with units

h. an empty element HRULE (or HR?) between rows (possibly with an
attribute)
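For (b), combined with the special positions of B4(g), here is a sketch that turns a keyword list into the row boundaries that get a rule. Only "top bottom" comes from the text above; the keywords "head", "foot" and "rows", the function name and the numbering of boundaries are hypothetical.

```python
def rule_positions(spec, nhead, nbody, nfoot):
    """Row boundaries that get a horizontal rule.

    Boundary i is the line above row i (0-based), so boundary 0 is the
    top of the table and boundary n the bottom. spec is a space-separated
    list of the keywords top, bottom, head, foot and rows.
    """
    n = nhead + nbody + nfoot
    special = {"top": 0, "bottom": n}
    if nhead:
        special["head"] = nhead           # between header and body
    if nfoot:
        special["foot"] = nhead + nbody   # between body and footer
    words = spec.split()
    rules = {special[w] for w in words if w in special}
    if "rows" in words:
        rules.update(range(1, n))         # every boundary inside the table
    return sorted(rules)

print(rule_positions("top bottom", 1, 3, 0))  # [0, 4]
print(rule_positions("head foot", 1, 3, 1))   # [1, 4]
```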

** 5. Inter-row spacing

a. A single, boolean COMPACT attribute on TABLE

b. A numerical attribute (ROWSEP?) on TABLE (with units).

c. A numerical attribute on each of THEAD, TBODY and TFOOT.

d. A numerical attribute on each TR.

e. (b), (c) and (d), with later ones having higher priority.

f. an attribute on TSPECs, which refer to rows by virtue of having
the same CLASS as a TR element

g. an attribute on TSPECs, that refer to a row by a number, a list
of numbers, and/or a range of numbers
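Several options (2h, 3h, 5g, and the similar TSPEC option for inter-column spacing) refer to rows or columns "by a number, a list of numbers, and/or a range of numbers". A sketch of expanding such a reference, assuming a hypothetical syntax with commas for lists and hyphens for ranges, e.g. "2-4,7". The function name is hypothetical.

```python
def parse_refs(value):
    """Expand a reference such as "3", "1,3" or "2-4,7" into a sorted
    list of 1-based row or column numbers."""
    numbers = set()
    for part in value.replace(" ", "").split(","):
        if not part:
            continue                      # tolerate "1,,3" and trailing commas
        first, dash, last = part.partition("-")
        if dash:
            numbers.update(range(int(first), int(last) + 1))   # a range
        else:
            numbers.add(int(first))                             # a single number
    return sorted(numbers)

print(parse_refs("2-4,7"))  # [2, 3, 4, 7]
```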

** 6. Column and row spanning, nested tables

a. a COLSPAN attribute on each TD and TH

b. a ROWSPAN attribute on each TD and TH

c. allow TABLE in the content model of TD and TH

** 7. (intentionally left blank)

** 8. Vertical rules

a. a single, boolean VRULES attribute on TABLE (this could be
combined with 4a into a single attribute called BORDER).

b. some syntax to specify left and right edge independently from
internal rules.

c. (a) or (b) with keywords ("s", "single", "d", "double", etc.)

d. (a) or (b) with numbers
I. dimensionless, higher numbers mean thicker lines
II. with units

e. A combination with COLWIDTH, mix numbers and letters (or numbers
and vertical bars, as in TeX)

f. An extra attribute on the TSPECs as meant in 1e above.

** 9. Inter-column spacing

a. A single, boolean COMPACT attribute on TABLE (combined with 5a
above)

b. A numerical attribute (COLSEP?) on TABLE (with units).

c. an attribute on the TSPECs as meant in 1e above.

d. an attribute on TSPECs, that refer to a column by a number, a list
of numbers, and/or a range of numbers

e. combine with inter-row spacing into a single attribute for both
horizontal and vertical spacing between cells (CELLSPACING?)

** 10. Abstract model

a. AXIS (CDATA) attribute on each TH

b. AXES (CDATA) attribute on each TD

c. both AXIS and AXES on TD and TH

Before anybody starts complaining:

There is no room for Netscape's use of BORDER, CELLPADDING and
CELLSPACING in any of the above. That's because those attributes
assume a 3D look, which an HTML standard really cannot require of any
implementation.

Of course, there are many more possibilities, but let's keep the first
version simple. Additional features can be added later: HyTime- or
TEI-like addressing, conditional alignment specifications, conditional
column width specifications, etc.

Since the idea of connecting TSPECs and cells by their CLASS attribute
is new (invented by Dave R only yesterday), I've added a small example
of its use. Here is a table of contents for an issue of Scientific
American:

    48  +--------+  MIND AND BRAIN
        |\      /|  Gerald D. Fischbach
        | \    / |
        |  \  /  |
        |   \/   |
        |   /\   |
        |  /  \  |  The human...
        | /    \ |  and experience...
        |/      \|  lions of years...
        +--------+  the marvelous...

    60  +--------+  THE DEVELOPING BRAIN
        |\      /|  Carla J. Shatz
        | \    / |
        |  \  /  |
        |   \/   |
        |   /\   |
        |  /  \  |  Remarkably precise...
        | /    \ |  for all the properties...
        |/      \|  proximation of...
        +--------+  tion of the...

<TABLE>
<TSPEC CLASS=PAGENR VALIGN=TOP HALIGN=RIGHT>
<TSPEC CLASS=ILLUS VALIGN=TOP HALIGN=LEFT>
<TSPEC CLASS=TITLE VALIGN=TOP HALIGN=LEFT>
<TSPEC CLASS=SUMMARY VALIGN=BOTTOM HALIGN=LEFT>
<TBODY>
<!-- first item -->
<TR>
<TD CLASS=PAGENR ROWSPAN=2>48
<TD CLASS=ILLUS ROWSPAN=2><IMG SRC="mind.gif">
<TD CLASS=TITLE><H2>Mind and brain</H2><P><EM>Gerald D. Fischbach</EM>
<TR>
<!-- Note that this cell ends up in column 3 -->
<TD CLASS=SUMMARY>The human brain is the most complex...

<!-- second item -->
<TR>
<TD CLASS=PAGENR ROWSPAN=2>60
<TD CLASS=ILLUS ROWSPAN=2><IMG SRC="devel.gif">
<TD CLASS=TITLE><H2>The developing brain</H2><P><EM>Carla J. Shatz</EM>
<TR>
<TD CLASS=SUMMARY>Remarkably precise connections between...

<!-- third item -->
...
</TABLE>

Note: the CLASSes ILLUS and TITLE could be left out, since they only
use the default alignment. Every item in the TOC takes up two rows,
because otherwise it wouldn't be possible to align the title with the
top of the image and the summary with the bottom.
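To check that the CLASS-based matching is easy to implement, here is a sketch using Python's standard html.parser module, which accepts unknown elements such as TSPEC and unquoted attribute values (it does not validate against any DTD). It remembers each TSPEC by its CLASS and, for every TD or TH, combines the matching TSPEC's attributes with the cell's own. Letting the cell's own attributes win is an assumption, by analogy with the "later ones have higher priority" options in section D. The class name TspecResolver is hypothetical, and the sample is a trimmed version of the example above.

```python
from html.parser import HTMLParser

class TspecResolver(HTMLParser):
    """Collect TSPEC attributes by CLASS and apply them to TD/TH cells."""

    def __init__(self):
        super().__init__()
        self.tspecs = {}   # CLASS -> {attribute: value} from TSPEC elements
        self.cells = []    # (CLASS, resolved attributes) for each cell

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, not values.
        attrs = dict(attrs)
        cls = attrs.pop("class", None)
        if tag == "tspec" and cls:
            self.tspecs[cls] = attrs
        elif tag in ("td", "th"):
            # Start from the TSPEC with the same CLASS (if any), then
            # let the cell's own attributes take priority.
            resolved = dict(self.tspecs.get(cls, {}))
            resolved.update(attrs)
            self.cells.append((cls, resolved))

SAMPLE = """
<TABLE>
<TSPEC CLASS=PAGENR VALIGN=TOP HALIGN=RIGHT>
<TSPEC CLASS=SUMMARY VALIGN=BOTTOM HALIGN=LEFT>
<TBODY>
<TR><TD CLASS=PAGENR ROWSPAN=2>48<TD CLASS=TITLE>Mind and brain
<TR><TD CLASS=SUMMARY>The human brain is the most complex...
</TABLE>
"""

parser = TspecResolver()
parser.feed(SAMPLE)
parser.close()
for cls, attrs in parser.cells:
    print(cls, attrs)
# PAGENR {'valign': 'TOP', 'halign': 'RIGHT', 'rowspan': '2'}
# TITLE {}
# SUMMARY {'valign': 'BOTTOM', 'halign': 'LEFT'}
```

The TITLE cell gets no TSPEC attributes and so falls back to the defaults, which is the situation the note above describes.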

-- 
                          Bert Bos                      Alfa-informatica
                 <bert@let.rug.nl>           Rijksuniversiteit Groningen
    <http://www.let.rug.nl/~bert/>     Postbus 716, NL-9700 AS GRONINGEN