Yet another table proposal for HTML 2.1

Dave Raggett (dsr@hplb.hpl.hp.com)
Wed, 3 May 95 07:10:54 EDT

The core table model in the HTML3 I-D when combined with THEAD, TBODY
and TFOOT goes a long way to satisfy most people's needs. My previous
posting suggested a TSPEC element for concisely representing properties
of groups of table cells.

Harvey Bingham's detailed analysis in <9504300032.AA19000@amos.HQ.Ileaf.COM>
points out a number of complicating factors in dealing with numbered rows
and columns. I would like to make a new proposal that I believe removes
this complexity both for authors and developers of wysiwyg editors.

In the new proposal, information specifying properties of table cells is
bound to the cells via the class mechanism. You can use the class names to
define (for instance) table borders in terms of adjacent class names, e.g.
draw a 2 pixel wide border between thead and tbody cells. I believe this is
preferable to including style info as complex attributes on HTML elements.
Not only is it cleaner, but it is also much more extensible, e.g. to deal
with margins around cell contents, font sizes, border styles, and background
colors and textures without any changes to HTML.

There is still a case though for placing width and alignment info within
the HTML markup. Column widths are really global properties of the table
rather than of individual cells, as by definition, you can't use different
widths for different cells in the same column. In contrast, the horizontal
and vertical alignment of text within cells is very much a property of
individual cells.

For the HTML 2.1 spec we should be aiming for consensus on a simple
extension to the core table model as proposed in HTML+ and the recent
HTML3 Internet Draft. That model uses autoscaled tables, where the column
widths are determined from a prepass of the table contents. The ability
to specify a preference for the table width can be used to keep columns
from becoming unnaturally wide. This mechanism is what is deployed right
now with Netscape 1.1.

Sometimes authors may feel the need to specify the widths directly.
The simplest case is to control the relative widths of columns.
Absolute units can make sense when combined with style sheets that
specify font sizes and other details. In the absence of such info
browsers could choose to treat absolute column widths as guides e.g.
for the relative column widths.

CALS specifies widths as a set of elements, with one per column.
For HTML 2.1, I would prefer a simple list of widths. The ability
to mix relative and absolute widths causes problems in the absence
of style sheets as it is then problematic to merge two sets of
relative units when the relative scale of each set is unknown.
For this reason, I would prefer to force all columns to be specified
with relative widths or all columns to be specified in absolute widths.
Once this decision is made, it follows that a single measure of units
can be used for all columns, as the absolute units can be readily
interconverted to the same scale.

For HTML 2.1 we have the choices:

a) stick to the autosizing scheme plus table width

b) as (a) but allow relative widths to be specified

c) as (b) but allow a range of absolute units

I therefore propose two attributes - COLSPEC for specifying a list of
numbers and UNITS for specifying the units, defaulting to relative units.
The width numbers should permit floating point numbers with decimal points
but not exponents. A simple space separated list would suffice e.g.

<table colspec="1 2 2 2 5">

specifies a table where the middle three columns are twice the width
of the first column, and the last column is five times the width of the
first column.
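
As a rough illustration of how a browser might turn relative widths into
actual widths, the sketch below shares out a given table width in
proportion to the COLSPEC numbers. The function name distribute_relative
and the 600-unit table width are assumptions made for the example; they
are not part of the proposal.

```python
def distribute_relative(weights, table_width):
    """Split table_width among columns in proportion to their relative weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("relative widths must sum to a positive number")
    return [table_width * w / total for w in weights]


# colspec="1 2 2 2 5" in a table assumed (for illustration) to be 600 units wide
widths = distribute_relative([1, 2, 2, 2, 5], 600)
print(widths)  # [50.0, 100.0, 100.0, 100.0, 250.0]
```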

<table colspec="35 20 20 20 50" units=mm>

specifies a table with 5 columns of widths 35mm, 20mm, 20mm, 20mm and
50mm respectively.

<table colspec="1.5 2.0 2.0 1.5" units=inch>

specifies a table with 4 columns that are 1.5, 2, 2, and 1.5 inches wide.

Some tables have many adjacent columns of the same width. This motivates
introducing a repeat count e.g. 3x20 is equivalent to "20 20 20".

<table colspec="35 3x20 50" units=mm>
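
A parser for this syntax can be quite small. The following is a minimal
sketch, assuming that each item is either a plain number or a repeat
count, a lower case "x" and a number, and that numbers are digits with an
optional decimal part and no exponent. The name parse_colspec and the
exact grammar are assumptions for illustration.

```python
import re

# Assumed grammar: optional "<count>x" prefix, then digits with an
# optional decimal part. Exponents such as "1e5" are rejected.
_ITEM = re.compile(r"(?:(\d+)x)?(\d+(?:\.\d+)?)")


def parse_colspec(colspec):
    """Expand a COLSPEC string such as "35 3x20 50" into a list of floats."""
    widths = []
    for item in colspec.split():
        m = _ITEM.fullmatch(item)
        if m is None:
            raise ValueError(f"bad colspec item: {item!r}")
        count = int(m.group(1)) if m.group(1) else 1
        widths.extend([float(m.group(2))] * count)
    return widths


print(parse_colspec("35 3x20 50"))  # [35.0, 20.0, 20.0, 20.0, 50.0]
```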

The set of allowed units could include:

  pt      points
  pica    picas
  em      em units (= font point size)
  inch    inches
  cm      centimeters
  mm      millimeters
  rel     relative (the default)

The suggested names use either the singular form (i.e. "inch" not
"inches") or a widely used abbreviation for the units, e.g. pt for points.
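
Since the absolute units convert readily to a common scale, a browser
could normalise every COLSPEC number to points before layout. The sketch
below assumes 72 points to the inch, 12 points to the pica, and a
caller-supplied font size for em. The names POINTS_PER_UNIT and to_points
and the 12pt default font size are illustrative, not part of the proposal.

```python
# Points per unit. Assumption: 72 points to the inch, 12 points to the pica.
POINTS_PER_UNIT = {
    "pt": 1.0,
    "pica": 12.0,
    "inch": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}


def to_points(value, units, font_size_pt=12.0):
    """Convert an absolute width to points; "em" is scaled by the font size."""
    if units == "em":
        return value * font_size_pt
    if units == "rel":
        raise ValueError("relative widths have no absolute size")
    return value * POINTS_PER_UNIT[units]


print(to_points(1.5, "inch"))  # 108.0
print(to_points(2, "pica"))    # 24.0
```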

This proposal uses a TSPEC element to specify cell properties
concisely, but uses the class attribute to bind to cells rather
than row or column numbering.

<!ELEMENT TSPEC - O EMPTY>
<!ATTLIST TSPEC
class CDATA #REQUIRED
align (left|center|right|justify|char) #IMPLIED
char CDATA #IMPLIED
charoff CDATA #IMPLIED
margin CDATA #IMPLIED
valign (top|middle|bottom|baseline) #IMPLIED
>

For example:

<tspec class="percent" align=char char="%" charoff="1.5in">

This would be appropriate for cells like: <td class=percent>67%

You can also use the THEAD/TBODY distinction to align TH cells
differently depending on whether they are in THEAD or TBODY rows:

<tspec class="tbody:th" align=right>

This would override the default center alignment for TH cells but only for
TH cells within TBODY rows. The following defines the matching rules.

It seems wise to distinguish tag names from class names to allow for
the possibility of matching on class values defined for THEAD, TR and
TH etc. The basic idea is tagname followed by a period and a class name.
You can omit either the tagname or the class name. The period is only
needed if both are present. When you need to match on more than one
level, e.g. on THEAD and TH, then you need to separate them with a
colon. Some examples should make this easier to follow:

  tr.foo          all cells in rows with <TR CLASS=foo>
  td.bar          all cells with <TD CLASS=bar>
  thead           all cells in the THEAD group
  thead:th.foo    all cells in the THEAD with <TH CLASS=foo>
  tr.foo:td.bar   all cells in rows with <TR CLASS=foo> and <TD CLASS=bar>
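
The matching rules can be sketched in a few lines. This is only one
possible reading: it assumes that a bare name is a tag name when it is one
of the table element names and a class name otherwise, and that each
colon-separated level must match, in order, some element on the path from
the row group down to the cell. The names TABLE_TAGS, parse_level and
matches are made up for the example.

```python
# Assumption: a bare name is a tag if it is one of these, else a class name.
TABLE_TAGS = {"thead", "tbody", "tfoot", "tr", "th", "td"}


def parse_level(level):
    """Split one level such as "td.bar", "thead" or "percent" into (tag, cls)."""
    if "." in level:
        tag, cls = level.split(".", 1)
        return tag.lower(), cls
    if level.lower() in TABLE_TAGS:
        return level.lower(), None   # bare tag name
    return None, level               # bare class name


def matches(pattern, chain):
    """Return True if every colon-separated level matches, in order, some
    element of chain, a list of (tag, class) pairs running from the row
    group down to the cell itself."""
    elements = iter(chain)           # shared iterator keeps levels in order
    for level in pattern.split(":"):
        tag, cls = parse_level(level)
        for el_tag, el_cls in elements:
            if (tag is None or tag == el_tag) and (cls is None or cls == el_cls):
                break
        else:
            return False             # ran out of elements for this level
    return True


cell = [("tbody", None), ("tr", "foo"), ("td", "bar")]
print(matches("tr.foo:td.bar", cell))  # True
print(matches("thead", cell))          # False
```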

I have excluded cell width from TSPEC to simplify the design of browsers
and wysiwyg editors for HTML 2.1, but this could be added back in for
a full blown HTML3 version if so desired.

The full DTD sample for the new table model is then:

<!ENTITY % attrs -- common attributes for elements --
'id ID #IMPLIED -- as target for hrefs (link ends) --
lang CDATA #IMPLIED -- as per RFC 1766 for language --
class CDATA #IMPLIED -- for subclassing elements --'
>

<!ENTITY % cell "TH | TD">
<!ENTITY % horiz.align "left|center|right|justify">
<!ENTITY % vert.align "top|middle|bottom|baseline">

<!ENTITY % block.align
"align (bleedleft|left|center|right|bleedright|justify|bleedboth) center">

<!ELEMENT caption - - (%text;)+ -- table or figure caption -->
<!ATTLIST caption
%attrs; -- id, lang, class --
align (top|bottom|left|right) #IMPLIED
>

<!ELEMENT table - - (caption?, tspec*, thead?, tbody, tfoot?)>
<!ATTLIST table
%attrs; -- id, lang, class --
border (border) #IMPLIED -- presence or absence of borders --
width CDATA #IMPLIED -- width of table --
colspec CDATA #IMPLIED -- column widths (see above) --
%needs; -- for control of text flow --
%block.align; -- horizontal position of table on page --
noflow (noflow) #IMPLIED -- no text flow around table --
nowrap (nowrap) #IMPLIED -- no auto text wrap on whitespace --
>

<!--
TSPEC defines properties for given classes of cells.
The class attribute is always required and is matched
to the corresponding class attribute on cells. Note that
the TSPEC class attribute is interpreted hierarchically:
[thead|tbody|tfoot]:[th|td].cell-class-name, e.g.
class=thead:th or class=th or class=date

TSPEC makes it easier for authors and developers of wysiwyg
editors by simplifying the process of binding properties
to cells.

A cell property is taken from the first of these that applies:
an explicit attribute on the cell, a TSPEC with a matching
class, or the default for the cell type (th or td).

The search for a TSPEC element matching a given cell runs in
the reverse of the order in which the TSPEC elements appear
in the file. The first matching element is used to set all cell
alignment properties. This makes for simple and fast code!
-->

<!ELEMENT tspec - O EMPTY>
<!ATTLIST tspec
class CDATA #REQUIRED -- binds to cell class --
lang CDATA #IMPLIED -- as per RFC 1766 for language --
align (left|center|right|justify|char) #IMPLIED
valign (top|middle|bottom|baseline) #IMPLIED
margin CDATA #IMPLIED -- around cell contents --
char CDATA #IMPLIED -- the alignment char --
charoff CDATA #IMPLIED -- offset of alignment char --
nowrap (nowrap) #IMPLIED -- don't wrap words --
>

<!--
THEAD, TBODY and TFOOT are borrowed from CALS

These elements appear in the natural order they would
appear in a short table. For longer tables which need
to be split across pages or for which the TBODY is
displayed as a scrolling region, the user agent will
have to process the entire table before reaching the
TFOOT. This allows for simple implementations which
ignore the THEAD/TBODY/TFOOT distinction.

You can omit THEAD, TBODY and TFOOT for simple tables.

By default, the cells in THEAD are vertically aligned
with their content at the bottom of the cell while
for TBODY and TFOOT, the content is by default aligned
at the top of the cell. This behaviour can be easily
overridden with a tspec element and class=thead:th etc.
-->
<!ELEMENT (thead|tfoot) - O (tr*)>
<!ELEMENT tbody O O (tr*)>

<!ATTLIST (thead|tbody|tfoot) %attrs; -- id, lang, class -->

<!--
TR contains a single table row and corresponds
to CALS ROW element.
-->
<!ELEMENT tr - O (%cell)*>
<!ATTLIST tr %attrs; -- id, lang, class -->

<!--
Table cells are differentiated into header cells and
data cells with TH and TD respectively. This allows
for very flexible layout of header and data cells.

The AXIS, AXES attributes are used to support conversion
to speech and braille, and for exporting table data to
spreadsheets or databases.

ROWSPAN and COLSPAN are used to merge cells across
rows and columns. Tables with overlapping cells are
considered to be illegal.
-->
<!ELEMENT (%cell) - O (%text;)*>
<!ATTLIST (%cell)
%attrs; -- id, lang, class --
colspan NUMBER 1 -- columns spanned --
rowspan NUMBER 1 -- rows spanned --
align (%horiz.align) #IMPLIED -- horizontal alignment --
valign (%vert.align) #IMPLIED -- vertical alignment --
margin CDATA #IMPLIED -- around cell contents --
char CDATA #IMPLIED -- alignment char e.g. char=":" --
charoff CDATA #IMPLIED -- offset of alignment char --
nowrap (nowrap) #IMPLIED -- don't wrap words --
axis CDATA #IMPLIED -- axis name, defaults to element content --
axes CDATA #IMPLIED -- comma separated list of axis names --
>
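
The lookup order given in the TSPEC comment of the DTD (explicit cell
attribute, then a matching TSPEC found by searching in reverse document
order, then the cell default) might be coded along the following lines.
This is a sketch under simplifying assumptions: matching is reduced to
comparing the TSPEC class with the cell's own class, the function name
resolve is hypothetical, and the "left" default for TD alignment is
assumed rather than taken from the proposal.

```python
def resolve(prop, cell_attrs, tspecs, defaults):
    """Look up one alignment property for a cell.

    cell_attrs: attributes written on the TH/TD itself, including "class".
    tspecs:     TSPEC attribute dicts in document order.
    defaults:   fallback values for this kind of cell.
    """
    if prop in cell_attrs:                   # 1. explicit cell attribute
        return cell_attrs[prop]
    for tspec in reversed(tspecs):           # 2. last TSPEC in the file first
        # Simplified match: compare class names only (assumption).
        if tspec["class"] == cell_attrs.get("class"):
            if prop in tspec:
                return tspec[prop]
            break                            # only the first match is used
    return defaults[prop]                    # 3. default for the cell


tspecs = [
    {"class": "percent", "align": "left"},
    {"class": "percent", "align": "char", "char": "%"},
]
# valign "top" follows the TBODY default; align "left" is assumed.
td_defaults = {"align": "left", "valign": "top"}

print(resolve("align", {"class": "percent"}, tspecs, td_defaults))   # char
print(resolve("align", {"class": "percent", "align": "right"},
              tspecs, td_defaults))                                  # right
print(resolve("valign", {"class": "percent"}, tspecs, td_defaults))  # top
```

Stopping at the first match, even when that TSPEC does not set the
property being looked up, follows the statement that the first matching
element sets all cell alignment properties.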

-- Dave Raggett <dsr@w3.org> url = http://www.hpl.hp.co.uk/people/dsr
Hewlett Packard Laboratories, Filton Road, | tel: +44 117 922 8046
Bristol BS12 6QZ, United Kingdom | fax: +44 117 922 8924