Re: HTML/CALS/ICADD Table Prop

Harvey Bingham (bingham@amos.HQ.ileaf.com)
Sat, 29 Apr 95 20:37:26 EDT

Subject: Re: HTML/CALS/ICADD Table Prop
Cc: dsr@w3.org, html-wg@oclc.org, montulli@netscape.com

>From bingham Sat Apr 29 01:47:01 1995
From: bingham (Harvey Bingham)
To: tables@sgmlopen.org
Subject: Re: HTML/CALS/ICADD table proposal notes, 5Apr95
Cc: dsr@hplb.hpl.hp.com, dsr@w3.org, bingham

This is an interesting proposal for a significantly generalized table model
that approximates the SGML Open CALS Core Table Model, simplifies cell
location, and adds some different capabilities. From the CALS model perspective:

TGROUP as container gone.
COLSPEC and SPANSPEC (dealing with columns) coalesced into TSPECs
that provide attribute default sources, 2-dimensional descriptions
(each for list of rows and row-ranges, list of cols and col-ranges)
possibly overlapping, possibly partial specification of all cell
positions
THEAD TBODY TFOOT reordered this way, only contain rows
ROW becomes TR with content (TH|TD)* not (ENTRY|ENTRYTBL)+
ENTRYTBL gone (though may be replaced by recursive TABLE,
but not in this draft)
ROWSEP and COLSEP gone without detail replacement: all or nothing
but with compound specification for cellspacing and cellpadding
ENTRY replaced by TH and TD to fill cells
Spanning by colspan and rowspan attributes indicating how many
cols and rows the TH or TD spans
axis and axes identifiers can provide alternative names for
describing TH, TD.

It proposes different units of measure, seemingly with little advantage
and with some major problems.

Hereafter lines beginning with ": " are as received from Dave Raggett.
Others are my critique. The original notes had a detailed comparison of
what had been proposed at the IETF meeting, and what had been the original
table model from the draft HTML3.0-dtd.

:
: From hplb.hpl.hp.com!dsr@ileaf.prospect.com Fri Apr 21 13:24:15 1995
: To: bingham@amos.HQ.Ileaf.COM (Harvey Bingham)
: Cc: dsr@w3.org, montulli@netscape.com, html-wg@oclc.org, tables@sgmlopen.org
: Subject: Re: IETF HTML/CALS/ICADD table proposal notes, 5Apr95
: In-Reply-To: Your message of "Fri, 14 Apr 95 16:56:23 EDT."
: <9504142056.AA09056@amos.HQ.Ileaf.COM>
: From: "Dave Raggett" <dsr@hplb.hpl.hp.com>
:
:
: In replying to this message, please Cc me directly so that I can
: filter replies from the mountain of email I am currently burdened with.
:
: Thanks Harvey, for sending me your summary of the tables and html3
: meeting at Danvers. I have now had time to reflect on the ideas
: put forward at that meeting and would like to bounce a proposal off
: you and the rest of the SGML Open table group and the HTML working group
: for a model that I believe satisfies the needs for importing CALS tables,
: is directly compatible with Netscape 1.1, and simplifies control of table
: style.
:
: The downside is that, formally, <table border> is illegal as such.
: One possible way out of this mess might be:
:
: border (border|0|1|2|3|4|5|6|7|8|9|10) 0
:
: But this doesn't allow you to specify the units for the border width.
: I welcome discussion of the issue of how to specify border widths and
: also the widths of columns. For instance should we allow floating
: point numbers; should we allow pt for points etc. or should we aim
: for an enumerated set of border widths that is implementation dependent?
:
I have sent separate comments on units: basically agreeing with Terry
Allen. I believe we need points (pt) and for convenience should have inch
(in) and millimeter (mm) measures as well.

We can allow proper decimal representation (not general floating point
scientific notation such as 1.2E3) of the measure, to which a unit is attached.

I'd eliminate pixel, a measure that is device dependent, in an otherwise
device-independent interchange of SGML information. Who knows the size
or resolution of the requester's display well enough to specify it at
the source? Would it need changing for each client?

I believe the use of "en" is inappropriate. "em" is the conventional
measure of a font pointsize, and might be justified if some font and
pointsize were recognized as the default; such a measure would then be
relative to that default, which a user presumably can change.
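A minimal sketch of the measures argued for above (not part of either proposal): a plain decimal value with an attached pt, in, mm, or em unit, converted to points. It assumes 72 pt to the inch (the PostScript point; the traditional printer's point is slightly smaller), follows CALS in allowing no whitespace between number and unit, and takes em relative to a hypothetical 12 pt default font. Pixels, en, and scientific notation are rejected. The name to_points is illustrative only.

```python
import re

# Points per unit, assuming 72 pt to the inch (the PostScript point).
PT_PER_UNIT = {"pt": 1.0, "in": 72.0, "mm": 72.0 / 25.4}

# A plain decimal number (no sign, no exponent) immediately followed by a unit.
MEASURE = re.compile(r"(\d+(?:\.\d+)?|\.\d+)(pt|in|mm|em)")


def to_points(text, default_font_pt=12.0):
    """Convert a measure such as '12.5pt', '1in' or '25.4mm' to points.

    'em' is taken relative to an assumed default font size, which a user
    could change. Pixels, 'en' and forms such as '1.2E3pt' are rejected.
    """
    m = MEASURE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a decimal measure with a unit: {text!r}")
    value, unit = float(m.group(1)), m.group(2)
    if unit == "em":
        return value * default_font_pt
    return value * PT_PER_UNIT[unit]
```

For example, to_points("0.5in") gives 36.0, and to_points("2em", default_font_pt=10) gives 20.0.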

I believe the proportional column measure is the most useful, possibly
with some columns given fixed measure, and the others consuming the rest.

I note that a user changing a window size may require local table
recomposition.

: The proposal uses a generalization of COLSPEC which I have named TSPEC
: after a suggestion by Tom Magliery. It allows you a compact way of
: defining properties for sets of cells defined by groups of rows and
: columns. This seems to me a natural extension, that will be of great
: value when using style sheets to control the detailed appearance of tables.
:

: The DTD fragment for the table proposal follows:
:
: <!ENTITY % attrs -- common attributes for elements --
: 'id ID #IMPLIED -- as target for hrefs (link ends) --
: lang CDATA #IMPLIED -- ISO language, country code --
: class NAMES #IMPLIED -- for subclassing elements --'
: >
:
: <!ENTITY % block.align
: "align (bleedleft|left|center|right|bleedright|justify|bleedboth) center">
:
: <!ELEMENT CAPTION - - (%text;)+ -- table or figure caption -->
: <!ATTLIST CAPTION
: %attrs; -- id, lang, class --
: align (top|bottom|left|right) #IMPLIED
: >
:
I dislike using the same attribute name with different kinds
of values in different contexts. This is an unnecessary overloading.
In this small DTD fragment you have align with different name lists in:

  table          align (bleedleft|left|center|right|bleedright|justify|bleedboth) center
                   to align the whole table
  CAPTION        align (top|bottom|left|right) #IMPLIED
                   to position the caption on some side of the table
                   [that list is not a subset of the table align values]
  (tspec|th|td)  align (left|center|right|justify) #IMPLIED
                   to position the #PCDATA content of a TH or TD cell
                   [that is a subset of the table align values]
                   [presumably applying only if the content is not
                   otherwise positioned by richer element content]

Note that you have suggested an ordered resolution of #IMPLIED values.
That is unnecessarily complex if only some of the potential values are
appropriate for inheritance.

Inheritance in the CALS model comes from the nearest ancestor (or the
corresponding sibling, in the case of COLSPECs) with an
explicit value for an attribute of that name.

Here that won't work: cells accept only a subset of the values, and
could inherit illegal ones from the table align (the presumed parent element).

: <!--
: The HTML3 table model is based upon experience with
: the CALS table model, as well as study of a wide variety
: of printed examples of tables and other material loosely
: laid out on a two dimensional matrix. The model supports
: very simple tables with minimal markup as well as large
: and complex tables. The model features:
:
: o Auto-sized tables
: o Arbitrarily nested tables
: o Control of column widths
: o Control of cell contents alignments
: o Arbitrary subclassing of groups of rows and columns
: o CALS style THEAD, TBODY and TFOOT elements
: o Conversion to speech and braille
: o Export to spreadsheets and databases
:
: Properties of table cells are determined by searching
: up a hierarchy, starting with the cell itself. The next
: level is the table row, followed by the row group (i.e.
: thead, tbody or tfoot), then the tspec elements and
: finally the table element itself. Within the sequence
: of tspec elements, they are searched starting with the
: last tspec element, and finishing with the first. This
: means you should place generic tspec elements lexically
: before more specific tspec elements. The search process
: for each property (e.g. alignments and widths) is made
: independently of other properties. You can thus set
: different properties with different (and overlapping)
: tspec elements.
Interesting and useful generalization, and inheritance model.
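The search order described in the quoted comment can be sketched as follows. This is an illustration under assumptions: attribute sets are plain dictionaries, a TSPEC without row or col covers the whole table, and the (row, col) passed in is the top-left position of a spanned cell. TSpec and resolve are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TSpec:
    # None means the row or col attribute was omitted: the spec covers all.
    rows: Optional[frozenset] = None
    cols: Optional[frozenset] = None
    attrs: dict = field(default_factory=dict)

    def covers(self, row, col):
        return ((self.rows is None or row in self.rows)
                and (self.cols is None or col in self.cols))


def resolve(prop, row, col, cell, tr, group, tspecs, table, default=None):
    """Find one property for the cell at (row, col), searched independently
    of every other property: cell, TR, row group, TSPECs (last to first),
    TABLE, then the system default."""
    for attrs in (cell, tr, group):
        if prop in attrs:
            return attrs[prop]
    for spec in reversed(tspecs):  # later, more specific TSPECs win
        if spec.covers(row, col) and prop in spec.attrs:
            return spec.attrs[prop]
    return table.get(prop, default)
```

Because each property is looked up independently, one generic TSPEC can set align for every cell while a later, narrower one overrides align and valign for a single column only.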

In this fragment tables do not appear to nest. I would prefer to
keep that restriction.

If tables are allowed to nest, inheritance need not go outside the nested
table. However, the minimum width of a nested table would have to
propagate outward to affect the width of the containing
table column(s), if the entire nested table is expected to
appear where it is nested. An alternative is to allow a
reference to the conceptually nested table, which could actually
appear elsewhere, unnested.

A spanned table cell is assumed to be rectangular, not ragged.
[Individual widths can be associated with TD and TH.]
Spanning table cells should generally get their inheritance path
from the single (top-left) position where the TH or TD occurs.
That isn't completely satisfactory for widths, as the spanned width
is the sum of the widths of all the columns that compose the span (in
the top row, where the TH or TD goes), plus the TABLE cellspacing and
cellpadding for each column.
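A rough sketch of that width sum, under assumptions the proposal does not state: each column width is a content width excluding padding, every cell has cellpadding on its left and right, and cellspacing lies only between adjacent cells, so a span absorbs the interior spacing and padding. The function name spanned_width is hypothetical.

```python
def spanned_width(col_widths, first_col, colspan, cellspacing, cellpadding):
    """Content width of a cell whose top-left position is in column
    `first_col` (1-based) and that spans `colspan` columns.

    Assumes each entry of col_widths is a column's content width, that
    every cell has `cellpadding` on its left and right, and that
    `cellspacing` separates adjacent cells. The span absorbs the
    spacing and padding between the columns it covers.
    """
    if first_col < 1 or colspan < 1:
        raise ValueError("columns are numbered from 1 and spans are >= 1")
    cols = col_widths[first_col - 1:first_col - 1 + colspan]
    if len(cols) != colspan:
        raise ValueError("span extends beyond the last column")
    return sum(cols) + (colspan - 1) * (cellspacing + 2 * cellpadding)
```

With columns of 100, 50, and 80, cellspacing 4 and cellpadding 3, a cell spanning the first two columns gets 100 + 50 + (4 + 2 * 3) = 160.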

No overlapping TH or TD are permitted. Thus empty
placeholder TH or TD in successive rows or columns of the span
are not permitted.
:
: Table attributes:
:
: WIDTH=<value> e.g. width="50%" or width="100 en"
: This specifies the desired width of the table. The
: default units are pixels. If the percent sign is
: given, the value is taken as the percentage of the
: document width.

I note that percentages imply a sum of 100, though that is neither
enforceable nor readily achieved by hand (given the other
width-taking stuff).

I propose the CALS notation 1* 1* 1* for arbitrary proportions, so one
can specify three equal columns.

:
: BORDER=<value>
: This specifies the width of the outer table border.
: If absent or border=0 then the border is not drawn.
:
This width specification is probably unnecessary: it is a presentation
style that depends to some extent on the granularity of the
display device. It could be a local decision: perhaps a heavier
ruling than the internal rulings when those are used, perhaps a lighter
one when there are none.

: CELLSPACING=<value>
: Like it sounds, cell spacing is the amount of space
: inserted between individual cells in a table.

A single spacing value suggests a minimum vertical gutter between
adjacent cells and the same inter-row spacing. That cellspacing value
is absorbed within the straddles.
Non-negative fixed-point values, with a unit, are appropriate.
A zero value means "no cell rulings internal to the border".
A non-zero value means use that ruling width for all internal
table rulings around TH or TD (but of course not within them when
they straddle cells). Is the presumption that CELLSPACING
is necessarily marked (drawn as rulings between adjacent TH
and/or TD)? Otherwise, how are such rulings turned on or off?
:
: CELLPADDING=<value>
: Cell padding is the amount of space between the border
: of the cell and the contents of the cell.
:
Again, a single non-negative value with a unit applies (as a minimum)
to all four outside margins of every cell, single or straddled.
When these are specified, the border, cellspacing, and cellpadding
are consumed first (as fixed widths), then the space required for a
CAPTION placed at the left or right edge is determined; whatever
remains is the amount available for relative width apportionment.
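The deduction order just described might be sketched as below. The geometry is an assumption, not something the proposal specifies: a border on both sides, cellspacing only between adjacent columns, and cellpadding on both sides of every column. A negative result is the deficiency case raised next. The function name is hypothetical.

```python
def available_for_relative(table_width, ncols, border, cellspacing,
                           cellpadding, fixed_widths=(), side_caption=0.0):
    """Width left for relative (*) columns, in the same unit as the inputs.

    Assumes a border on both sides of the table, cellspacing only between
    adjacent columns, cellpadding on both sides of every column, and an
    optional caption placed at the left or right edge. A negative result
    is the deficiency case: the fixed parts alone do not fit.
    """
    fixed = (2 * border
             + max(ncols - 1, 0) * cellspacing
             + ncols * 2 * cellpadding
             + side_caption
             + sum(fixed_widths))
    return table_width - fixed
```

For a 500 pt table with three columns, a 1 pt border, 2 pt cellspacing, and 4 pt cellpadding, 470 pt remain for apportionment.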

What if there is a deficiency of available width?

Presumably the cellspacing is centered between cellpaddings.

: In general the units for widths are given by a suffix.
: The allowed suffixes are:
:
: pixels /* the default units */
: ch /* characters */
: en /* half the point-size */
: % /* percentage */
: * /* relative widths e.g. for columns */
I'd rather the default be pt.
I note that ch, to be meaningful, would need a monospace font.
With proportional fonts, the font metrics are required to map
ch to a width usable for presentation.
I'd eliminate en.
I'd eliminate %, replacing it by the arbitrary integer factor on *, and
defining * as (available width)/(sum of coefficients) to allow integer
measure to whatever precision is desired. Available width is what is
left after the fixed parts are deducted.
CALS used NUMBER attributes, which don't allow decimal numbers as factors.
NUTOKEN would, and so does CDATA, although CDATA is too general when not
needed (it allows character entities in attribute values, much as if it were
RCDATA in element declared content: an SGML inconsistency).
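A small sketch of that definition of *, assuming fixed widths have already been deducted from the available width and that a bare "*" means "1*". Decimal factors are accepted, as a NUTOKEN-typed attribute would permit; exact fractions avoid rounding drift. The function name allocate is hypothetical.

```python
from fractions import Fraction


def allocate(specs, available):
    """Split `available` width among columns.

    Each spec is either a fixed width (a number, already deducted from
    `available` by the caller) or a proportional string such as '1*',
    '2.5*' or '*' (shorthand for '1*'). One '*' unit is
    available / (sum of coefficients). Fixed columns keep their width.
    """
    coeffs = []
    for s in specs:
        if isinstance(s, str) and s.endswith("*"):
            coeffs.append(Fraction(s[:-1] or "1"))
        else:
            coeffs.append(None)
    total = sum(c for c in coeffs if c is not None)
    unit = Fraction(available) / total if total else Fraction(0)
    return [float(c * unit) if c is not None else float(s)
            for c, s in zip(coeffs, specs)]
```

With that definition, allocate(["1*", "1*", "1*"], 300) gives three equal columns of 100.0, and allocate([50, "2*", "*"], 300) gives [50.0, 200.0, 100.0].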

Then percent is redundant and more restrictive than "*". If percent
were treated the same way as * it would almost work. What if the
percentages don't add up to 100%? Is the rest of the available width
available for *? Do the percentages include or exclude the fixed width
specifications and those for cellspacing and cellpadding?

:
: Whitespace is permitted between the number and the suffix.
Whitespace also seems desirable as a separator between list items
when specifying widths for successive columns. If col includes ranges,
those same ranges would need to be indicated in a width list.

CALS did not allow whitespace between numeric and unit part.

: -->
:
: <!ENTITY % cell "TH | TD">
: <!ENTITY % horiz.align "left|center|right|justify">
: <!ENTITY % vert.align "top|middle|bottom|baseline">
I recommend eliminating baseline alignment.
Baseline alignment seems meaningful only within a single TH or TD, and
then only if there are different pointsizes in the same line.
When different content and pointsizes occur in different TH or TD in the
same row, or when multiline content occurs in those TH or TD, or when some
content crosses rowspans and other content doesn't, I believe this just
invites unnecessary complexity.
:
: <!ELEMENT table - - (caption?, tspec*, thead?, tbody, tfoot?)>
<!-- hwb closed comment below after cellspacing -->
: <!ATTLIST table
: %attrs; -- id, lang, class --
: %needs; -- for control of text flow --
: model %URI; #IMPLIED -- link to formal table model --
: border CDATA #IMPLIED -- if present then draw borders --
: width CDATA #IMPLIED -- absolute or percentage width --
: cellspacing CDATA #IMPLIED -- space between table cells --
: cellpadding CDATA #IMPLIED -- margins within cells --
: %block.align; -- horizontal alignment of table not cell contents --
: noflow (noflow) #IMPLIED -- noflow around table --
: nowrap (nowrap) #IMPLIED -- don't wrap words --

Rules on width need to be spelled out for the whole table, and for when
it is composed from a list of individual column widths. How is the width
of a TH or TD inherited from the intermediary TSPEC or from this TABLE width?
Nothing at the table level indicates the number of columns.

nowrap: need it keep the whole TH or TD content on one line? Or does it
allow multiline display, but apply just to long words, or also to phrases,
or equation forms, or unbreakable compound words such as "mother-in-law"?
Should individual cells allow scrolling if nowrap is set and a word is just
too wide? Should the spreadsheet display model apply, where "too-wide"
cell content can extend over the next cell if that cell is empty?

: >
:
: <!--
: TSPEC defines properties of sets of cells
:
: It provides a superset of the CALS COLSPEC element, and allows
: you to specify the width and default alignment for cells in
: a given row/column range, as given by the row, col attributes.
: The ability to subclass TSPEC makes it easy to provide detailed
: control over the table style when using associated style sheets.

Not quite a superset, though the functionality is there in a nicely
generalized way: it replaces column names (which might have significance)
by implicit column numbers. It eliminates the need for those names, which
served as the extreme (first and last) column names for horizontal spans.
(Axis and axes phrases could generalize this naming role.)

One asymmetric judgement made in CALS was that tables were fixed width,
and grew arbitrarily in depth. That direction of viewing and scrolling
is more convenient for most screen viewing as well. The symmetry
identified for TSPEC may make it simpler to transpose the
display (diagonally reflecting rows and columns, but not their
content and writing direction).

TSPECs are elder siblings of THEAD, TBODY, and TFOOT. The computational
model for finding each implied attribute value of each cell, starting at
a TH or TD, is to thread independently through indefinitely many places:
TR, THEAD or TBODY or TFOOT, the full set of TSPECs, the TABLE, or the
ultimate system default. The inheritance paths are not quite symmetric.

The TH and TD allow individual widths that can conflict with the
widths laid down by the last of the TSPECs that affect each of the
columns of a colspan. There is nothing analogous for row depths,
nor any conflict among sources for the depth of a cell or its rowspan.

TR intrudes with valign and nowrap, but not align, char, width, col, or row.
THEAD, TBODY, and TFOOT allow valign.
:
:
: The row and col attributes are defined by a comma separated
: list of numbers or ranges, e.g.
:
: col = "2, 4, 6, 8"
: row = "2-6"
:
: col = "2-5, 9"
:
: Note that rows and columns are numbered from 1 upwards counting
: down and to the right. A range consists of the first and last
: row or column numbers, separated by a "-" sign.

One concern with TSPEC generalization is that we now have to define the
TSPEC error conditions in 2-dimensions when only part of a table is
specified by TSPEC. The "default, cover the whole table" TSPEC must
know the number of rows and columns to specify the appropriate ranges.

The HyTime "1 -1" kind of ranging notation, covering first to last, would
be useful. In the absence of complete coverage, what would the
values be for "spaces between TSPECs"? You might consider the spreadsheet
3..5 indication for a range, so as not to confuse it with the HyTime ranging.
[By the way, HyTime "1 3" would select three elements starting with the
first. The other parts of the HyTime ranging are defined awkwardly.]
Ranges should be in increasing order.
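A sketch of parsing the row and col lists, assuming the quoted syntax ("2-5, 9") and, as a hypothetical alternative, the spreadsheet-style "3..5" form suggested above. Ranges must be increasing and numbering starts at 1. The name parse_ranges is illustrative only.

```python
import re

# One list item: a number, optionally followed by "-" or ".." and a second
# number. The ".." form is the assumed spreadsheet-style alternative.
ITEM = re.compile(r"\s*(\d+)\s*(?:(?:-|\.\.)\s*(\d+)\s*)?")


def parse_ranges(text):
    """Parse e.g. '2-5, 9' into the set {2, 3, 4, 5, 9} (1-based)."""
    result = set()
    for item in text.split(","):
        m = ITEM.fullmatch(item)
        if m is None:
            raise ValueError(f"bad range item: {item!r}")
        first = int(m.group(1))
        last = int(m.group(2)) if m.group(2) else first
        if first < 1 or last < first:
            raise ValueError(f"range must be increasing and start at 1 or more: {item!r}")
        result.update(range(first, last + 1))
    return result
```

A whole-table TSPEC such as col="1..30" row="1..200" then covers every position only if the table really has 30 columns and 200 rows, which is the coverage problem raised above.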

How should TSPECs be related to rows of THEAD TBODY or TFOOT?
How consistent need they be? What if there are more or fewer TRs?
What if there are more or fewer filled cells from TH or TD?

If TSPEC tries to define the full scope of the table: col="1..30" row="1..200"
presumably that doesn't count any TR from THEAD or from TFOOT more
than once?

A similar list of widths could be formed, for each column or
corresponding range of columns defined for TSPEC. What if
some but not all of the TSPECs later in the list define widths that
consume more than is available for a part of a TSPEC defining a range
of columns, and a width for the set? The widths are not independent,
particularly with overlapping ways of specifying them (both in TSPECs
and in TH or TD).

A list for each of align (applying to columns) and valign (applying
to rows) could also occur.

: -->
: <!ELEMENT tspec - O EMPTY>
: <!ATTLIST tspec
: %attrs; -- id, lang, class --
: col CDATA #IMPLIED -- set of column numbers --
: row CDATA #IMPLIED -- set of row numbers --
: width CDATA #IMPLIED -- cell width --
: align (%horiz.align) #IMPLIED -- horizontal alignment --
: valign (%vert.align) #IMPLIED -- vertical alignment --
: char CDATA #IMPLIED -- alignment char e.g. char=":" --
: nowrap (nowrap) #IMPLIED -- don't wrap words --
: >
:
Added, not in CALS

%attrs;
col
row
valign CALS kept that as a row or entry attribute
nowrap

Lost from CALS COLSPEC:

charoff indication of where to place the alignment
position relative to the cell width.

rowsep the individual way to specify the rulings on or off:
colsep to the right of a cell (colsep) and below a cell (rowsep)

colnum NUMBER #IMPLIED of column starting at 1
colname NMTOKEN #IMPLIED of column, used for spanspecs, unnecessary
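To show what charoff adds to char, here is a sketch for a monospace cell, assuming the CALS reading of charoff as the percentage of the cell width to the left of the alignment character. Values lacking the character are aligned as if it followed their last character, and content too wide for the cell is simply truncated here. The function name is hypothetical.

```python
def align_on_char(values, char, width, charoff):
    """Lay out strings in a fixed-width (monospace) cell so that the first
    occurrence of `char` sits `charoff` percent of `width` from the left.

    Values lacking `char` are aligned as if it followed their last
    character. Lines longer than `width` are truncated.
    """
    pos = round(width * charoff / 100)
    lines = []
    for v in values:
        i = v.find(char)
        if i < 0:
            i = len(v)
        left_pad = max(pos - i, 0)
        lines.append((" " * left_pad + v)[:width].ljust(width))
    return lines
```

With char="." and charoff=50 in a 10-character cell, "1.5", "12.25", and "100" all place their (real or implied) decimal point in column 5.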

: <!--
: THEAD, TBODY and TFOOT are borrowed from CALS
In concept, though enriched with the %attrs; attributes, and with
no embedded, localized colspecs for THEAD or TFOOT.
Instead, the rows composing them can have separate TSPECs.

If the row enumeration includes the rows in THEAD (once)
and in TFOOT (once), and THEAD and TFOOT content gets
displayed many times for a long table, those repeatedly displayed
rows are ignored in the row information of the TSPECs.
:
: These elements appear in the natural order they would
: appear in a short table. For longer tables which need
: to be split across pages or for which the TBODY is
: displayed as a scrolling region, the user agent will
: have to process the entire table, before reaching the
: TFOOT. This allows for simple implementations which
: ignore the THEAD/TBODY/TFOOT distinction.
:
: You can omit both start and end tags for TBODY for
: simple tables.

In general I dislike omitted start tags. There are useful
attributes on tbody. Including one with an explicit default
value causes some systems always to deliver that value,
so the tag will be there.
: -->
: <!ELEMENT (thead|tfoot) - O (TR*)>
: <!ELEMENT tbody O O (TR*)>
:
Is the expectation that a distinct TSPEC applies for
each of THEAD, TBODY, or TFOOT?
It doesn't seem that row numbers can be restarted in each, as
there is no way to tie a TSPEC to THEAD, TBODY, or TFOOT.

: <!ATTLIST thead
: %attrs; -- id, lang, class --
: valign (%vert.align) bottom -- vertical alignment --
: >
:
: <!ATTLIST tbody
: %attrs; -- id, lang, class --
: valign (%vert.align) top -- vertical alignment --
: >
:
: <!ATTLIST tfoot
: %attrs; -- id, lang, class --
: valign (%vert.align) top -- vertical alignment --
: >
:
: <!--
: TR contains a single table row and corresponds
: to CALS ROW element.
: -->
: <!ELEMENT tr - O (%cell)*>
:
: <!ATTLIST tr
: %attrs; -- id, lang, class --
: valign (%vert.align) #IMPLIED -- vertical alignment --
: nowrap (nowrap) #IMPLIED -- don't wrap words --
: >
That would match some word processors' model, where every row is
effectively independent, with its own cell widths and number of cells.
I recommend that the number of columns consumed by a table be knowable
from some attribute, without computation.
:
: <!--
: Table cells are differentiated into header cells and
: data cells with TH and TD respectively. This allows
: for very flexible layout of header and data cells.
:
: The AXIS, AXES attributes are used to support conversion
: to speech and braille, and for exporting table data to
: spreadsheets or databases.
I note that axis and axes would seem particularly useful on TH
cells, and their names would desirably apply as defaults for TD,
or possibly as value pairs for the axes on TD cells
in the corresponding row and column.
[I see no nice way to use the content of colspan TH cells as the top of a
hierarchy for columns with further TH thereunder, in developing the
axis and axes values. Explicit semantics are needed on how their contents
are ordered and used.]
:
: ROWSPAN and COLSPAN are used to merge cells across
: rows and columns. Tables with overlapping cells are
: considered to be illegal.
I believe that spans crossing from THEAD into TBODY, or from TBODY into
TFOOT, or attempting to extend beyond the TABLE border, are considered
illegal? (Or should they cause bumps, extending the appropriate region?)
Since width judgements can be upset by extra columns, this arbitrary
extension is risky.

With TSPECs defined for a fixed number of rows, table extension is
awkward. Growth in THEAD, TBODY, or TFOOT would affect the TSPEC row and
col attribute values.

: -->
: <!ELEMENT (%cell) - O (%text;)*>

So far as I can tell from a brief look, no member of %text;
in HTML3.0.dtd can contain another table. That certainly simplifies
the presentation process.

: <!ATTLIST (%cell)
: %attrs; -- id, lang, class --
: colspan NUMBER 1 -- columns spanned --
: rowspan NUMBER 1 -- rows spanned --
: width CDATA #IMPLIED -- absolute or percentage width --
: align (%horiz.align) #IMPLIED -- horizontal alignment --
: valign (%vert.align) #IMPLIED -- vertical alignment --
: char CDATA #IMPLIED -- alignment char e.g. char=":" --
: nowrap (nowrap) #IMPLIED -- don't wrap words --
: axis CDATA #IMPLIED -- axis name, defaults to element content --
: axes CDATA #IMPLIED -- comma separated list of axis names --
: >
I assume the default way to define the column into which a TH or TD
falls is the next one in order, left to right, skipping positions
occupied by rowspans encroaching from above, and counting colspan amounts.
Is it valid to ignore cells at the right that would otherwise have
no content? Or do the semantics for axis and axes values extend
to this use?
Since the axis name can differ for each cell in a column (or in a row),
how does that help to reuse those names in the axes model? Is an axis
value used only once per table,
and then referred to by successive axes values? Is a column/row
pair, or a pair of column hierarchy and row hierarchy, useful to specify
axes?
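The placement rule assumed above can be sketched as follows, with the proposal's rule that overlapping cells are illegal enforced by raising an error. Each row is given as a list of hypothetical (rowspan, colspan) pairs in source order; rowspans running past the last row are not checked, since whether they are legal is one of the open questions. The name assign_columns is illustrative only.

```python
def assign_columns(rows):
    """Place cells into grid columns, left to right.

    `rows` is a list of rows; each row is a list of (rowspan, colspan)
    pairs in source order. Returns (positions, ncols), where positions[r]
    lists the 1-based (row, col) of the top-left position of each cell.
    Raises ValueError if two cells would overlap.
    """
    occupied = set()  # (row, col) grid slots already filled
    positions = []
    ncols = 0
    for r, cells in enumerate(rows, start=1):
        col = 1
        row_pos = []
        for rowspan, colspan in cells:
            while (r, col) in occupied:  # skip slots filled by rowspans from above
                col += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    slot = (r + dr, col + dc)
                    if slot in occupied:
                        raise ValueError(f"overlapping cells at {slot}")
                    occupied.add(slot)
            row_pos.append((r, col))
            ncols = max(ncols, col + colspan - 1)
            col += colspan
        positions.append(row_pos)
    return positions, ncols
```

Note that ncols only emerges after walking every row, which is the computation the recommendation above would avoid by declaring the column count as an attribute.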

An alternative use for axis, rather than as a locator, is as an
alternative description. That would be useful if a graphic were in a
cell. [Suggested by the comment, default is text content.]
:
: -- Dave Raggett <dsr@w3.org> url = http://www.hpl.hp.co.uk/people/dsr
: Hewlett Packard Laboratories, Filton Road, | tel: +44 117 922 8046
: Bristol BS12 6QZ, United Kingdom | fax: +44 117 922 8924
:

Regards/Harvey Bingham
SGML Open Tables Committee Collator
CALS Table model coauthor

Standards Manager
Interleaf Inc. | tel +1 617 290 4990 x3419
Prospect Place, 9 Hillside Avenue | fax +1 617 290 4970
Waltham, MA 02154 USA | bingham@ileaf.com