Re: Shortref [was: Re: Super and Subscripts]

James D Mason (MASONJD@oax.a1.ornl.gov)
Mon, 23 Jan 95 18:13:21 EST

Somewhere in this discussion the point has been made that the
professional mathematician may not be satisfied with anything less than TeX.
That's a position quite familiar from discussions within our own publishing
organization. Indeed, we generate TeX from SGML for our own paper output and
use a modified LaTeX2HTML to generate graphics for the Web. But while TeX
generates the best-quality _images_ of mathematical information, we have a
problem in that we want to search the files. For that reason we're inclined to
go full SGML for the source and generate the TeX code through our parsers.
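
To illustrate the idea of keeping structural markup as the source and deriving TeX from it, here is a minimal Python sketch. It is not our production parser: the element names (df, sup, sub), the operator table, and the function names are assumptions made for this example, and the input is restricted to a well-formed, fully tagged fragment so that Python's standard XML parser can read it.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from structural element names to TeX operators.
# These names are illustrative, not taken from any particular DTD.
TEX_OPERATORS = {"sup": "^", "sub": "_"}


def to_tex(element: ET.Element) -> str:
    """Recursively render an element's text and children as TeX."""
    parts = [element.text or ""]
    for child in element:
        body = to_tex(child)
        operator = TEX_OPERATORS.get(child.tag)
        # Known elements become "^{...}" or "_{...}"; others pass through.
        parts.append(f"{operator}{{{body}}}" if operator else body)
        # Text that follows the child inside the same parent.
        parts.append(child.tail or "")
    return "".join(parts)


def formula_to_tex(markup: str) -> str:
    """Wrap the rendered formula in TeX display-math delimiters."""
    return "$$" + to_tex(ET.fromstring(markup)) + "$$"


print(formula_to_tex("<df>e<sup>2<sup>n</sup></sup></df>"))  # $$e^{2^{n}}$$
```

The markup itself stays free of typesetting commands, so it remains searchable, while the TeX is regenerated whenever paper output is needed.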

Our problem is that the practice of mathematics has become intertwined with
the presentation of it. Although a mathematician may not _think_ in formulae,
there's little way of communicating mathematical information without
presentation. We would prefer for our mathematical SGML to be as _pure_ as
possible, that is to say, as structural as possible, with as little typography
in it as we can manage. We're trying to decide on where to compromise.

Some mention has been made of the math encoding in ISO 12083, based on the old
AAP scheme. I would draw your attention also to the scheme in ISO TR 9573,
Part 11. That describes a system originally developed at CERN and later
expanded at ISO for publishing (in the latter case) ISO standards. While the
9573 DTD, like the 12083 DTD, has many typographic compromises, its syntax is
closer to that of eqn and TeX.

The published 9573-11 document also shows another means of markup
minimization, the "null end tag". Thus "<df>e<sup>2<sup>n</sup></sup></df>"
becomes "<df>e<sup/2<sup/n//</df>" ("<df>" is the 9573 tag for a displayed
formula).
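
The relationship between the two forms can be sketched in a few lines of Python. This is a simplified illustration, not an SGML parser: it assumes start tags without attributes, treats every "/" inside an element opened with the short form as its end tag, and the function name expand_net is hypothetical.

```python
import re

# A start tag in null-end-tag form, such as "<sup/" (name only, no attributes).
_NET_START = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)/")


def expand_net(text: str) -> str:
    """Expand null-end-tag markup into fully tagged markup."""
    out = []
    open_net = []  # names of elements opened in the short form, innermost last
    i = 0
    while i < len(text):
        match = _NET_START.match(text, i)
        if match:
            # "<name/" opens an element that a later "/" will close.
            name = match.group(1)
            out.append(f"<{name}>")
            open_net.append(name)
            i = match.end()
        elif text.startswith("</", i):
            # An ordinary end tag: copy it through unchanged.
            end = text.find(">", i)
            end = len(text) if end == -1 else end + 1
            out.append(text[i:end])
            i = end
        elif text[i] == "/" and open_net:
            # A bare "/" closes the innermost short-form element.
            out.append(f"</{open_net.pop()}>")
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(expand_net("<df>e<sup/2<sup/n//</df>"))
# <df>e<sup>2<sup>n</sup></sup></df>
```

A real parser decides this from the SGML declaration and the DTD; the stack here only mimics the nesting behaviour shown in the example.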

Lee raised the question of chemistry. I draw your attention to a DTD that is
under development at ftp://www.ornl.gov/pub/sgml/DTD/chemdev.dtd. This object
is not for the folks who want to use only six tags in their files. It is,
however, designed to represent the nice organic ring structures and their
friends in a highly structured way that will allow intelligent searching of
the chemical information.

If anyone is interested in commenting on this DTD, please contact Thomas
Tallant (ttz@ornl.gov).

Another issue related to coding for structure, as opposed to coding for
typography, is the need for a style sheet language to go along with the SGML
representation. DSSSL has been mentioned several times. I am happy to announce
that DSSSL was approved on the ballot that just closed this month. It will be
revised (starting at the ISO/IEC JTC1/SC18/WG8 meeting in Los Gatos, next
month). But anyone who would like to see the current documents can look in the
directory ftp://www.ornl.gov/pub/sgml/WG8/DSSSL for the SGML source (coded to
9573-11), as well as PostScript and PDF files.

We have posted some of our early experiments in presenting technical reports
at http://www.ornl.gov/ORNL/EINS_Reports/Tech_Reports.html. Some of the math
(particularly inline characters/formulae) and tables in these documents are
generated from modified 9573-11 SGML; some are scanned images from the paper
documents (both show the limitations of using GIF files for things of any
subtlety). I would be glad to discuss what we have done.

James D. Mason (masonjd@ornl.gov)
Information Management Services
Oak Ridge National Laboratory