HDL Q&A

Jon Bosak (Jon_Bosak@Novell.COM)
Mon, 31 Oct 94 22:05:44 EST

___________
HDL Q and A
~~~~~~~~~~~

============================================
Questions and Answers about the HDL Proposal
============================================

This document is a companion piece to the proposal for a Hypertext
Delivery Language (HDL) to be developed upon an existing standard,
SDL. It poses and answers some of the more basic questions relating
to HDL and its relationship to other delivery methods.

-------------------------------------------------------------------
What would be the relationship between HDL and authoring languages?
-------------------------------------------------------------------

Like SDL, the HDL contemplated by the proposal is not intended to be
an authoring format but rather the target of a conversion process from
some other format optimized for authoring or data retrieval. The most
obvious source format for HDL would be HTML, presumably HTML 3.0,
which is here assumed to be a refinement of the existing HTML 2.0
without the addition of typographical controls. A conversion script
to produce HDL from HTML is obviously a prerequisite to this plan, but
such a script would be relatively easy to develop.

It may be wondered what advantages a conversion process would have
over the implementation of format controls directly in HTML. There
are several aspects to this question.

First, it is necessary to distinguish between the need to control the
typography of documents in order to produce a distinctive appearance
and the need to adopt a completely different appearance for each
document. The former is quite common; the latter is quite rare
(except in advertising). Typically, individual users and
organizations seek to develop a distinctive look for their documents
and demand the controls necessary to achieve the visual effect
required to distinguish their documents from those produced by others.
Once such an effect has been achieved, however, this distinctive look
becomes associated with individual or corporate identity and is seldom
changed. It is often necessary to develop a suite of styles for
different subdocument types -- the different sections of a newspaper
or magazine, for example -- but once developed, such styles change
infrequently.

A second, complementary observation is that when styles do change,
they tend to change massively, and such change creates legacy
conversion problems if it cannot be accomplished at a single point of
control. Style changes can most reliably be performed if the original
format-independent tagging is preserved.

For both of these reasons, it is far better in practice for
typographical specifications to be compartmentalized in a single place
than to be scattered throughout source documents. This is why it has
long been considered good practice in the publishing world to confine
all typographical specifications to named sets in stylesheets rather
than applying unnamed specifications to individual elements.

The conversion model for HDL production would implement this practice
in the world of WWW delivery. Documents would be prepared in a markup
language free of formatting instructions (HTML), thus decoupling them
from future changes to their overall visual identity. Specific styles
would be associated with logical elements during the conversion to
HDL. Tuning the styles would be accomplished over an entire set of
documents by changing the converter (which could be table-driven to
make such changes easy) or for individual documents by changing the
HDL output. The second alternative is made possible by the fact that
styles in SDL are maintained in a separate section in each document,
the Table of Semantics and Styles (TOSS). Changing the specification
of a named style in the TOSS changes its behavior throughout the
document.
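
The conversion model described above might be sketched as follows.
This is only an illustration in Python, not SDL or HDL syntax: the
element names, style names, style properties, and the functions
convert() and render() are all assumptions invented for the example.
It shows a table-driven converter that attaches named styles to
logical elements, and a per-document style section (standing in for
the TOSS) in which changing one named style changes its behavior
throughout the document.

```python
# Hypothetical sketch: none of these names come from SDL or the HDL proposal.

# Converter table: logical source element -> named style.
STYLE_MAP = {"h1": "title", "p": "body", "em": "emphasis"}

# Default definitions of the named styles (a stand-in for the TOSS).
DEFAULT_TOSS = {
    "title": {"font": "Helvetica", "size": 18, "weight": "bold"},
    "body": {"font": "Times", "size": 12, "weight": "medium"},
    "emphasis": {"font": "Times", "size": 12, "weight": "bold"},
}


def convert(elements, style_map=STYLE_MAP, toss=DEFAULT_TOSS):
    """Turn (tag, text) pairs into a document carrying its own style section."""
    body = [(style_map[tag], text) for tag, text in elements]
    used = {name for name, _ in body}
    # Copy only the styles the document uses, so edits stay local to it.
    return {"toss": {name: dict(toss[name]) for name in used}, "body": body}


def render(document):
    """Resolve each style name against the document's own style section."""
    lines = []
    for name, text in document["body"]:
        style = document["toss"][name]
        lines.append(f"[{style['font']} {style['size']} {style['weight']}] {text}")
    return lines


source = [
    ("h1", "Annual Report"),
    ("p", "First paragraph."),
    ("p", "Second paragraph."),
]
doc = convert(source)
before = render(doc)

# Tune one named style in this document only; every "body" element follows.
doc["toss"]["body"]["size"] = 14
after = render(doc)
```

Editing STYLE_MAP or DEFAULT_TOSS corresponds to changing the
converter for an entire set of documents; editing doc["toss"]
corresponds to changing the output for an individual document.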

A third, unrelated reason for preferring a conversion model is that
HDL could provide a single, standard target format for an unlimited
number of source formats, not just HTML and its derivatives. The
possible source formats could include large, general-purpose markup
languages such as ISO 12083; industry markup languages such as
DocBook; or non-structured input formats that are even simpler and
easier to use than HTML. Anything that could serve as input to a
conversion script could serve as source material for HDL delivery.

Perhaps the most significant of these alternative sources is the huge
amount of legacy data currently in word processing and desktop
publishing formats. HDL would be particularly well designed to serve
as the target format for such data. It is possible to use HTML as the
target for a conversion from flat, format-rich legacy data, but HTML
is not well suited to the purpose and is incapable (as it should be) of
expressing the look of the original. HDL, on the other hand, could
capture almost all of the original formatting information and present
legacy documents online in a form that expressed the intention of the
publisher as well as could be accomplished within the limitations of
the medium.

--------------------------------------------
Why not a page description language instead?
--------------------------------------------

People who have not had much experience with online document delivery
in real-world, cross-platform situations often wonder why documents
can't simply be served out in a fixed format such as PostScript or
RTF. The answer, briefly, is that it is not possible to achieve
complete page fidelity -- that is, to completely capture and retain
the look of a particular page layout -- and still make the document
usable across platforms with different fonts, display sizes, and
window aspect ratios.

Vendors of page-oriented rendering engines such as Acrobat presume
that the user will be satisfied to allow the document to occupy the
display completely. This is true only in demo situations and in very
limited, single-tasking operating environments. Full-function
operating environments of the present, and all common operating
environments of the future, will present a completely different
situation. In environments that allow it, users quickly learn to work
in a multitasking world controlled through multiple windows. And
their first requirement is that those windows be capable of arbitrary,
user-definable shapes and sizes. Documents that depend on a fixed
page display don't work in such environments. They also don't work
for people with special presentational needs (e.g., large type and
Braille) and on devices with unusual display requirements (e.g., PDAs
and airline cockpit displays).
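
The difference between a fixed page and a reflowable document can be
illustrated with a small Python sketch. It is a deliberately crude
model, assuming a character-cell display and using the standard
textwrap module; the functions fixed_page() and reflowed() and the
widths chosen are hypothetical. A layout computed once for a page
width is simply clipped when the window is narrower, whereas a
format-independent document is laid out again for the window it
actually gets.

```python
import textwrap

TEXT = (
    "Documents that depend on a fixed page display do not adapt "
    "when the reader resizes the window."
)


def fixed_page(text, page_width, window_width):
    """Lay out once at page_width, then clip each line to the window."""
    lines = textwrap.wrap(text, width=page_width)
    return [line[:window_width] for line in lines]


def reflowed(text, window_width):
    """Lay out again for whatever width the window currently has."""
    return textwrap.wrap(text, width=window_width)


narrow = 30
clipped = fixed_page(TEXT, page_width=60, window_width=narrow)
wrapped = reflowed(TEXT, narrow)

for line in clipped:
    print("fixed   |", line)
for line in wrapped:
    print("reflow  |", line)
```

In the clipped version part of each line falls outside the window; in
the reflowed version every word remains visible. A real page viewer
would scroll or scale rather than clip, but the reader still has to
work around a geometry chosen for a different display.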

Discussions of this point often divide people into two camps: one that
insists on completely format-independent delivery and another that
insists on complete page fidelity. The truth, borne out by several
years of successful online publishing in high-level hypertext systems,
is that a great deal of typographical design -- perhaps 70 to 80
percent of the basic visual information traditionally specified by
publishers -- can and should remain under the control of the designer.
But there is a significant part that can't be specified in
cross-platform online environments, and that is the part relating to
page geometry. The typographic controls that still make sense in
online environments are the ones addressed by the stylesheet languages
of universal SGML browsers and by the style attributes that would be
included in HDL.

-----------------------------------------
How does HDL fit into the larger picture?
-----------------------------------------

Any discussion of the future of HTML must take place in the context of
the imminent arrival of universal SGML Web browsers. An HTML browser
such as Mosaic is a browser that can parse and present documents
written in a single ISO 8879-compliant markup language, HTML.
Similarly, an SDL browser such as the CDE help viewer is a browser
that can parse and present documents written in a single ISO
8879-compliant markup language, SDL. By contrast, a universal SGML
browser such as DynaText is one that can parse and present documents
written in _any_ ISO 8879-compliant markup language.

Universal SGML browsers have been available for some time, but they
were sold only as components of expensive SGML publishing systems and
were not Web-aware. The first free universal SGML Web browser,
SoftQuad's Panorama, was announced and demonstrated at the WWW '94
conference in Chicago. It will be bundled with future releases of
NCSA Mosaic. The appearance of Panorama and similar Web browsers
capable of delivering any ISO 8879-compliant markup completely changes
the landscape of WWW authoring and document delivery.

It may be helpful to distinguish different viewer technologies along
the axis of "hardwired" versus "programmable" in two key areas: tag
sets and style definitions. Categorized in this way, HTML, SDL/HDL,
and universal SGML browsers can be placed in a hierarchy of increasing
functionality for the document publisher.

First level: hardwired tag set, hardwired styles (HTML browsers)

Second level: hardwired tag set, programmable styles (SDL/HDL
browsers)

Third level: programmable tag set, programmable styles (universal
SGML browsers)

This picture is easiest to understand if we look first at the two
extremes. Tools at the first level (HTML browsers) deny the publisher
any control over the basic rules of document structure or the
typographical behavior associated with a given element; both the
markup language and the styles are determined by the developers of the
browser and are bound in at compile time. Tools at the third level
(universal SGML browsers) put complete control over the structural
rules and typographical behavior in the hands of the document
publisher; both the definition of the markup language and the styles
associated with each element are specified in control files that are
loaded with the document at run time or kept available in a local
cache.

The proposed HDL would occupy a space between these two ends of the
functionality spectrum. Like HTML, the HDL markup language would be
hardwired into the browser, but styles would be definable on a
document-by-document basis by the publisher. The key difference
between style specification in HDL and style specification in
universal SGML browsers is that HDL styles would be encapsulated in
the document itself, whereas styles in universal SGML browsers are
generally specified in separate stylesheets that are expected to be
cached after the first document of a given type has been downloaded
from the server.
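
The three levels can be contrasted by asking where a viewer finds the
style for a given element. The Python sketch below is a simplified
model under stated assumptions: the element names, style strings,
document layout, and the function style_for() are invented for
illustration and do not describe any actual browser. It covers style
lookup only; the programmable tag set of the third level is not
modeled.

```python
# All names below are hypothetical; they model the three levels only.

# Level 1: styles bound into the browser at compile time.
BUILTIN_STYLES = {"title": "bold 18pt", "para": "plain 12pt"}


def style_for(level, element, document, stylesheet_cache):
    """Return the style a viewer at the given level applies to an element."""
    if level == 1:
        # Hardwired styles: the publisher has no say.
        return BUILTIN_STYLES[element]
    if level == 2:
        # Programmable styles carried inside the document itself.
        return document["styles"][element]
    if level == 3:
        # Programmable styles in a separate stylesheet, looked up by
        # document type and assumed to be cached after first download.
        return stylesheet_cache[document["doctype"]][element]
    raise ValueError(f"unknown level: {level}")


document = {
    "doctype": "report",
    "styles": {"title": "bold 24pt", "para": "plain 11pt"},
}
cache = {"report": {"title": "italic 20pt", "para": "plain 10pt"}}

for level in (1, 2, 3):
    print(level, style_for(level, "title", document, cache))
```

At the second level everything needed to present the document
arrives with it; at the third level a document of an unfamiliar type
cannot be styled until its stylesheet has been fetched.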

----------------------------------------------
Why HDL in a world of universal SGML browsers?
----------------------------------------------

The key question raised by this analysis is why an SDL-based approach
should be proposed in a world where free universal SGML Web browsers
will soon be widely available. There are several reasons to support HDL
as at least an interim standard.

First, HDL would be more efficient for online document delivery than
raw SGML served out with stylesheet files. The overhead needed to
parse and render a fixed tag set like HDL is much less than what is
needed to parse and render arbitrary markup. This is admittedly a
short-term concern, but a real one.

A corollary is that the relative simplicity of the HDL delivery
standard would keep the level of programming needed to implement a Web
browser within reach of the entrepreneurs who have been so successful
in making the WWW a reality over the past year and a half. Universal
SGML browsers are vastly more complex than browsers built around a
hardwired tag set, and the programming effort needed to compete
directly with them is simply beyond the reach of most independent
developers. The proposed functional split would allow cooperative
efforts to be divided between content production tools and delivery
tools, keeping both aspects within manageable bounds. Standardization
on HDL as a browser standard would level the playing field for smaller
players, encourage the development of free conversion tools, and keep
the Web open for experimentation and innovation a while longer.

A third reason to propose HDL is that, as noted above, it is so
well suited to the quick conversion of legacy data produced in word
processing and desktop publishing formats. It is entirely possible to
implement a suitably flat, format-rich markup language to accomplish
this purpose in a universal SGML browser, but such a language would
end up as no more than a clone of HDL. An HDL browser could provide
the same results with much less programming overhead.

A final reason for preferring HDL as a Web delivery standard is that
it would standardize HDL's internal method of style specification.
The corresponding single standard for the stylesheet language that is
most likely to be used by universal SGML viewers, DSSSL, is still
about a year away from general implementation. The SGML viewers can
work around this lack of standardization by remapping each other's
stylesheet specifications on the fly, but for the interim, HDL's
single format would give it an additional performance advantage.

Having said all this, it must be acknowledged that every capability of
the proposed HDL can be implemented, and will be implemented, in
universal SGML browsers. The most important
reason for standardizing on HDL is to give a coherent focus to the
current rather poorly received efforts to add formatting controls to
HTML, efforts that go contrary to the design principles of the
language. In the worst case, however, failure to standardize on HDL
will simply hasten the adoption of universal SGML browsers. The
biggest losers in such a scenario will be the current developers of
HTML tools, not the larger WWW community.

========================================================================
Jon Bosak, Novell Corporate Publishing Services jb@novell.com
2180 Fortune Drive, San Jose, CA 95131 Fax: 408 577 5020
A sponsor of the Davenport Group (ftp://ftp.ora.com/pub/davenport/)
========================================================================