HTML and Publishing Architectures

Dan Connolly (
Wed, 29 Mar 95 13:08:38 EST writes:

[Some interesting, though at best obliquely relevant stuff...
For my first posting to this forum since moving from Austin
to Boston, I'll try to do the same...]

> Jon's post on SGML-based Web services triggered some thoughts about
> the architecture of Web publishing that should provide useful discussion
> points for the upcoming IETF.

Speaking of useful discussion points, I find myself referring
to a document by Douglas Engelbart over and over:

an excerpt from Knowledge-Domain Interoperability and an Open
Hyperdocument System

The gist of his paper is what he calls "interoperable knowledge
domains." He describes a transition from conventional MIS systems (in
large companies like Boeing) to an integrated hyperdocument system,
and estimates the cost savings at around $3 billion/year (or was it
trillion... can't remember).

What we're after is a balance between information (symbols with meaning
and context), data (bits), and artifacts (representations of
information). You can't send "information" over the wire: information
changes when it changes context. But you can represent views of the
information as data in standard formats and exchange them.

The cost of maintaining a document in HTML is generally less than the
cost of maintaining a document in DocBook or some similarly complex
document type. But for an organization like Novell, the cost of
maintaining 10,000 such documents is less when using DocBook. DocBook
is a more natural medium for the information in the context of the
authoring environment.

In the context of the average consumer of Novell's documentation,
DocBook is not necessarily cost-effective. HTML browser technology
is cheap. And so there is a balance: maintain the documents using
DocBook, and provide HTML views for the web audience.
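
A minimal sketch of that arrangement, assuming the source is kept as
well-formed XML using a handful of DocBook element names (article,
title, para, emphasis). The TAG_MAP table and the to_html function are
hypothetical names made up for illustration, and the mapping is far
cruder than what a real DocBook conversion would need:

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from a few DocBook elements to HTML tags.
TAG_MAP = {"article": "body", "title": "h1", "para": "p", "emphasis": "em"}


def to_html(elem):
    """Recursively render a DocBook-like element as an HTML string."""
    tag = TAG_MAP.get(elem.tag, "span")  # unmapped elements degrade to <span>
    parts = [f"<{tag}>", escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        # Text that follows a child element belongs to the parent.
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)


# The maintained source: richer markup than the HTML view needs.
SOURCE = (
    "<article><title>Install Guide</title>"
    "<para>Run <emphasis>setup</emphasis> first.</para></article>"
)

html_view = to_html(ET.fromstring(SOURCE))
print(html_view)
```

The point of the sketch is the direction of the conversion: the
structured source stays the master copy, and the HTML is a derived
view that can be regenerated whenever the source changes.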

Now if Novell were concerned exclusively with its authors and
its end-users, they might use FrameMaker or MS-Word or any other
handy DTP representation for their documents. Conversion tools to
HTML from those formats work OK (though I think that if you looked
at them closely, you might start to think that SGML provides a
basis for a more reliable system).

But I think SGML's big win is when Novell starts to interact with some
"peer" organizations: perhaps they will contract out parts of their
documentation to another company. It is often easier to tell that
contractor "here's the DTD for DocBook" than it is to require that
contractor to re-tool its shop to FrameMaker or MS-Word and the
corresponding conversion tools.

And even this interaction (exchanging SGML documents) is pretty
primitive. Eventually, we'll want tightly-integrated object-based
document management systems -- Lotus Notes is a reasonable preview
of the technology to come, though it largely abandons Engelbart's
"explicitly structured documents" requirement (I think -- no first
hand experience here).

One of Engelbart's systems was called "Augment." The information age is
about having the computer augment our knowledge acquisition and
maintenance capabilities. And while HTML is a reasonably significant
step forward from plain text, raster image formats, or even PostScript
in allowing the computer to understand more of the information present
in the data, I think we're bound to see an explosion in applications
based on knowledge-representation research from the A.I. community.

With HTML, we acknowledge the value of exchanging structured
information over a picture of the information (well... some of us
acknowledge this...). But soon, hyperdocuments will include "active
agents" (though I cannot stress enough how important it is to avoid
putting Turing machines in your documents if you can help it),
semantic networks, knowledge bases (or at least rule sets),
self-describing objects, etc.

Collaborative Technologies and Integrated Open Hyperdocuments
$Id: collaboration.html,v 1.1 1995/02/14 22:42:12 connolly Exp $

for some specific technologies. Take a close look at LINCKs, for

Daniel W. Connolly "We believe in the interconnectedness of all things"
Research Technical Staff, MIT/W3C