HTML and Publishing Architectures

weibel@oclc.org
Mon, 27 Mar 95 12:19:12 EST

Jon's post on SGML-based Web services prompted some thoughts about
the architecture of Web publishing that may provide useful discussion
points for the upcoming IETF.

Fragments from Jon Bosak's post:

- 110,000 pages of ... manuals ... and ... documentation.
- none of this material is in HTML... All of it is in DocBook SGML
- Only when the fragment is downloaded is it translated into HTML
- maintains existing investment in documents
- our WWW deliverables all from the same source.
- makes possible much more powerful searching
- allows us to easily change the HTML as the standard evolves.
- positions us nicely to use SGML Web browsers in the future.

Jon has identified the essential advantages of SGML-based document
services on the net; this model will likely dominate formal
network-based publishing.
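
To make the on-demand translation in Jon's list concrete, here is a minimal
sketch in Python. It is not Jon's implementation. It assumes the stored
fragment is well-formed enough to be read by an XML parser (a simplification
of full SGML), and the TAG_MAP table, the fallback to "span", and the function
names are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from a few DocBook element names to HTML tags.
# A real service would cover the whole DTD; this table is illustrative only.
TAG_MAP = {
    "sect1": "div",
    "title": "h2",
    "para": "p",
    "emphasis": "em",
    "literal": "code",
    "itemizedlist": "ul",
    "listitem": "li",
}


def to_html(node):
    """Recursively render one parsed element as an HTML string."""
    tag = TAG_MAP.get(node.tag, "span")  # assumption: unknown elements become <span>
    parts = ["<%s>" % tag, escape(node.text or "")]
    for child in node:
        parts.append(to_html(child))
        # Text that follows a child element belongs to the parent's content.
        parts.append(escape(child.tail or ""))
    parts.append("</%s>" % tag)
    return "".join(parts)


def translate_fragment(source):
    """Translate a stored fragment to HTML at request time."""
    return to_html(ET.fromstring(source))


fragment = (
    "<sect1><title>Setup</title>"
    "<para>Run <literal>make</literal> first.</para></sect1>"
)
html = translate_fragment(fragment)
print(html)
```

Because the mapping lives in one table on the server, changing the HTML as
the standard evolves means editing the table, not the stored documents.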

A number of major publishers (including, for example, Elsevier, the
American Institute of Physics, the American Physical Society, IEE,
IEEE, ACM, and the American Chemical Society, to name a few) are
redesigning their journal production capabilities around SGML, and they
will be well positioned to support Web-based delivery of their
products. _Applied Physics Letters Online_, an American Institute of
Physics publication, became available in January; to my knowledge it is
the first paid-subscription scholarly journal on the Web.

This phenomenon has important implications for the design and
evolution of HTML. HTML must remain accessible to the isolated
WebMaster managing a dozen home pages, but it must also support the
demands of the most sophisticated publishers.

The diagram below illustrates one view of the Web as a publishing
medium. I hope it helps frame the discussion of where the group and
implementors should focus their effort.


{ -----------HTML -------------}

---------------------------------                    ___
|        A        |      B      |
|      User       |             |
|    Interface    |   Simple    |                     D
|  Configuration  |    Data     |                     a
|   and Display   |   Markup    |                     t
------------------\  Rendering  |                     a
|        D         \            |
|                   \---------------* * * * * * * *   M
|  Client Side     | |      C      |                  a
|  Scripting &     | |             |                  r
|  External        | |   Complex   |                  k
|  Applications    | |    Data     |                  u
|                  | |   Markup    |                  p
-------------------- |   (SGML)    |
                     |             |
                     ---------------                 ___

My own assertions:

HTML is currently an amalgam of Blocks A and B, resulting in confusion
about what is really important in the language, and hence uncertainty
about where to invest effort first.

Network publishing will benefit to the extent that Blocks B and C
are smoothly integrated: retain the entry-level simplicity that has
helped make HTML popular, while removing barriers to the practical
application of formal SGML in the Web. Simple data markup facilities
should be cleaned up and tweaked, but the complex details should stay
in SGML, and it should be easy to pass SGML through to an HTML client.

HTML evolution should be focused on Block A, with the perspective
that a Web browser is, in effect, a User Interface Server that is
configurable from the data server. Make it easier to provide
good-quality user interface controls that serve a wide variety of needs
(both of users and information providers).

Providing standardized interprocess communication to external
applications (the interface between Blocks A and D) is essential to
the extensibility of the Web.
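
As a rough illustration of the A-to-D interface, the simplest form of handoff
is a lookup from content type to an external application. The sketch below is
in Python and is an assumption throughout: the HELPERS table, the
"sgml-viewer" command, and the helper_command function are hypothetical, and
a standardized interface would need to go well beyond launching a program on
a file.

```python
# Hypothetical registry: content type -> argv template for an external
# application. "sgml-viewer" is a made-up command name used for illustration.
HELPERS = {
    "application/postscript": ["ghostview", "{path}"],
    "application/sgml": ["sgml-viewer", "{path}"],
}


def helper_command(content_type, path):
    """Return the argv for an external application, or None when no helper
    is registered and the browser should handle the data itself."""
    # Drop any parameters such as "; charset=..." and normalize case.
    base_type = content_type.split(";")[0].strip().lower()
    template = HELPERS.get(base_type)
    if template is None:
        return None
    return [arg.format(path=path) for arg in template]


print(helper_command("application/sgml", "/tmp/fragment.sgml"))
print(helper_command("text/html", "/tmp/page.html"))
```

Keeping the table outside the browser is what lets new data types be
supported without changing the browser itself.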


Stuart Weibel
Senior Research Scientist
OCLC Office of Research
weibel@oclc.org
(614) 764-6081 (v)
(614) 764-2344 (f)
http://www.oclc.org:5046/~weibel

Addendum:

OCLC's Electronic Journals Online (EJO) service embodies many of the
features that Jon described, and publishers are signing up.

(See http://www.oclc.org:5046/publications/weibel/web_pub_arch/ for
technical details of OCLC's architecture for scholarly publishing on
the Web.)