Re: Release of HTML 2.0 document for editing

"Daniel W. Connolly" <connolly@hal.com>
Date: Thu, 25 Aug 94 14:28:06 EDT
Message-id: <9408251828.AA27899@ulua.hal.com>
Reply-To: connolly@hal.com
Originator: html-wg@oclc.org
Sender: html-wg@oclc.org
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <html-wg@oclc.org>
Subject: Re: Release of HTML 2.0 document for editing 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Working Group (Private)

[Peter: I hope you don't mind my reposting your message...]

In message <9408241227.AA02666@curia.ucc.ie>, Peter Flynn writes:
>> If you have a system for collecting the 50 or so HTML files into one
>> nice looking PS file with a table of contents and all, BY GOD MAN!
>> volunteer to provide this service to the group! Or tell us about this
>> wonderful tool!
>
>Hardly `wonderful'. Just a crummy little DOS program that converts SGML
>to TeX. Making PS is then trivial, and you get decent control over the
>appearance. Table of contents, table of illustrations, index of 
><dfn>first terms</dfn> etc is all automatic.


Ah... well, automatic except for the following details...


>The problem is purely managerial: if these 50-odd (very odd :-) files
>are non-linear, ie they are .html files which are referenced thru the
>normal hypertext links, what do you want done with them? Separate 
>`chapters', separate `sections', a big footnote if the file is small?

Here are the methods that have been employed:

* when Tim maintained it, the structure was kept in the Makefile:

HYPERTEXT = \
   HTML.html \
     StatusMeanings.html \
   AndMIME.html \
   Intro.html \
   Text.html \
   Tags.html \
     Elements/HEAD.html \
     Elements/TITLE.html \
     Elements/ISINDEX.html \
     Elements/LINK.html \
     Elements/BASE.html \
     Elements/NEXTID.html \
        \
     Elements/BODY.html \
     Elements/A.html \
	...

He employed a sed script to "demote" nodes by changing <h1> to <h2> etc.,
then concatenated them all into one big HTML file (the first sketch
after this list shows the idea). I don't think this preserved links
exactly, but since it was only used for print, he was happy with it.
I didn't like it.


* to build html-spec-19940603.ps.Z, I developed a set of Python modules
  that convert HTML to MIF, tracking the SUBDOCUMENT links to determine
  whether an HTML node was a section, subsection, etc. (roughly the
  recursion in the second sketch below). Note that this does not address
  the issue of next, previous, and up links.

* for the more recent versions, we converted the whole thing to NodeSet.
  The hardcopy tools we have for NodeSet use a TOC file to determine
  the order and "depth" of nodes in the document. The HTML building tools
  generate a TOC (though it uses strange markup...) and next, previous,
  top, etc. navigational links.
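
For concreteness, here is a minimal Python sketch of the first method's
demote-and-concatenate step. It is NOT Tim's actual sed script: the file
list is just an excerpt from the Makefile above, and it demotes every
file uniformly instead of tracking each node's real depth.

    import re, sys

    def demote(html, levels=1):
        # Push heading tags down: <h1> -> <h2>, <h2> -> <h3>, and so on
        # (likewise for closing tags), capping at <h6>.
        def bump(m):
            n = min(int(m.group(2)) + levels, 6)
            return '<%sh%d>' % (m.group(1), n)
        return re.sub(r'<(/?)[hH]([1-6])>', bump, html)

    # "cat them all together in one big HTML file"
    for name in ['HTML.html', 'Intro.html', 'Text.html']:
        sys.stdout.write(demote(open(name).read()))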
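
And roughly the recursion behind the second method; the node object and
its links() method here are hypothetical stand-ins for whatever the real
modules used:

    def walk(node, depth, emit):
        # A node reached via a SUBDOCUMENT link sits one level below its
        # parent: depth 1 = section, depth 2 = subsection, and so on.
        emit(node, depth)
        for child in node.links(rel='SUBDOCUMENT'):  # hypothetical API
            walk(child, depth + 1, emit)

    # e.g. walk(root, 1, write_mif_heading)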


>I'm happy to volunteer to do this if someone can specify what the `best'
>(more likely `optimal') thing to do with non-linear text is when you try
>to print it in a linear artifact like a paper document.

The TOC gives the order of the nodes, and their "containment" or "depth."
If you can map this to LaTeX reliably, go for it! Note that if your
mapping places constraints on the HTML, you'll have to make these
constraints known to each of the editors of the document.
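
One obvious mapping, sketched in Python on the assumption that the TOC
can be boiled down to one "depth title" pair per line (our real TOC
markup is stranger than that):

    SECTIONING = ['\\section', '\\subsection',
                  '\\subsubsection', '\\paragraph']

    def heading_command(depth):
        # Depth 1 maps to \section, depth 2 to \subsection, etc.;
        # anything deeper than we have commands for gets clamped.
        return SECTIONING[min(depth, len(SECTIONING)) - 1]

    for line in open('toc'):
        depth, title = line.split(None, 1)
        print('%s{%s}' % (heading_command(int(depth)), title.strip()))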


This is not a very complex problem, theoretically. But in practice, it
is very tedious and time-consuming.

By nature, I HATE tedious and time-consuming processes. Every time I
started the hardcopy build process, my mind would get preoccupied with
designing a toolset that would do all this stuff for me. It should
be easy to build such a toolset from sgmls, python, and LaTeX or lout.
Perhaps I'll cook it up someday. But in the meantime, we've got
to get this thing done.

Anyone who is willing to own this piece of the work, PLEASE step forward!

Dan