Re: making old text publicly available on the web

Murray Altheim (murray.altheim@nttc.edu)
Thu, 6 Jul 1995 17:26:18 -0400

Thorvaldur Gunnlaugsson (thg@althingi.is) writes:
>There is lots of text around which could be made accessible
>on the web but nobody has the time to mark up.
>Frequently the only structure this text has is tabs and
>formfeeds. HTML should support formfeeds in <PLAINTEXT>
>so that what little structure is present in this text
>does not get lost on the web.
>The same applies to books which are scanned in; they have pages.

Unfortunately, PLAINTEXT is deprecated in HTML 2.0 and beyond, so you would
be forced to put the entire document into a number of PRE elements. As PRE
differs from PLAINTEXT in that phrase markup is not ignored, the content
would have to be translated into an HTML-compatible format (disallowed
characters translated to entities, e.g. &lt;, &gt;, etc.).
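
As a rough illustration of that translation step, here is a minimal Python
sketch. It is not taken from any existing tool: the function name
text_to_pre_blocks is hypothetical, and it assumes the source text uses the
ASCII formfeed character as its only page separator. Each page becomes one
PRE element, and the characters &, < and > are replaced with entities.

```python
import html

FORMFEED = "\f"  # ASCII formfeed, assumed to be the page separator


def text_to_pre_blocks(text):
    """Split plain text on formfeeds and wrap each page in a PRE element."""
    blocks = []
    for page in text.split(FORMFEED):
        if not page.strip():
            # Skip empty pages, e.g. from a trailing formfeed.
            continue
        # Replace &, < and > with entities so they are not read as markup.
        escaped = html.escape(page, quote=False)
        blocks.append("<PRE>\n" + escaped.strip("\n") + "\n</PRE>")
    return "\n".join(blocks)
```

Calling text_to_pre_blocks on a two-page string yields two PRE elements in
sequence. Tabs are passed through untouched; how a browser renders them
inside PRE is outside the scope of this sketch.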

It seems to me that you are attempting to take a paged format and make it
fit into a pageless metaphor. As has been discussed, there are very real
problems with taking a fixed page size (US Letter, US Legal, A4, etc.) and
fitting it to a 512 x 342 window, or onto a 24-row, 40-character braille
output device. I don't know of any universal resolution to these problems,
philosophically or technically, other than translation into a pageless
metaphor such as HTML.

>[...]I need HTML to say something about paging which I can expect
>the general browser that the public will be using to support.
>I don't need a special tag if the standard says that browsers should
>have, as a printing option, a skip to a new page on <DIV>
>or on <DIV CLASS=something> or something of that kind.

I would imagine your original documents are truly bound by topic or
section, not by page. The problems I've seen in marking up existing
documents that are strictly paged stem from a) table of contents or
internal references to page numbers; b) pagination around graphics; and c)
footnote/endnote references.

I can see already the beginnings of a flood of responses to your original
post, so I'll cut this one off. I would (as others have and will) recommend
posting alternate text (with formfeeds), PostScript or Acrobat versions,
for those who would like to print/view the document in its original format.
Last week I added a feature to my HTML editor that breaks imports on either
formfeeds or custom characters, for the same reasons as you state. What I
found was that I had to go in with a text editor and move some of the
formfeeds to the beginnings of sections, as I didn't want a page metaphor
in many cases -- just too cumbersome.
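
For readers who want to try the same kind of import splitting, a minimal
Python sketch follows. It is not the code from the editor mentioned above;
the function name split_import and its break_chars parameter are
hypothetical. It assumes the break markers are single characters, with the
formfeed as the default.

```python
import re


def split_import(text, break_chars="\f"):
    """Split imported text into chunks at any character in break_chars."""
    if not break_chars:
        # Nothing to split on; return the text as a single chunk.
        return [text]
    # Build a character class from the (escaped) break characters.
    pattern = "[" + re.escape(break_chars) + "]"
    chunks = re.split(pattern, text)
    # Drop chunks that are empty or contain only whitespace.
    return [chunk for chunk in chunks if chunk.strip()]
```

For example, split_import(text, "\f~") breaks on either a formfeed or a
tilde. Moving a break to the beginning of a section, as described above,
is still a manual step: the break character has to be relocated in the
source text before it is imported.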

>Is that cluttering up the standard?

Not cluttering really. You're proposing a concept that is part of a page
description language, which HTML is not. Section 1 of the HTML 2.0 DTD
describes the scope of HTML and states that HTML documents are to be
portable from one platform to another. Assumptions about display width,
height, font size, etc. are not part of the spec.

Murray

__________________________________________________________________
Murray M. Altheim, Information Systems Analyst
National Technology Transfer Center, Wheeling, West Virginia
email: murray.altheim@nttc.edu
www: http://ogopogo.nttc.edu/people/maltheim/maltheim.html