Re: Filters for HTML?

Date: Thu, 1 Apr 93 01:45:54 MET DST
From: timbl (Tim Berners-Lee)
Message-id: <9303312345.AA22389@nxoc01.cern.ch>
To: janssen@parc.xerox.com, www-talk@nxoc01.cern.ch
Subject: Re: Filters for HTML?
Cc: secret@dxcern.cern.ch
Yes, we have very crude filters for converting clean SGML to TeX
-- just 'sed' files. They will take the output of the NextStep
WorldWideWeb.app, because it puts line breaks in, and so sed
can handle it.
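
A minimal C sketch of what such a sed file does, under stated assumptions: each rule is a literal, case-sensitive string substitution (like `s/<H1>/\\section{/g`), applied one line at a time. The rule table below is hypothetical and is not the CERN sed files. Because it works per line, a tag split across two lines would be missed, which is why it depends on input with predictable line breaks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One literal substitution, like a single s/from/to/g line in a sed file. */
struct rule {
    const char *from;
    const char *to;
};

/* Hypothetical mappings for illustration only. The first matching rule
   wins, so a pattern that is a prefix of another must come after it. */
static const struct rule rules[] = {
    { "<H1>", "\\section{" },       { "</H1>", "}" },
    { "<H2>", "\\subsection{" },    { "</H2>", "}" },
    { "<UL>", "\\begin{itemize}" }, { "</UL>", "\\end{itemize}" },
    { "<LI>", "\\item " },          { "<P>", "\\par " },
};

/* Apply every rule to one line, writing at most outsize-1 chars to out. */
void substitute_line(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    assert(out != NULL && outsize > 0);
    while (*in != '\0') {
        const char *rep = NULL;
        size_t skip = 1;
        for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
            size_t len = strlen(rules[i].from);
            if (strncmp(in, rules[i].from, len) == 0) {
                rep = rules[i].to;
                skip = len;
                break;
            }
        }
        if (rep != NULL) {
            size_t rlen = strlen(rep);
            if (n + rlen >= outsize) break;   /* output full: truncate */
            memcpy(out + n, rep, rlen);
            n += rlen;
        } else {
            if (n + 1 >= outsize) break;      /* output full: truncate */
            out[n++] = *in;                   /* copy ordinary text through */
        }
        in += skip;
    }
    out[n] = '\0';
}

/* Filter a whole stream line by line, roughly as `sed -f rules.sed` would.
   Lines longer than the buffer are handled in pieces by fgets. */
void filter_stream(FILE *in, FILE *out)
{
    char line[1024], result[4096];
    while (fgets(line, sizeof line, in) != NULL) {
        substitute_line(line, result, sizeof result);
        fputs(result, out);
    }
}
```

To use it as a filter, a `main` that calls `filter_stream(stdin, stdout)` is enough. Plain string matching keeps it as crude as the sed approach: it cannot check nesting or recover from malformed markup.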

If you want to make a converter which parses the HTML properly,
you could take the line mode client version 2.0, and
in the library just hack the HTML regeneration module
HTMLGen.{c,h} to produce TeX instead of HTML. The module
is driven by a stream of text and element stop/start by
element number, so it is just a set of tables of strings.
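
To illustrate the table-driven idea, here is a hedged C sketch of a TeX generator driven the same way: text arrives as one call, and element start/stop arrive as calls carrying an element number that indexes two tables of strings. HTMLGen's real interface differs; the element numbers, function names, output buffer and the TeX chosen for each element (including LaTeX's standard description environment for DL) are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical element numbers; libwww assigns its own in its DTD code. */
enum element { EL_H1, EL_H2, EL_P, EL_UL, EL_LI, EL_DL, EL_DT, EL_DD, EL_COUNT };

/* Strings emitted at element start and end, indexed by element number. */
static const char *const tex_start[EL_COUNT] = {
    [EL_H1] = "\\section{",  [EL_H2] = "\\subsection{", [EL_P] = "\n\\par ",
    [EL_UL] = "\\begin{itemize}\n", [EL_LI] = "\\item ",
    [EL_DL] = "\\begin{description}\n", [EL_DT] = "\\item[", [EL_DD] = "",
};
static const char *const tex_end[EL_COUNT] = {
    [EL_H1] = "}\n", [EL_H2] = "}\n", [EL_P] = "",
    [EL_UL] = "\\end{itemize}\n", [EL_LI] = "\n",
    [EL_DL] = "\\end{description}\n", [EL_DT] = "] ", [EL_DD] = "\n",
};

/* Hypothetical output sink: a fixed buffer, always NUL-terminated. */
struct tex_out {
    char buf[4096];
    size_t len;
};

static void emit(struct tex_out *t, const char *s)
{
    size_t n = strlen(s);
    if (t->len + n >= sizeof t->buf)
        n = sizeof t->buf - 1 - t->len;   /* truncate rather than overflow */
    memcpy(t->buf + t->len, s, n);
    t->len += n;
    t->buf[t->len] = '\0';
}

void tex_start_element(struct tex_out *t, int element_number)
{
    assert(element_number >= 0 && element_number < EL_COUNT);
    emit(t, tex_start[element_number]);
}

void tex_end_element(struct tex_out *t, int element_number)
{
    assert(element_number >= 0 && element_number < EL_COUNT);
    emit(t, tex_end[element_number]);
}

/* Text is copied through, escaping the TeX specials & % $ # _ { }.
   Backslash, ~ and ^ would need macros and are not handled here. */
void tex_put_text(struct tex_out *t, const char *text)
{
    char one[3];
    for (; *text != '\0'; text++) {
        if (strchr("&%$#_{}", *text) != NULL) {
            one[0] = '\\'; one[1] = *text; one[2] = '\0';
        } else {
            one[0] = *text; one[1] = '\0';
        }
        emit(t, one);
    }
}
```

Starting from `struct tex_out t = { .len = 0 };`, the calls for `<H1>R&D notes</H1>` produce `\section{R\&D notes}` followed by a newline. Changing the output format means changing only the two tables and the text escaping, which is the point of the table-driven design.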

If you are interested in our mapping, ask Arthur Secret
<secret@dxcern.cern.ch> to mail you our latest sed files.
We in fact made one new LaTeX macro for the paper docs
we push out, in order to do a better job of DL lists.

The basic sed files for making article-style LaTeX are
on the web ... look under "tools for information providers".

Tim
	From janssen@parc.xerox.com Thu Apr  1 00:56:53 1993
	Return-Path: <janssen@parc.xerox.com>
	Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
		id AA21635; Thu, 1 Apr 93 00:56:39 MET DST
	Received: from alpha.Xerox.COM by dxmint.cern.ch (5.65/DEC-Ultrix/4.3)
		id AA21261; Thu, 1 Apr 1993 01:15:29 +0200
	Received: from holmes.parc.xerox.com ([13.1.100.162]) by alpha.xerox.com with SMTP id <11942>; Wed, 31 Mar 1993 15:15:08 PST
	Received: by holmes.parc.xerox.com id <16134>; Wed, 31 Mar 1993 15:15:00 -0800
	Received: from Messages.7.15.N.CUILIB.3.45.SNAP.NOT.LINKED.holmes.parc.xerox.com.sun4.41
	          via MS.5.6.holmes.parc.xerox.com.sun4_41;
	          Wed, 31 Mar 1993 15:14:54 -0800 (PST)
	Message-Id: <ofiWLioB0KGWFC3=Zz@holmes.parc.xerox.com>
	Date: 	Wed, 31 Mar 1993 15:14:54 PST
	Sender: Bill Janssen <janssen@parc.xerox.com>
	From: Bill Janssen <janssen@parc.xerox.com>
	To: www-talk@nxoc01.cern.ch
	Subject: Filters for HTML?

	Does anyone have filters that will convert HTML to TeX?  Or TROFF?  Or
	PostScript?  Or anything...

	Bill