Re: RTF translator... How long??

Dan Connolly <connolly@pixel.convex.com>
Message-id: <9301291830.AA14018@pixel.convex.com>
To: "Peter Lister,
    Cranfield Computer Centre" <ccprl@xdm001.ccc.cranfield.ac.uk>
Cc: www-talk@nxoc01.cern.ch
Subject: Re: RTF translator... How long?? 
In-reply-to: Your message of "Fri, 29 Jan 93 16:53:39 GMT."
             <9301291653.AA19849@xdm039> 
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="cut-here"
Date: Fri, 29 Jan 93 12:30:22 CST
From: Dan Connolly <connolly@pixel.convex.com>

--cut-here

>I see placeholders in the httpd code for RTF, and documentation on the
>web saying that this is being considered, but how long is it likely to
>be before I can feed RTF to the web? Is anyone actively working on
>this?

I have been, but I don't expect to be able to do much more work
on it.

> If so, would you like an alpha/beta test site? If not, where can
>I get an RTF format definition, so that I can have a go myself?

I wrote a Motif widget for displaying RTF stuff -- it probably
doesn't directly support the hypertext conventions you used in
your RTF documents, but you should be able to fix it somehow.

Most of the RTF parsing code was written by Paul DuBois. Ask
archie about the RTFstuff package.

Here's the source distribution for my widget:

--cut-here
Content-Type: message/external-body;
	access-type=ANON-FTP;
	site="export.lcs.mit.edu";
	directory="contrib";
	name="XcRichText-1.5.tar.Z";

Content-Type: application/x-tarZ; name="XcRichText-1.5.tar.Z"

--cut-here

As for RTF and WWW:

I'm working on an architecture for dealing with HTML documents
in various formats. There's a base HMDoc class, whose essential
methods are startTag, data, and endTag. I've written two parsers:

SGML_parseInstance -- parses an SGML instance, and calls the HMDoc's
	startTag, data, and endTag methods as the info is parsed.
Plaintext_parse -- parses plain text files. Calls the HMDoc's
	startTag method to start a PRE element, then calls the data
	method repeatedly to load the data, then calls the endTag
	method.

And I've written three HMDoc subclasses:

InCore -- builds a data structure to store all the information from
	the tags and data. You can call its traverse method, and
	it will traverse the data structure and pass all the tags
	and data to another HMDoc.
HTMLwriter -- responds to startTag, endTag methods by writing SGML
	markup to a stream. Responds to data method by writing it
	to a stream, representing <, >, and & as character references.
MIFwriter -- responds to methods by writing a MIF representation
	of the document to a stream.

So, for example, you can use SGML_parseInstance to read an HTML
document into an InCore HMDoc, then use the InCore traverse method
to pass the data to a MIFwriter to write it in MIF format.
Or you could use Plaintext_parse and give it a MIFwriter, and convert
straight from plain text to MIF without ever storing the document
in memory.
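
A minimal sketch of the two routes just described, with an HTMLwriter
standing in for the MIFwriter so the output is easy to check. Python is
used purely for illustration. The method signatures, the attrs
argument, the event list inside InCore, and the lower-case
plaintext_parse name are all assumptions made for this sketch, not the
actual module interfaces.

```python
import io

# Numeric character references for the three characters that are markup in SGML.
CHAR_REFS = {ord("<"): "&#60;", ord(">"): "&#62;", ord("&"): "&#38;"}


class HMDoc:
    """Base class: a document is delivered as startTag/data/endTag calls."""

    def startTag(self, name, attrs=None):
        pass

    def data(self, text):
        pass

    def endTag(self, name):
        pass


class HTMLwriter(HMDoc):
    """Writes SGML markup to a stream as the calls arrive."""

    def __init__(self, stream):
        self.stream = stream

    def startTag(self, name, attrs=None):
        self.stream.write("<" + name)
        for key, value in (attrs or {}).items():
            quoted = value.translate(CHAR_REFS).replace('"', "&#34;")
            self.stream.write(' %s="%s"' % (key, quoted))
        self.stream.write(">")

    def data(self, text):
        self.stream.write(text.translate(CHAR_REFS))

    def endTag(self, name):
        self.stream.write("</" + name + ">")


class InCore(HMDoc):
    """Stores every call so the document can be replayed later."""

    def __init__(self):
        self.events = []  # assumed storage: a flat list of recorded calls

    def startTag(self, name, attrs=None):
        self.events.append(("start", name, dict(attrs or {})))

    def data(self, text):
        self.events.append(("data", text, None))

    def endTag(self, name):
        self.events.append(("end", name, None))

    def traverse(self, doc):
        """Pass all stored tags and data to another HMDoc."""
        for kind, value, attrs in self.events:
            if kind == "start":
                doc.startTag(value, attrs)
            elif kind == "data":
                doc.data(value)
            else:
                doc.endTag(value)


def plaintext_parse(stream, doc):
    """Wrap the whole text file in a single PRE element."""
    doc.startTag("PRE")
    for line in stream:
        doc.data(line)
    doc.endTag("PRE")


if __name__ == "__main__":
    import sys

    # Route 1: stream straight from parser to writer, nothing is stored.
    plaintext_parse(io.StringIO("1 < 2\n"), HTMLwriter(sys.stdout))

    # Route 2: build the document in memory, then traverse it into a writer.
    core = InCore()
    plaintext_parse(io.StringIO("stored first\n"), core)
    core.traverse(HTMLwriter(sys.stdout))
```

Since the parser and InCore's traverse method drive the same three
methods, a writer cannot tell whether it is being fed from a parser or
from a stored document.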

These modules are designed to drop into existing clients. For
example, the xmosaic client has a Save As... dialog box with
several format options. All you need to do is map the
format options to HMDoc subclasses. Then traverse the document,
calling the HMDoc's tag and data methods, and it will write
the document in the appropriate format.
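
The mapping from format options to writers could be as small as a
lookup table. This is a hedged sketch in Python, not code from xmosaic:
the FORMATS table, the option labels, the TextWriter class, the save_as
function, and the representation of the document as a list of
(kind, value) pairs are all hypothetical.

```python
import io

# Numeric character references for the three characters that are markup in SGML.
CHAR_REFS = {ord("<"): "&#60;", ord(">"): "&#62;", ord("&"): "&#38;"}


class HTMLwriter:
    """Writes tags and data back out as SGML markup."""

    def __init__(self, stream):
        self.stream = stream

    def startTag(self, name):
        self.stream.write("<%s>" % name)

    def data(self, text):
        self.stream.write(text.translate(CHAR_REFS))

    def endTag(self, name):
        self.stream.write("</%s>" % name)


class TextWriter:
    """Hypothetical writer: drops the tags and keeps only the data."""

    def __init__(self, stream):
        self.stream = stream

    def startTag(self, name):
        pass

    def data(self, text):
        self.stream.write(text)

    def endTag(self, name):
        pass


# Option labels as they might appear in a Save As... dialog (made up here).
FORMATS = {"HTML": HTMLwriter, "Plain text": TextWriter}


def save_as(document, format_name, stream):
    """Traverse the document, calling the chosen writer's tag and data methods."""
    writer = FORMATS[format_name](stream)
    for kind, value in document:
        if kind == "start":
            writer.startTag(value)
        elif kind == "data":
            writer.data(value)
        else:
            writer.endTag(value)


if __name__ == "__main__":
    import sys

    page = [("start", "H1"), ("data", "R&D"), ("end", "H1")]
    save_as(page, "HTML", sys.stdout)
```

Adding a format then means adding one writer class and one table entry.
The traversal code does not change.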

The GridText module from the linemode browser could be turned
into another HMDoc subclass.

Also, it should be easy to write LaTeXwriter or RTFwriter subclasses.

To support reading RTF documents, I'd start with Paul DuBois's
RTF parsing stuff, and make it into a routine like SGML_parseInstance.
The routine would have to recognize the RTF equivalent of start
tags, end tags, and data.
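
As a rough illustration of what "the RTF equivalent of start tags, end
tags, and data" might mean, here is a toy Python sketch. It does not
use the RTFstuff package and is far from a real RTF reader. The
CONTROL_TO_TAG table, the rtf_parse and Recorder names, and the rule
that a character-formatting control word opens a tag which is closed
when its enclosing group closes are all assumptions made for the
example.

```python
import re

# Hypothetical mapping from RTF control words to HTML-style tag names.
CONTROL_TO_TAG = {"b": "B", "i": "I"}

TOKEN = re.compile(r"""
    \\([a-z]+)(-?\d+)?[ ]?   # control word, optional number, optional space
  | \\([^a-z])               # control symbol (backslash plus one character)
  | ([{}])                   # group open or close
  | ([^\\{}]+)               # run of plain text
""", re.VERBOSE)


class Recorder:
    """Stand-in HMDoc that just records the calls it receives."""

    def __init__(self):
        self.events = []

    def startTag(self, name):
        self.events.append(("start", name))

    def data(self, text):
        self.events.append(("data", text))

    def endTag(self, name):
        self.events.append(("end", name))


def rtf_parse(text, doc):
    """Turn a very small subset of RTF into startTag/data/endTag calls."""
    open_tags = [[]]  # one list of open tags per nested RTF group
    for word, param, symbol, brace, chars in TOKEN.findall(text):
        if brace == "{":
            open_tags.append([])
        elif brace == "}":
            if len(open_tags) > 1:
                # Closing a group ends every tag that was opened inside it.
                for tag in reversed(open_tags.pop()):
                    doc.endTag(tag)
        elif word:
            tag = CONTROL_TO_TAG.get(word)
            # A parameter of 0 turns the property off; this sketch only
            # declines to open a tag and does not close an earlier one.
            # Control words missing from the table are silently dropped.
            if tag and param != "0" and tag not in open_tags[-1]:
                doc.startTag(tag)
                open_tags[-1].append(tag)
        elif symbol:
            if symbol in "{}\\":
                doc.data(symbol)  # escaped brace or backslash is literal text
        elif chars:
            chars = chars.replace("\r", "").replace("\n", "")
            if chars:
                doc.data(chars)
    # Close anything still open if the input ended early.
    while open_tags:
        for tag in reversed(open_tags.pop()):
            doc.endTag(tag)


if __name__ == "__main__":
    rec = Recorder()
    rtf_parse(r"{\rtf1 plain {\b bold} text}", rec)
    for event in rec.events:
        print(event)
```

A real routine would also have to skip header groups such as font
tables, handle paragraph properties and style sheets, and decide what
the paragraph-level tags are, which is where most of the work lies.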

If your RTF files have paragraph tags that match the HTML tags,
it's easy. If, on the other hand, your RTF files were written
without HTML in mind, you've got a more difficult task on your
hands.

You have two options: 1) try to zen some HTML structure out of the
info in the RTF file, or 2) treat the RTF file essentially as
a graphic, and use something like my RTF widget to display it.

Option 2 doesn't allow you to integrate hypertext in your RTF
documents with WWW, but it may be the easiest way to go.


Dan

--cut-here--