Re: Change bars

weber@eitech.com (Jay C. Weber)
Date: Fri, 6 Aug 93 10:23:57 PDT
From: weber@eitech.com (Jay C. Weber)
Message-id: <9308061723.AA01068@eitech.com>
To: tom@begbick.law.cornell.edu
Subject: Re: Change bars
Cc: www-talk@nxoc01.cern.ch

Tom & company:

Before we go much further in defining change bars as an HTML idiom,
let's consider making it a meta issue, i.e., literally the differences
between two different versions of a document.  Otherwise, documents
will start getting real messy!  We've talked about doing some real
version control; let's get to it.

Here's an idea that I think would help browsers derive change "bars":
we define an HTML Diff format that concisely describes the deltas
between two HTML sources.  Before getting into what such a format could
be, here are some other uses apart from change bars:

 o a server can use HTML-Diffs to efficiently manage multiple versions
   of a document, both in space and network bandwidth, as in a good
   revision control system;

 o an HTML-Diff format can encode arbitrary annotations -- the current
   style of appending links would be a special case;

 o HTML-Diffs could encode the modifications a user makes to a document
   when "filling in a form", e.g., as modifications to the value
   parameter of certain HTML+ INPUT tags.

Here are my ideas on the format.  We could use something like UNIX diff(1)
format, except that it is very whitespace sensitive and we could end up
with two documents that render identically but have a large number of
UNIX-style diffs.  Seems to me that HTML-Diff should be specific to
SGML, so features like tag-parameter-order-independence are factored
out of diffs.  So we need a notion of a canonical SGML/HTML representation.
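
To make the canonicalization idea a bit more concrete, here is a rough
Python sketch.  It leans on the standard library's html.parser module;
the Canonicalizer class and canonicalize() function are names made up
for illustration, and the "canonical form" it produces (lowercased tag
and parameter names, parameters sorted by name, whitespace runs
collapsed) is only one assumed choice, not an agreed format.

```python
from html import escape
from html.parser import HTMLParser


class Canonicalizer(HTMLParser):
    """Rebuild HTML in an assumed canonical form (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # The parser already lowercases tag and parameter names.
        # Sorting by name keeps parameter order out of any later diff.
        items = sorted((name, value or "") for name, value in attrs)
        rendered = "".join(
            f' {name}="{escape(value, quote=True)}"' for name, value in items
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Collapse whitespace runs; whitespace-only text is dropped.
        text = " ".join(data.split())
        if text:
            self.parts.append(text)


def canonicalize(source):
    """Return the canonical pieces of an HTML source as a list of strings."""
    parser = Canonicalizer()
    parser.feed(source)
    parser.close()
    return parser.parts
```

Two sources that differ only in case, parameter order, and line breaks
come out identical.  The whitespace handling is deliberately crude: it
would also erase differences that do matter, such as spacing next to
inline tags or inside preformatted text, so a real definition would
need more care.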

I don't know the data structures very well, but libWWW's Structured
Streams could be a good place to start for such a canonical representation.
Presumably SS's are series of text blocks and tag structures, and the
Diff operations can be standard text-editing operations on text blocks
and structure edits on tags.
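
As a sketch of what diffing such a series might look like, the Python
below models a stream as a flat list of text and tag objects and
compares two streams with difflib from the standard library.  This is
not libWWW's actual Structured Stream data structure (which I have not
checked); Text, Tag, to_stream() and stream_diff() are hypothetical
names, and positions are 0-based here purely for convenience.

```python
import difflib
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Text:
    data: str


@dataclass(frozen=True)
class Tag:
    name: str      # "a" for a start tag, "/a" for an end tag
    params: tuple  # (name, value) pairs, sorted by name


class StreamBuilder(HTMLParser):
    """Turn HTML source into a flat list of Text and Tag objects."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.objects = []

    def handle_starttag(self, tag, attrs):
        params = tuple(sorted((k, v or "") for k, v in attrs))
        self.objects.append(Tag(tag, params))

    def handle_endtag(self, tag):
        self.objects.append(Tag("/" + tag, ()))

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.objects.append(Text(text))


def to_stream(source):
    builder = StreamBuilder()
    builder.feed(source)
    builder.close()
    return builder.objects


def stream_diff(old, new):
    """Return (operation, old_start, old_end, new_objects) per change."""
    a, b = to_stream(old), to_stream(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (op, i1, i2, b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

Because the comparison runs over objects rather than lines, a change
of parameter order or line wrapping yields no deltas at all, while a
changed word shows up as a replacement of a single text object.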

One tricky issue is how to refer to objects in a SS; I suppose we could
refer to tag ids when possible but otherwise by ordinal position in the
SS.  This would make the format somewhat brittle to use (creating Diffs
by hand would be tricky!), so it would work best when kept transparent
to all users.
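
A small Python sketch of the two addressing styles shows where the
brittleness comes from.  The list-of-dicts layout for a stream, the
resolve() function, and the choice of 1-based ordinals are all
assumptions made for the example.

```python
def resolve(stream, idref=None, ssobj=None):
    """Return the list index of the object a delta refers to.

    Tags are assumed to look like {"tag": "input", "params": {...}}
    and text blocks like {"text": "..."} (a hypothetical layout).
    """
    if idref is not None:
        for index, obj in enumerate(stream):
            if obj.get("params", {}).get("id") == idref:
                return index
        raise LookupError(f"no object with id {idref!r}")
    if ssobj is not None:
        # Ordinals are treated as 1-based here; that is an assumption.
        if 1 <= ssobj <= len(stream):
            return ssobj - 1
        raise LookupError(f"ordinal {ssobj} is out of range")
    raise ValueError("need either idref or ssobj")


stream = [
    {"text": "Name:"},
    {"tag": "input", "params": {"id": "blank1", "value": ""}},
]
by_id_before = resolve(stream, idref="blank1")
by_ordinal_before = resolve(stream, ssobj=2)

# Someone adds a heading at the top of the document.
stream.insert(0, {"text": "Registration"})
by_id_after = resolve(stream, idref="blank1")
by_ordinal_after = resolve(stream, ssobj=2)
```

Before the insertion both references find the INPUT tag.  Afterwards
the id reference still does, but ordinal 2 now lands on the "Name:"
text block, so a delta written against the old version would edit the
wrong object.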

Finally, I'll suggest making the format valid SGML.  Maybe that's
controversial, but it seems to me to be a net win.

So in summary, some HTML-Diffs might look like this (off the top of
my head):

<!-- substitute "bar" for "foo" in the fifth structured stream object
     (a text object) -->
<DELTA edit ssobj=5 type=text operation="s/foo/bar/">

<!-- add a new anchor to the end of the document -->
<DELTA insert location=end data='<a href="http://eitech.com/">EIT</a>'>

<!-- an entry into a form as an INPUT tag parameter change -->
<DELTA edit idref=blank1 type=tag parameter=value data="Jay Weber">
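
For what it's worth, here is a rough Python sketch of how a browser or
server might apply deltas like these.  It assumes the DELTA tags have
already been parsed into dicts and the document into a list of dicts;
that layout, the "action" and "objects" keys, the 1-based ssobj, and
the find() and apply_delta() functions are all invented for the
example.  Only the simple s/old/new/ form and location=end are
handled.

```python
import re


def find(stream, delta):
    """Locate the target by id when given, else by 1-based ordinal."""
    if "idref" in delta:
        for index, obj in enumerate(stream):
            if obj.get("params", {}).get("id") == delta["idref"]:
                return index
        raise LookupError(delta["idref"])
    return delta["ssobj"] - 1


def apply_delta(stream, delta):
    """Return a new stream with one delta applied; input is untouched."""
    result = [dict(obj) for obj in stream]
    if delta["action"] == "insert":
        if delta["location"] != "end":
            raise ValueError("this sketch only handles location=end")
        return result + list(delta["objects"])
    target = result[find(result, delta)]
    if delta["type"] == "text":
        # Split "s/old/new/" into its parts; escaped slashes unsupported.
        _, old, new, _ = delta["operation"].split("/")
        # Like sed without the g flag, replace the first match only.
        target["text"] = re.sub(old, new, target["text"], count=1)
    elif delta["type"] == "tag":
        params = dict(target["params"])
        params[delta["parameter"]] = delta["data"]
        target["params"] = params
    else:
        raise ValueError(f"unknown type {delta['type']!r}")
    return result


doc = [
    {"text": "Intro"},
    {"tag": "p", "params": {}},
    {"text": "Name:"},
    {"tag": "input", "params": {"id": "blank1", "value": ""}},
    {"text": "foo and more foo"},
]
v2 = apply_delta(doc, {"action": "edit", "ssobj": 5, "type": "text",
                       "operation": "s/foo/bar/"})
v3 = apply_delta(v2, {"action": "insert", "location": "end", "objects": [
    {"tag": "a", "params": {"href": "http://eitech.com/"}},
    {"text": "EIT"},
    {"tag": "/a", "params": {}},
]})
v4 = apply_delta(v3, {"action": "edit", "idref": "blank1", "type": "tag",
                      "parameter": "value", "data": "Jay Weber"})
```

Each call returns a fresh version and leaves the previous one alone,
which is the property a server would want if it kept one base document
plus a chain of deltas.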

Perhaps some committee has done this sort of thing for SGML?

Jay
--
Jay C. Weber					weber@eitech.com
Enterprise Integration Technologies		weber@cis.stanford.edu
459 Hamilton Avenue				(415)617-8002
Palo Alto, CA 94301				fax: (415)617-8019