Once More on Anno

Terry Allen <terry@ora.com>
Message-id: <199308142101.AA29285@ora.com>
From: Terry Allen <terry@ora.com>
Date: Sat, 14 Aug 1993 14:01:32 PDT
X-Mailer: Mail User's Shell (7.2.0 10/31/90)
To: www-talk@nxoc01.cern.ch
Subject: Once More on Anno
Status: RO
In discussion off-line, Tony Sanders wrote:

There are lots of different levels of annotations so be sure
and spec talking about

private:    my own personal annotations that no one else sees, and
            don't involve any kind of server.  These are solely browser
            dependent and can play all kinds of games to stick to the
            right text.

workgroup:  These are for a tight group of people, probably doing
            collaborative authoring and editing.

public:     These are maintained by the server that owns the document.

global:     Probably will be distributed via USENET and subject to
            massive filtering to find only the items of interest to you.

I think part of our problem has been that everyone is talking about different
levels.  Good luck.

That's such a good typology I'm sure Tony won't mind my citing it
in this discussion of how we should position annotations
(that is, use SGML or offsets or some combination of both?).  

In favor of using SGML elements:  they are flexible, at least
insofar as they won't break as soon as the text is edited.
They can be translated from one DTD to another, they can be
parsed in SGML along with the rest of the document, and 
they don't require some new method of reference.  I'd really
like to find a way to provide hooks in the SGML structure
for exterior references such as annotations.

In favor of using offsets:  they don't require
writing on the document, as does the SGML approach.

In either case we have a difficult problem of identifying
the version of a document to which a given annotation 
refers, not to mention the intractable problem that people *will*
change their docs without revising their version numbers.
This came up in the discussion of URNs, and will recur, so
I'll leave it out here, noting only that we have a 
global problem of referring to versions of electronic documents,
especially as old versions may go away without trace.

Back to Tony's four cases (private, workgroup, public,
global).  In the first one, I am annotating somebody else's
document solely for my own use, like writing in the margins
of a book I own.  If I make a local copy of this document,
I can certainly use SGML elements to position annotations,
because I can write on my local copy.  

If I don't make a local copy and use offsets to position my 
annotations, I have no way of knowing when the document will change
and invalidate or otherwise screw up my offsets; the previous
version may not be available.  

I can cope with this by adopting various strategies according to 
how fast and in what ways I expect the annotated document to change; 
for example, if I thought the pieces of the document would stay in the 
same order while all growing larger, I might use a percentage 
offset (my annotation 1 is at 27.341% of the way through
the document).  If I thought the pieces would move around
I might use a string-matching method keyed to chapter titles.
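The two strategies can be sketched roughly as follows.  This is only
an illustration under stated assumptions: the function names, the
sample texts, and the chars_after parameter are all hypothetical, and
no browser is claimed to work this way.

```python
def percent_to_index(text: str, percent: float) -> int:
    """Map a percentage offset (0-100) to a character index in text."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    return min(len(text), round(len(text) * percent / 100.0))


def title_anchor_to_index(text: str, title: str, chars_after: int = 0) -> int:
    """Find a chapter title, then count chars_after characters past its end.

    Returns -1 if the title no longer appears in the text.
    """
    start = text.find(title)
    if start == -1:
        return -1
    return min(len(text), start + len(title) + chars_after)


# Hypothetical old and new versions: the chapters have changed places.
old = "Chapter 1\nAlpha beta.\nChapter 2\nGamma delta.\n"
new = "Chapter 2\nGamma delta, revised.\nChapter 1\nAlpha beta.\n"

# The title-keyed anchor still lands on the start of Chapter 2's text.
i_old = title_anchor_to_index(old, "Chapter 2", chars_after=1)
i_new = title_anchor_to_index(new, "Chapter 2", chars_after=1)
```

The percentage form survives uniform growth but not reordering; the
title-keyed form survives reordering but fails (here, returns -1) as
soon as the title itself is reworded.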

Anyway, if I'm not sharing these annotations with anyone 
I can do them any way I please, HTML+ or no.

In the second case, the workgroup, we are probably editing a joint
document, in which case we surely have a local copy and should
be using SGML to position annotations.  If we're using a remote
document we'd do well to make a local copy, but perhaps there
are three of us annotating a large corpus selected from 
HTML+-marked-up documents, we don't want to store them
locally, and we need to see each other's annotations.
On the other hand, if we don't want to store the files
locally they're likely to be larger than we can actually
read, so we may not need to have annotations popping up in
the margins, as long as we have access to the annotations.

Marc Andreessen responded to my earlier remarks on annotations 
this way:

The way you (we) want to think of annotations is this: they're virtual
document modifications that may exist on any of several levels: fully
public but not physically attached to the document, fully public and
physically attached to the document [ecccch, I hope nobody actually
implements these], limited to a given subgroup of people, limited to
those who have expressed interest in the opinions of the annotator,
personal, etc.

This means the document author should not a priori be able to
"restrict" or "enhance" the capability of people to add annotations
that are not physically part of her document.  That is, you at
O'Reilly should not be able to affect how I at NCSA can add an
annotation to your document that can only be viewed by others at NCSA
-- and vice versa.  As long as the annotation isn't actually,
physically affecting your document, you shouldn't care what I do with
the annotation.  (Concomitantly, browsers should provide user
interfaces sufficiently well-designed to make it clear what's really
part of the document and what's an annotation added by some yoyo at

This is like all of us writing in the margins
of a shared copy; or like all of us writing in the margins
of identical copies and then collating the results; or,
in the case described by Marc, like all of us writing in the
margins of the library copy.  I can imagine using offsets 
here, but better and more flexible would be to position 
annotations by one of the traditional methods used for
heavily annotated texts.  For scripture one cites chapter
and verse; for Shakespeare one cites play, act, and scene;
for the classics one cites editions, works, parts, and line numbers.
We could maintain local files on these remote works we're 
annotating, using traditional methods of pointing, and 
be rather surer that our annotations would be positioned
well when we got around to reading them again.  And by
using established hooks, we automatically collate our
joint annotations.  
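As a rough sketch of what "automatically collate" could mean in
practice, assume each of us keeps a local file of notes keyed by a
traditional citation.  The (play, act, scene) tuple, the collate
function, and the sample notes below are hypothetical stand-ins for
whatever scheme the annotated work already uses.

```python
from collections import defaultdict


def collate(*annotation_sets):
    """Merge several annotators' notes, grouping them by citation."""
    merged = defaultdict(list)
    for annotator, notes in annotation_sets:
        for citation, remark in notes:
            merged[citation].append((annotator, remark))
    # Sorting the citation tuples puts the notes in the order of the work.
    return sorted(merged.items())


# Hypothetical local note files from two annotators.
terry = ("terry", [(("Hamlet", 3, 1), "note on the soliloquy"),
                   (("Hamlet", 1, 5), "note on the ghost")])
tony = ("tony", [(("Hamlet", 3, 1), "a second opinion")])

collated = collate(terry, tony)
for citation, remarks in collated:
    print(citation, remarks)
```

Because the key is the citation rather than a byte position, notes
made against different copies of the work still fall together.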

In the third case, in which annotations are maintained publicly by
the server that owns the document, the server can write
on the document and insert the appropriate SGML.  This is
like publishing one's own text with later commentary in the margins.

The fourth, global case ("Probably will be distributed via USENET and 
subject to massive filtering to find only the items of interest to you.")
has no parallel in our current practice of writing or publishing.
On my own behalf, and not speaking for O'Reilly, I think this is 
a real bad idea, importing into the rest of the Internet 
all the worst of the unmoderated
newsgroups (read soc.culture.arabic recently?).  I do like the 
idea of being able to apply the same filter to the Web that I
apply to news (and mail!), but I don't want other authors and
publishers writing all over my documents, virtually, of course.

Especially when someone starts a global anonymous
annotation service like anon.penet.fi, it will be impossible for
me as a writer and publisher to communicate directly with my
readers without being filtered through some third (and ... nth)
party or parties.  (Sure, readers can adjust their filters, those
who can figure out how to do it, but this reverses the terms that
presently exist between publisher and reader.)  Indeed, I might be 
heckled or spoofed or flamed---at hundreds of spots sprinkled through 
my document.  Unlike publishing a text with commentary (the third case),
this is like being forced to publish my text with any marginal 
graffiti anyone cares to put in, extending the margins if need be;
or like being forced to republish the text continually to include
*all* reviews of it.  Why should I put up with that?

Marc writes:
You'll always just be able to disregard/ignore the [annotations] you
don't want to see, since they will practically never be physically
attached to the documents themselves and the act of fetching and
displaying them will always be a value-add your browser will perform
only if you want it to (I myself would probably never have my browser
display any fully public annotations, etc.)

But some people will.  And once the first global annotations
start piling up on the latest trendy social-commentary annotation
server, or the flame-your-colleagues annotation server for my
discipline, I will never have a straight shot at my reader.
This is real unattractive.

In Frederick Roeber's remarks on his work on public annotations via 
usenet, he says,

Fundamental in the system is a WAIS-like
(also usenet-interface-project-like) selection system,
where one can filter in/out annotations.

The natural news expiry system will expire annotations, too.
If an annotation is worth saving, I see two methods:
  1) the keeper of the original document likes the comment,
     saves it, and sticks in an actual reference, or
  2) (more sticky) if the annotation author really wants to
     save his annotation, then he stores it locally, and
     via some as-yet-unknown distributed server mechanism
     makes a pointer to it available.  This server could
     keep the annotation's URL, the URN of the thing 
     annotated, and maybe some meta-information.
I'm not really wild about the second point, but that's what
some discussions I've had with folks have ended up with.  I'd
suggest skipping it at the beginning, and seeing if it's
really needed.

I think that (1) is the only acceptable method (from the 
outset, not just after expiry of the annotation) if we
want to encourage people to publish on the Internet.
(2) is unacceptable---but it doesn't need offsets!
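To make the "doesn't need offsets" point concrete, here is a minimal
sketch of the kind of pointer record Frederick describes: the
annotation's URL, the URN of the thing annotated, and some
meta-information.  The class names, the URN and URL strings, and the
meta keys are all hypothetical; the mechanism itself is, in his
words, as yet unknown.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationPointer:
    annotation_url: str   # where the annotator keeps the annotation
    target_urn: str       # the document being annotated
    meta: dict = field(default_factory=dict)


class PointerServer:
    """Hypothetical in-memory stand-in for the distributed server."""

    def __init__(self):
        self._by_target = {}

    def register(self, pointer):
        self._by_target.setdefault(pointer.target_urn, []).append(pointer)

    def lookup(self, target_urn, **wanted):
        """Return pointers for target_urn whose meta matches wanted."""
        return [p for p in self._by_target.get(target_urn, [])
                if all(p.meta.get(k) == v for k, v in wanted.items())]


server = PointerServer()
server.register(AnnotationPointer("http://example.org/notes/1",
                                  "urn:example:doc-1",
                                  {"author": "reader-a"}))
server.register(AnnotationPointer("http://example.org/notes/2",
                                  "urn:example:doc-1",
                                  {"author": "reader-b"}))

hits = server.lookup("urn:example:doc-1", author="reader-b")
```

Nothing in the record says where in the document the annotation
sits; it points at the document as a whole.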

In the other cases of annotations everything hinges on having
a local copy of the document; if you do, you can annotate it
in SGML, but if you don't you have a choice between
mechanistic and fragile methods (offsets) or verbal and 
nonhypertexty ones.

Despite Marc's eloquent defense of offsets, I'd still like 
to encourage Dave to develop the annotation elements in
the HTML+ DTD to distinguish between the author's annotations
and other people's, encourage browser developers to make
element tags and their attributes (including IDs) usable
hooks for annotations, and encourage authors to provide
those hooks.  Our present method of reference is to 
files and SGML-named points in those files; we can probably
meet all our legitimate needs without importing an entirely
different mechanism.
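
A minimal sketch of IDs serving as hooks, assuming annotations are
kept apart from the document and keyed by element ID.  Python's
html.parser stands in here for a real SGML parser, and the sample
markup, the IdCollector class, and the resolve function are
hypothetical.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record the ID of every element that carries one."""

    def __init__(self):
        super().__init__()
        self.ids = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] = tag


def resolve(document, annotations):
    """Split annotations into those whose hook exists and those left dangling."""
    collector = IdCollector()
    collector.feed(document)
    collector.close()
    attached = {i: n for i, n in annotations.items() if i in collector.ids}
    dangling = sorted(set(annotations) - set(collector.ids))
    return attached, dangling


# Hypothetical document and externally stored annotations.
doc = '<h1 id="intro">Anno</h1><p id="p1">First.</p><p>Second.</p>'
notes = {"p1": "a reader's remark", "p9": "hook no longer present"}

attached, dangling = resolve(doc, notes)

# Editing the text around the hook does not disturb the annotation.
edited = '<h1 id="intro">Anno, revised</h1><p>New.</p><p id="p1">First.</p>'
attached_after_edit, _ = resolve(edited, notes)
```

An annotation whose ID has vanished is reported rather than silently
misplaced, which is more than an offset can promise.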


Terry Allen  (terry@ora.com)
Editor, Digital Media Group
O'Reilly & Associates, Inc.
Sebastopol, Calif., 95472