SOAP Requirements

Jared Rhine <jared@osiris.ac.hmc.edu>
Errors-To: listmaster@www0.cern.ch
Date: Tue, 22 Mar 1994 15:43:46 --100
Message-id: <199403221440.GAA19118@osiris.ac.hmc.edu>
Reply-To: jared@osiris.ac.hmc.edu
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Jared Rhine <jared@osiris.ac.hmc.edu>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: SOAP Requirements
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 15678

Although this message is Interpedia-specific, there are numerous HTTP
questions contained herein that I would like some feedback on.

I apologize if the length of this message is a burden on your mail system,
but one message is probably computationally cheaper than dozens of little
messages.

What information do we need to encode into a SOAP?  (The Interpedia FAQ
answer to "What are SOAPs?" is attached at the bottom.)  I'm looking for
suggestions to nail down a design specification and a possible encoding.

Suggestions so far:

  1. Digital signature

     This is necessary to ensure that a given SOAP has not been forged.  In
     some sense, given the model I propose below, it seems that this boils
     down to trusting the transaction in general.  If the server is trusted,
     then one can assume that it is providing the correct list of SOAPs
     associated with the document.

     Recent discussions on www-talk have allowed some people to get a
     slightly better idea of the issues involved with client/server
     authentication and how they relate to anonymous information systems.
     Those ideas should make it possible to authenticate the server
     (fairly) reliably.  That would allow an organization to submit its
     SOAP to the Resource-Location Service (RLS) in an authenticated manner,
     and would also allow servers to exchange that information with some
     degree of trust.

     Given these two things, one should be able to be assured that any SOAPs
     the server claims are attached to a given document are indeed
     authentic.

     Is such a mechanism sufficient?
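
     A minimal sketch of what "has not been forged" could mean in
     practice, assuming a SOAP is reduced to a (name, value) pair.  It
     substitutes an HMAC from Python's standard hmac module for a true
     public-key digital signature, so both sides must share a
     (hypothetical) secret key; a real scheme would let anyone verify a
     SOAP without being able to produce one.

```python
import hashlib
import hmac

# Hypothetical secret shared by the issuing organization and the verifier.
# An HMAC stands in for a public-key signature here: it detects tampering,
# but anyone holding the key could also forge a tag.
SECRET = b"hypothetical-shared-secret"


def sign_soap(name: str, value: str, key: bytes = SECRET) -> str:
    """Return a hex HMAC-SHA256 tag over the SOAP's name and value."""
    message = f"{name}\n{value}".encode("ascii")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_soap(name: str, value: str, tag: str, key: bytes = SECRET) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_soap(name, value, key), tag)


name = "x-SOAP#American_Children's_Association"
tag = sign_soap(name, "numeric#30000")
print(verify_soap(name, "numeric#30000", tag))  # True
print(verify_soap(name, "numeric#32000", tag))  # False: value was altered
```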

  2. Organization identification

     We need to uniquely identify the organization issuing a given SOAP.
     Thus, we need to provide a single, global namespace for organizations.
     I assume similar issues have been addressed by the URI/URN working
     groups; can anyone provide specifications on what they have come up
     with?

     I would recommend that this namespace be hierarchical in nature.
     Potentially, every person on the planet will be able to issue multiple
     SOAPs, so the namespace scheme _must_ be designed with that in mind.
     Granted, such flexibility may not be necessary in the near future (nor
     likely ever), but recent suggestions that we limit the SOAP approval
     domain value to "something simple" like 1 to 10 require me to insist
     that we think in terms of future expansion.

  3. SOAP name

     Given a hierarchical namespace for identifying an organization, it
     makes sense to me to simply extend the namespace to include the name
     of the SOAP as well.  In other words, a single hierarchical namespace
     would identify both the organization and the specific SOAP.  Is there
     any reason not to do it this way?
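
     With organization and SOAP names in one hierarchy, "was this SOAP
     issued by organization X, or by one of its sub-organizations?" becomes
     a prefix test.  A minimal sketch, representing a name as a tuple of
     path components so that it does not depend on the separator question
     in item 5; the function name and "Some_Other_Org" are illustrative.

```python
from typing import Tuple

# A name in the single hierarchical namespace, e.g. an organization
# ("Org",) or a SOAP it issues ("Org", "soap-name").
Name = Tuple[str, ...]


def issued_under(soap: Name, org: Name) -> bool:
    """True if soap sits strictly below org in the namespace."""
    return len(soap) > len(org) and soap[: len(org)] == org


aca = ("American_Children's_Association",)
reading = aca + ("reading-level",)
print(issued_under(reading, aca))                   # True
print(issued_under(aca, aca))                       # False: not below itself
print(issued_under(reading, ("Some_Other_Org",)))   # False
```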

  4. Definition of range value types

     Is it sufficient that we define the range value to be limited to (for
     example) 16-bit signed integers?  Or would some SOAPs work better if
     the range value could include an enumerated type?

     This is a tough issue since restricting the range to numeric values (of
     any type; reals could work, too) would simplify implementation issues.
     Any introduction of a string-based enumerated type would require that
     the clients and servers agree on the type values somehow.  This can be
     done either with some sort of negotiation procedures and some way to
     convey the semantics associated with the type (yuck), or by defining
     the types via an ICC standard.

     I propose that we categorize SOAPs into two types: numeric types and
     enumerated types (better word, anyone?).

     Numeric types should be based on 16-bit signed integers and be used to
     specify the degree of approval.  Negative values would signify
     disapproval, as expected.

     Enumerated types are SOAPs whose range is a predefined set of values,
     based upon Interpedia standards.  This could be extended to encompass
     SOAPs that have meaning based solely on their existence (in other
     words, the range has only one value).

     Although enumerated types are more difficult to manage, I think they
     are useful and should be included in the specification.

     Of course, since signed integers are really an enumerated type, I'm
     promptly dropping the distinction :)
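
     To make the range-type idea concrete, here is a minimal sketch in
     which each range type is either the proposed 16-bit signed integer or
     a predefined set of allowed strings; a one-element set models a SOAP
     that means something merely by existing.  The type names and allowed
     values are hypothetical, not Interpedia standards.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

INT16_MIN, INT16_MAX = -32768, 32767


@dataclass(frozen=True)
class RangeType:
    name: str
    # None means "16-bit signed integer"; otherwise the enumerated values.
    allowed: Optional[FrozenSet[str]] = None

    def accepts(self, raw: str) -> bool:
        """Check whether a raw value string lies in this type's range."""
        if self.allowed is not None:
            return raw in self.allowed
        try:
            number = int(raw)
        except ValueError:
            return False
        return INT16_MIN <= number <= INT16_MAX


# Hypothetical registry of range types.
NUMERIC = RangeType("numeric")
BOOLEAN = RangeType("boolean", frozenset({"true", "false"}))
READING_LEVEL = RangeType("reading-level", frozenset(str(g) for g in range(1, 13)))
PRESENT = RangeType("present", frozenset({"yes"}))  # existence-only SOAP

print(NUMERIC.accepts("30000"))    # True
print(NUMERIC.accepts("35000"))    # False: outside the 16-bit signed range
print(READING_LEVEL.accepts("8"))  # True
```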

  5. Encoding scheme

     I'd highly recommend an arbitrary length alphanumeric string for the
     SOAP domain.  I really don't feel like dealing with arbitrary data in
     my browser and client implementations.  It doesn't make a whole lot of
     sense, anyway, to say, "Give me SOAP ^&#*$&".

     To facilitate hierarchical spaces, one scheme that I've used in a
     number of databases I've implemented is to use the exclamation point as
     a path separator.  Is there a better way to do this?  I've also used
     '!:!' to make the separator more distinctive, but this is the same
     general idea.  As far as I know, at least one character must be
     reserved, since the path separator is basically "out-of-band" data.
     Are there any other suggestions?  I also note that if there are
     multiple "parts" to the SOAP (if we encode more than just the
     hierarchical organization name), we need another separator.  I usually
     use the double-colon, '::', but since the entire key should be RFC822
     compliant, that is out.  For now, I'll use the hash mark, although I am
     loath to remove more characters from the domain than necessary.

     Hmmm, it occurs to me that similar work has been done for encoding
     URNs.  The March 26th version of the Sollins draft, "Specification of
     Uniform Resource Names", includes no physical representation of URNs.
     I don't have time right now to go bouncing all over the Internet
     looking for the most recent one (I think the encoding type has been
     dropped since the last copy I have).  Pointer, anyone?

     Hmmm, I seem to remember there being colons in the URN spec; does that
     mean that URNs can't be transmitted as HTTP object headers, but
     only as request URIs?

     Note that the concept of "alphanumeric" could potentially be extended
     as extended character sets become more popular (has anyone thought of
     this for HTTP request/object headers?).  For now (at the prototype
     stage), I recommend specifying a restricted ASCII (see RFC822, section
     3.1.2 for more details).
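
     The naming conventions discussed in this item can be sketched as
     follows, assuming "x-SOAP" as the key prefix, "#" between parts and
     "!" between hierarchy levels (the choices used in the examples further
     on), plus the RFC 822 section 3.1.2 rule for field names: printable
     ASCII (decimal 33 through 126) except the colon.  Spaces are excluded
     too, hence the underscores in organization names.  The restriction
     applies only to the field name; field bodies may contain colons, as
     the time in any Date header does.  The function names are
     hypothetical.

```python
from typing import List

PREFIX = "x-SOAP"
PART_SEP = "#"   # separates the prefix from the hierarchical name
PATH_SEP = "!"   # separates levels of the hierarchy


def is_rfc822_field_name(name: str) -> bool:
    """RFC 822, 3.1.2: one or more printable ASCII chars, excluding ':'."""
    return bool(name) and all(33 <= ord(c) <= 126 and c != ":" for c in name)


def encode_key(path: List[str]) -> str:
    """Join hierarchy levels into one header-safe key."""
    if not path:
        raise ValueError("empty path")
    for part in path:
        if not part or PART_SEP in part or PATH_SEP in part:
            raise ValueError(f"reserved or empty component: {part!r}")
    key = PREFIX + PART_SEP + PATH_SEP.join(path)
    if not is_rfc822_field_name(key):
        raise ValueError(f"not a valid RFC 822 field name: {key!r}")
    return key


def decode_key(key: str) -> List[str]:
    """Split a key produced by encode_key back into hierarchy levels."""
    prefix, sep, rest = key.partition(PART_SEP)
    if prefix != PREFIX or not sep or not rest:
        raise ValueError(f"not a SOAP key: {key!r}")
    return rest.split(PATH_SEP)


key = encode_key(["American_Children's_Association", "reading-level"])
print(key)              # x-SOAP#American_Children's_Association!reading-level
print(decode_key(key))  # ["American_Children's_Association", 'reading-level']
```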

  6. Mechanism for SOAP inheritance

     Beats me.  I'd rather put this off until I have at least bare-minimum
     SOAPs implemented.  Whatever inheritance hierarchy is set up, it should
     probably be maintained separately from the SOAPs themselves.  As such,
     the above scheme for naming SOAPs doesn't appear to impede later
     implementation of SOAP inheritance.

A note on numeric SOAPs: different organizations will use the numeric SOAP
range in different ways.  IMHO, this cannot be prevented.  Different people
rate things in different ways; how are you ever going to get around that?
People who utilize a given SOAP will quickly find out what number
constitutes a good rating for a given SOAP.  To facilitate multi-SOAP
searches ("prune this search by only selecting articles for which these
numeric SOAPs are over 30000"), the client could take care of scaling the
SOAPs.  You enter the SOAP value as 30000, but the client knows that a
certain SOAP tends to use values that are a little too low, so it should
actually request a value of 20000 for that specific SOAP.  The client has to
have some user-level mechanism for weighting SOAPs, anyway; this action
would just be a part of that.
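
The client-side scaling described above can be sketched as a per-SOAP
calibration factor applied to the user's threshold before the request goes
out.  The SOAP names and factors are hypothetical; exact fractions avoid
floating-point rounding, and results are clamped to the 16-bit signed range
proposed in item 4.

```python
from fractions import Fraction

INT16_MIN, INT16_MAX = -32768, 32767

# Hypothetical per-user calibration: factor < 1 for a SOAP whose issuer
# rates "a little too low", > 1 for one that rates high.
CALIBRATION = {
    "x-SOAP#Hypothetical_Low_Raters": Fraction(2, 3),
    "x-SOAP#Hypothetical_High_Raters": Fraction(3, 2),
}


def request_threshold(soap: str, user_threshold: int) -> int:
    """Translate the user's threshold into the value to request for one SOAP."""
    factor = CALIBRATION.get(soap, Fraction(1))
    scaled = round(user_threshold * factor)
    return max(INT16_MIN, min(INT16_MAX, scaled))


print(request_threshold("x-SOAP#Hypothetical_Low_Raters", 30000))   # 20000
print(request_threshold("x-SOAP#Hypothetical_High_Raters", 30000))  # 32767 (clamped)
print(request_threshold("x-SOAP#Uncalibrated", 30000))              # 30000
```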

It seems likely that the metainformation for the SOAP should be kept
external to the document itself.  Assuming use of the HTTP protocol as the
transfer method (as my prototypes do), I think the HEAD information should
probably be generated from information external to the document.  Although
this introduces some problems with respect to keeping all the documentation
information "in one place", I think the difficulties associated with
hard-coding the information in the document are far greater.

Given that, it seems that a good place for the SOAPs would be the META tag
of HTML+.  As such, the entire line must conform to RFC822 so that the HTTP
protocol can transfer it as a header field (key: value).

Thus, roughly, for an enumerated SOAP, the HEAD of a document would contain

  <meta name="x-SOAP#American_Children's_Association!suitable-for-children"
	value="boolean#true">

or, another example of an enumerated type,

  <meta name="x-SOAP#American_Children's_Association!reading-level"
	value="reading-level#8">

  (enumerated type signifying reading grade-level)

or, a numeric type

  <meta name="x-SOAP#American_Children's_Association" value="numeric#30000">
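
Assuming SOAPs live outside the document, as proposed above, a server could
produce the corresponding header fields from a separate table.  A minimal
sketch: the store, its document path and its contents are hypothetical, and
each entry is checked so that it forms a single RFC 822 "name: value" line.

```python
from typing import Dict, List, Tuple

# Hypothetical external store: document path -> list of (SOAP name, value).
SOAP_STORE: Dict[str, List[Tuple[str, str]]] = {
    "/articles/dinosaurs.html": [
        ("x-SOAP#American_Children's_Association!suitable-for-children", "boolean#true"),
        ("x-SOAP#American_Children's_Association!reading-level", "reading-level#8"),
        ("x-SOAP#American_Children's_Association", "numeric#30000"),
    ],
}


def soap_header_lines(path: str) -> List[str]:
    """Return 'name: value' header lines for one document."""
    lines = []
    for name, value in SOAP_STORE.get(path, []):
        # RFC 822 field name: printable ASCII (33-126), no colon.
        if not name or any(not (33 <= ord(c) <= 126) or c == ":" for c in name):
            raise ValueError(f"invalid field name: {name!r}")
        # A bare CR or LF would split the value into a bogus extra header.
        if "\r" in value or "\n" in value:
            raise ValueError(f"value would break the header: {value!r}")
        lines.append(f"{name}: {value}")
    return lines


for line in soap_header_lines("/articles/dinosaurs.html"):
    print(line)
```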

So I seem to have answered my original question; the minimum information
that needs to be encoded into a SOAP is the type of the range and the
organization/SOAP name, specified in some hierarchical manner.  Assuming
some key/content model (motivated by its natural application to database
implementation), all the information must be encoded into either the key or
the content.  It doesn't matter a great deal where the division comes (one
could just as well indicate the range type in the key portion above).
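
Under this key/content split, recovering the minimum information
(organization/SOAP path, range type, value) from a document's HEAD is
mechanical.  A minimal sketch using Python's standard html.parser; it assumes
the META tags carry the name/value attributes of the examples above and the
"x-SOAP", "#" and "!" conventions, all of which are still proposals.

```python
from html.parser import HTMLParser
from typing import List, Tuple

Soap = Tuple[Tuple[str, ...], str, str]  # (hierarchical path, range type, value)


class SoapMetaParser(HTMLParser):
    """Collect SOAPs from <meta name="x-SOAP#..." value="type#value"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.soaps: List[Soap] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name") or ""
        value = fields.get("value") or ""
        prefix, sep, path = name.partition("#")       # key: prefix#org!soap
        range_type, vsep, raw = value.partition("#")  # content: type#value
        if prefix.lower() == "x-soap" and sep and path and vsep:
            self.soaps.append((tuple(path.split("!")), range_type, raw))


head = """<head>
<meta name="x-SOAP#American_Children's_Association!reading-level" value="reading-level#8">
<meta name="x-SOAP#American_Children's_Association" value="numeric#30000">
<meta name="author" value="someone">
</head>"""
parser = SoapMetaParser()
parser.feed(head)
parser.close()
for soap in parser.soaps:
    print(soap)
```

Non-SOAP META tags, such as the author tag, are simply skipped.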

As you can see, there are numerous ways to take care of these details; what
I'd like to do is nail down a preliminary specification, subject to change,
so I can get on with finishing my prototypes.  To save myself grief, I'd
like to refrain from implementing anything until I get at least a quorum
vote on the matter (quorum meaning people who care).  Hopefully, that quorum
won't be one person (me!).

On another track, what should be the mechanism by which a SOAP enters into the
SOAP-space?  To a great extent, this cannot even begin to be answered until
we have some sort of idea of what the protocols and mechanisms of the
underlying distributed service are.  The best system I currently know of for
distributed services is DNS, but that is inadequate to our needs (despite
recent discussions prompted by Martin Hamilton about how to implement a
URN->URL resolution scheme using DNS).  I imagine that the Xanadu projects
have put a great deal of thought into this area, but I frankly admit that I
am loath to pursue any technology that we would be required to license.  At
this point, it seems likely that my senior research project will focus on
exactly this issue of distributed databases, and specifically their uses to
distributed information systems.  (Speaking of which, can anyone offer
references to literature on the subject of distributed databases?  There
seems to be a dearth of literature which doesn't focus on the issue of
keeping the database consistent, something important to the business community,
but distinctly less important to information systems).  So, in a little over
a year, we'll have at least one backend for all of this, but I'd like to get
work done far before that.

======================================================================
Subject: 4.2 What are Seals of Approval (SOAP)?

The concept of a seal-of-approval (SOAP), introduced by Erik Seielstad,
is currently being actively discussed.  SOAPs have achieved some
prominence, and have subsequently been referred to in several
comp.info.xxx newsgroups.  A new notion is that SOAPs could be
hierarchical, in that a SOAP could indicate approval or disapproval of
a group of other SOAPs.  Another is that a SOAP could point to two
types of articles, ones that agree with and ones that disagree with
the article to which the SOAP applies.  The subject of links is also
being discussed and is running a separate but parallel course.
A principal difficulty seems to be in deciding how to implement SOAPs.
Below are three articles that describe SOAPs.

Doug Wilson <dwilson@crc.sd68.nanaimo.bc.ca> wrote:
A seal-of-approval is data provided by a person or persons which
indicates that some article is good.  (Seals of disapproval have
also been proposed.)  Seals-of-approval will be used by people in
deciding what articles to read, but will also be used by the
Interpedia software to decide which articles to make most easily
available to people, according to their stated preferences.
If you set a user-parameter indicating you only want articles which
have the Jeff-Foust-Quality-Assurance-Board-Seal-Of-Approval,
then only those articles will be set up for convenient (default)
access -- although all other articles will still be accessible,
with a bit more effort.
(Doug Wilson)

Jeff Foust <jfoust@mit.edu> wrote:
Seals-of-approval have been suggested as a way to provide editorial
input on articles submitted to the Interpedia without subjecting all
Interpedia users to the editorial opinions of a few.  In short, any
user would be able to create a "seal" that could be affixed to
articles that the user found to be factually correct, well written, or
ideologically agreeable to him/her.  There would be no limit on the
number of seals that could exist.  There would likely be a directory
of seals kept, so that users could refer to the directory to determine
who the authors of a particular seal are and also obtain some basic
information on it (e.g., what classes of articles are typically given
this seal, what criteria the authors use to assign seals, etc.)
(Jeff Foust)

Jared Rhine <jared_rhine@hmc.edu> wrote:
The concept of SOAPs was invented to help solve the problem of balancing
editorial issues, academic freedom, and database viewpoints.  The
problem is, how can one construct the Interpedia in such a way that
anyone can contribute, and yet allow the user to retain a focused view
of the articles available?  If the Interpedia is to scale well as it
grows, it would (ironically) not be acceptable for the number of
articles returned in response to a particular query to grow at the
same rate.
SOAPs are issued by any organization (or individual) that wishes to
produce one.  Any given document can (and will) have multiple SOAPs.
Each SOAP represents a rating of that particular document by a
particular organization.  In general, there would be a number of
institutions whose opinions you respect.  If one of them rated a
particular document highly, it is likely that you would consider it a
valuable document, too.  Articles containing a SOAP from those
organizations would be included in your view of the Interpedia.
Note that SOAPs can provide arbitrary slices of the dataspace of the
Interpedia.  Any given search on the entire space of the Interpedia
would return a large number of documents.  From that set of documents,
you _apply_ a SOAP, which, based on some criteria, selects a subset of
those documents.  As you apply more and more SOAPs, the set of documents
becomes smaller and smaller.  You could conceivably also perform
set-theoretic operations with SOAPs, i.e., unions, intersections, and so
forth.
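
The set-theoretic view can be sketched directly: applying a SOAP is a filter
on a set of documents, so applying SOAPs one after another is intersection,
and union combines alternative views.  The corpus, SOAP labels and
thresholds below are hypothetical.

```python
from typing import Callable, Dict, Set

# Hypothetical corpus: document -> {SOAP label: value}
DOCS: Dict[str, Dict[str, str]] = {
    "dinosaurs": {"children": "true", "readability": "78"},
    "fourier": {"peer-review": "true", "readability": "40"},
    "sonnet": {"children": "true", "iambic": "true"},
    "rant": {},
}


def apply_soap(docs: Set[str], soap: str, accept: Callable[[str], bool]) -> Set[str]:
    """Keep only documents that carry the SOAP with an acceptable value."""
    return {d for d in docs if soap in DOCS[d] and accept(DOCS[d][soap])}


everything = set(DOCS)
for_kids = apply_soap(everything, "children", lambda v: v == "true")
readable = apply_soap(everything, "readability", lambda v: int(v) >= 70)
reviewed = apply_soap(everything, "peer-review", lambda v: v == "true")

print(sorted(for_kids & readable))  # ['dinosaurs']
print(sorted(for_kids | reviewed))  # ['dinosaurs', 'fourier', 'sonnet']
# Applying one SOAP to the output of another equals the intersection.
print(apply_soap(for_kids, "readability", lambda v: int(v) >= 70) == for_kids & readable)  # True
```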
Some examples of SOAPs I have envisioned based on discussions from
the list:
   * A SOAP from the American Children's Association.  A particular
     browser could be configured to return only those documents which
     are suitable for children.
   * Peer review journals based solely on the Interpedia.  A technical
     article could be published by anyone;  how do you know it isn't
     completely invalid?  Because the IEE has certified this document
     as having technical merit.
   * A SOAP for poets who insist that every document be written in
     iambic pentameter.
   * SOAPs could also have numeric ratings associated with them; a
     particular article might have a readability index of 78.