PATHs in HTML

redback!jimmc@eskimo.com (Jim McBeath)

Date: Mon, 10 Jan 94 12:22:32 PST
From: redback!jimmc@eskimo.com (Jim McBeath)
Message-id: <9401102022.AA00994@redback.>
To: www-talk@www0.cern.ch
Subject: PATHs in HTML
Content-Length: 7584

Dave Raggett and I have been having an email discussion of PATHs in HTML.
What follows is the result of that discussion, which I present as
a proposal for an enhancement to HTML (and WWW clients).

A PATH is a way of defining a linearization of a set of nodes.
The user of a client can perform operations on the nodes in a path, such
as traversing (browsing) them in order, searching, or printing.

The requirements for paths:
1. A path must be able to include nodes for which the user creating the path
   has only read access; therefore, it must be possible to define a path
   which includes nodes in which no path information resides, including
   non-HTML nodes.
2. A single node may be included in multiple paths; therefore paths must
   be given some kind of name to distinguish them.
3. It must be possible to incrementally define paths; therefore a path
   must be able to include another path.
4. It must be possible for a user with no special privileges to create his
   own PATHs which reference any documents (for example, the mechanism can't
   require modifying HTTP server files).
5. The mechanism for creating PATHs must be reasonably simple to understand.

This proposed implementation requires the use of REL=Subdocument (already
proposed in the current version of the HTML+ spec at
 ftp://15.254.100.100/pub/draft-raggett-www-html-00.ps )
plus one additional enhancement:

	Add "Path" to the set of legal REL values.

Given these two REL values for anchors, a path is defined as follows:
1. A path is defined by anchors in an HTML node.  All anchors which
   have REL=Subdocument or REL=Path are included in the path definition.
2. A single HTML node contains a single path definition.
3. An anchor with REL=Subdocument causes the referenced document to be
   included in the path.
4. An anchor with REL=Path causes the referenced document to be interpreted
   as a path, and all members of that path are included in the top level path.
   Nesting is allowed: the included path can also include other paths.
5. A path node containing anchors with REL=Path can be treated either as a
   single path containing subpaths, or as a collection of independent named
   paths.
6. When interpreting a path node as a collection of independent paths,
   the anchor string is used as the user-visible name of each path.
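A client might apply the rules above roughly as follows. This is a minimal Python sketch, not part of the proposal: the names `path_members` and `flatten_path` are hypothetical, documents are looked up through an assumed `fetch` callable (here just a dictionary lookup), HREFs are used as-is rather than resolved against a base URL, and skipping a path that includes itself is an assumption, since the proposal does not say how cycles should be handled. It treats the node as a single path (the first interpretation in rule 5). Note that only REL=Path targets are fetched and parsed; REL=Subdocument targets are merely listed, so they can be non-HTML nodes.

```python
from html.parser import HTMLParser


class PathAnchorParser(HTMLParser):
    """Collects (rel, href) for anchors with REL=Subdocument or REL=Path."""

    def __init__(self):
        super().__init__()
        self.members = []  # (rel, href) pairs in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "a":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower()
        href = a.get("href")
        if href and rel in ("subdocument", "path"):
            self.members.append((rel, href))


def path_members(html):
    parser = PathAnchorParser()
    parser.feed(html)
    parser.close()
    return parser.members


def flatten_path(url, fetch, _active=None):
    """Return the ordered node list defined by the path node at `url`."""
    active = set() if _active is None else _active
    if url in active:  # assumption: a path that includes itself is skipped
        return []
    active.add(url)
    nodes = []
    for rel, href in path_members(fetch(url)):
        if rel == "subdocument":
            nodes.append(href)  # rule 3: include the referenced document
        else:
            nodes.extend(flatten_path(href, fetch, active))  # rule 4: splice in
    active.discard(url)
    return nodes


# Hypothetical documents, keyed by HREF, standing in for network retrieval.
docs = {
    "mypath": '<DL><DT><A HREF="node1" REL="Subdocument">label1</A>'
              '<DT><A HREF="path2" REL="Path">label2</A>'
              '<DT><A HREF="node3" REL="Subdocument">label3</A></DL>',
    "path2": '<A HREF="nodeA" REL="Subdocument">a</A>'
             '<A HREF="nodeB" REL="Subdocument">b</A>',
}
print(flatten_path("mypath", docs.__getitem__))
# ['node1', 'nodeA', 'nodeB', 'node3']
```

Removing a path from the "active" set after expanding it means the same sub-path may still be included twice from different places; only true self-inclusion is cut off.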

An example of an HTML node which defines a single path:
        <H1>my path</H1>
        This is a path of things I've found useful.
        <DL>
          <DT> <A HREF="node1" REL="Subdocument">label1</A>
          <DD> a summary
 
          <DT> <A HREF="path2" REL="Path">label2</A>
          <DD> a summary
 
          <DT> <A HREF="node3" REL="Subdocument">label3</A>
          <DD> a summary
        </DL>
This path includes node1, followed by all of the nodes defined in path2,
followed by node3.

An example of an HTML node which points to independent paths:
       <H1>useful paths</H1>
       These are some paths I've found useful.
       <DL>
         <DT> <A HREF="alphapath.html" REL="Path">Alphabetical</A>
         <DD> nodes of interest sorted alphabetically
 
         <DT> <A HREF="geopath.html" REL="Path">Geographical</A>
         <DD> sorted by location
 
         <DT> <A HREF="chronopath.html" REL="Path">Chronological</A>
         <DD> sorted by time
       </DL>
This node points to three separate paths.

Each path operation could come in two versions:
1. Treat the current node as a single path.
2. Treat the current node as a collection of pointers to separate (named) paths.

Client programs could add the following commands that deal with paths:
1. Traverse current path
2. Traverse named path
3. Find in current path
4. Find in named path
5. Print current path
6. Print named path

"Traverse current path" would collect the list of nodes in the path
definition from the current node, then switch to viewing the first node
in that path.  This would be appropriate, for example, for a path node
which was also a nicely formatted Table of Contents page.

"Traverse named path" would collect the list of anchor strings from all
anchors with REL=Path in the current node, allow the user to
select one of those, then retrieve that (path defining) node and switch
to viewing the first node defined by it.  Note that in this approach the user
never actually views the HTML node which defines the path selected.
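The first half of "Traverse named path" could be sketched in Python as below. `named_paths` collects the anchor string and HREF of every REL=Path anchor, and `choose_path` stands in for the user's menu selection; both names are hypothetical. The selected HREF would then be retrieved and expanded into its list of nodes, just as for "Traverse current path".

```python
from html.parser import HTMLParser


class NamedPathParser(HTMLParser):
    """Collects (anchor text, href) for every anchor with REL=Path."""

    def __init__(self):
        super().__init__()
        self.paths = []
        self._href = None  # HREF of the REL=Path anchor currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            if (a.get("rel") or "").lower() == "path" and a.get("href"):
                self._href, self._text = a["href"], []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = " ".join("".join(self._text).split())  # normalize spaces
            self.paths.append((name, self._href))
            self._href = None


def named_paths(html):
    parser = NamedPathParser()
    parser.feed(html)
    parser.close()
    return parser.paths


def choose_path(html, choice):
    """Return the HREF of the path whose anchor string equals `choice`."""
    for name, href in named_paths(html):
        if name == choice:
            return href
    raise KeyError(choice)


page = '''<DL>
<DT> <A HREF="alphapath.html" REL="Path">Alphabetical</A>
<DD> nodes of interest sorted alphabetically
<DT> <A HREF="geopath.html" REL="Path">Geographical</A>
<DT> <A HREF="chronopath.html" REL="Path">Chronological</A>
</DL>'''
print([name for name, _ in named_paths(page)])
# ['Alphabetical', 'Geographical', 'Chronological']
print(choose_path(page, "Geographical"))  # geopath.html
```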

"PathNext" and "PathPrevious" buttons would facilitate stepping forwards
and backwards through the nodes of the currently active path.

Conceptually, at the moment the user selected the Traverse command (or any
other path command), the client program would build the entire list of
nodes for that path (although in practice this could easily be done
incrementally), so that the PathNext and PathPrevious commands are
unambiguous, even when a node is a member of multiple paths, or is itself
a path definition included as REL=Subdocument.
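One way to get that unambiguous behaviour is to build the node list once and move an index over it, as in this Python sketch. `PathCursor` is a hypothetical name, and stopping at either end (rather than wrapping around) is an assumption the proposal does not address.

```python
class PathCursor:
    """Position within a path whose node list was built once, up front."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("path has no nodes")
        self.nodes = list(nodes)
        self.index = 0  # start on the first node of the path

    @property
    def current(self):
        return self.nodes[self.index]

    def next(self):
        """PathNext: advance one node; at the last node, stay put and return None."""
        if self.index + 1 >= len(self.nodes):
            return None
        self.index += 1
        return self.current

    def previous(self):
        """PathPrevious: step back one node; at the first node, return None."""
        if self.index == 0:
            return None
        self.index -= 1
        return self.current


cur = PathCursor(["node1", "nodeA", "nodeB", "node3"])
print(cur.next())      # nodeA
print(cur.previous())  # node1
print(cur.previous())  # None: already at the first node
```

Because the cursor remembers a position in the list rather than "the current node", a node that belongs to several paths, or appears twice in the same path, never leaves the client wondering where PathNext should go.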

The user could switch to another path at any time.  Note, however, that
every path is defined by some particular node, so from within a given node
the user can select only a path that is defined in or referenced by that
node, or a path that the client already knows about.  This restriction is
similar to the fact that you can only jump to nodes that are pointed to
from the current node (or to nodes that the client already knows about,
e.g. from the window history).

"Find in current path" would collect the path definition from the current
node, then search through each node in the path in sequence, stopping when
it found the string asked for by the user.
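A minimal Python sketch of that search, assuming the path's node list has already been collected. `find_in_path` and the `fetch` callable are hypothetical, and case-insensitive matching is an assumption; the proposal only says the search stops at the first node containing the string.

```python
def find_in_path(nodes, fetch, needle):
    """Return the first node, in path order, whose text contains `needle`.

    Nodes are fetched one at a time and the search stops at the first match,
    so later nodes are never retrieved. Returns None if nothing matches.
    """
    target = needle.casefold()
    for url in nodes:
        if target in fetch(url).casefold():
            return url
    return None


pages = {
    "node1": "Installing the client",
    "nodeA": "Printing a path",
    "node3": "Index of commands",
}
print(find_in_path(["node1", "nodeA", "node3"], pages.__getitem__, "PRINT"))  # nodeA
```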

"Find in named path" would let the user select one of the paths referenced
by the current node (just like in "Traverse named path"), then search
through the nodes defined by that path.

"Print current path" and "Print named path" would have the same options
as the current command to print a single node (or save it to a file),
but would print (or save) all of the nodes in the selected path.
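Printing or saving a whole path amounts to joining its nodes, in path order, into one document that the existing print or save code can handle. In this Python sketch, `render_path` and `fetch` are hypothetical names, and separating nodes with a form feed (so each node starts on a new printed page) is an assumption.

```python
def render_path(nodes, fetch, separator="\n\f\n"):
    """Concatenate every node of a path, in path order, into one document."""
    return separator.join(fetch(url) for url in nodes)


pages = {"intro": "Introduction", "usage": "Usage", "index": "Index"}
print(render_path(["intro", "usage", "index"], pages.__getitem__, separator="\n---\n"))
```

Saving to a file would simply write the same string out instead of sending it to the printer.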

Given this capability, I could define a path that would let me print out my
entire User's Manual and (just as important) search through the contents
of the entire manual with a single command.
I could then define other, specialized paths that went through different
pieces of the manual in another order, for people who wanted to learn
about a particular subject.

I would also define a set of paths in my personal home page, so that I
could search through multiple documents more easily:
    HTML - would include pointers to a bunch of the HTML documents
	(including HTTP, HTML primer, HTML+ CGP, etc.).
    Indexes - would include pointers to the indexes I most frequently
	search through when looking for something that I know I've seen
	before, but can't remember which index it was in.
    Personal - all of my personal pages, so that I can split up my
	huge home page into separate pages and still be able to easily
	search through them all.

This scheme is easy to understand, easy to define, and easy to implement
entirely in the client.  Clients which did not understand REL=Subdocument
and REL=Path would just ignore those attributes; the user could still use
the path node to get to the nodes in the path by selecting an anchor.

It allows people to create new paths that build on paths that others
have defined without requiring modification of all the nodes in the path.
It would give document providers a way to define a set of nodes to be
printed into a manual, and would allow users to define their own custom
manuals, either for printing or for searching.

-Jim McBeath
jimmc@eskimo.com