Re: REL and REV attributes (Was: More comments on HTML 3.0)

Craig Hubley (craig@passport.ca)
Sat, 29 Apr 95 15:56:58 EDT

Dan writes:
> This is very much the crux of the matter: are we defining a global
> formal semantics, or just a global syntactic mechanism upon which
> folks can build application-specific formal semantics?

I would have said the latter, but substituted 'machine-interpretable' or
'implementation-defining' for 'formal' as there are varying degrees of
formality. Someone once demonstrated that, for all its pretension to
formality, only one of the basic lambda calculus operations could be said
to be truly formal. I'd rather avoid the word 'formal' in this context,
as once again we are specifying *meaning* to be acted on by humans or by
*programs* written by humans... either way invoking an informal
interpretation mechanism on the 'formal' semantics. We should concentrate
on clarifying the implications of this interpretation process (as with the
SHORTTAG issue) rather than trying to increase the level of formality.

That said, I agree with Dan's comments, specifically:

1. typed links are a form of knowledge representation.

2. Predicate calculus is pretty expressive, and it's a formal system.

3. No one formal system is good enough for all applications.

4. Human knowledge is not consistent or decidable, but there are subfields
of human endeavour where things are more consistent than in others (e.g.
legal precedent), or at least where consistency and decidability matter.
In some areas they are the whole point (e.g. security and access control).

At this point we run into what seems like a question of definition:

> So I'm very much against the idea of registration and use of these
> words in the sense of a programming language. I do like Dave Raggett's
> idea of having a sort of "standard library" of these relationships,
> and reserving some part of the namespace for future standardization.

But this is *exactly* like a programming language. Only a small subset
of the possible/useful words is standardized, as close to 'none at all'
as practical. This is the 'standard library'. Applications tend to
build on the standard library in their own idiosyncratic ways, finding
common ground where they use common resources (e.g. a graphic UI, an SQL
database) and achieving some standardization there. However, there is
usually no need, nor movement, to achieve global consensus on types or
relationships specific to an application. From time to time an industry
will move to standardize internally, but then it's for interchange,
not 'internal consistency'.
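
To make the analogy concrete: a 'standard library' value and an
application-specific one could sit side by side in a document head.
(REL="ToC" is the sort of value a standard list might contain;
"biblio.entry" is a name invented here purely for illustration.)

    <LINK REL="ToC" HREF="contents.html">
    <LINK REL="biblio.entry" HREF="refs.html#knuth84">

Every browser would recognize the first; the second matters only to
whatever bibliographic application the author and reader share.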

Within the application what matters is not standardization but capability.
Consider that the core of any programming language is a facility to
differentiate words (parsing) and to assign execution semantics to them
(binding). Object-oriented languages are probably the most successful
at minimizing the number of standard types, and they do so by providing
robust facilities for building your own. This is an absolute necessity.
However, languages that totally neglect building a strong standard library
in favor of those build-your-own facilities (e.g. Eiffel) have tended not
to be adopted. Therefore I'm in favor of having a large library of
well-known link types (or I wouldn't have volunteered to help write it).

Most programmers, like authors, want to find the 'standard' solution and
use it. A few will want to extend or replace the 'standard' solution, or
invent one small thing that makes a particular application a little easier.
Only a small percentage will ever get involved in creating their own robust,
industrial-strength data types, or link types, for use by the whole world.

Let's not force HTML authors to 'register and approve' their innovations
before use...! As with unrecognized tags, unrecognized link types should
just be shown by name and left unprocessed. Ideally I think we might treat
the processing of a link very similarly to the processing of a MIME type:
individuals (or their proxies) configure which programs interpret which
links... the differences being that these might change with every document,
that author and reader share some responsibility for the semantics, and that
the processing takes two arguments: the link's own type and the document
'linked to'.
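
For instance, given a type no browser has heard of (the name is
invented here):

    <A HREF="rebuttal.html" REL="contradicts">a contrary view</A>

a browser with no processing configured for 'contradicts' would simply
present an ordinary link, perhaps labelled with the type name.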

Right now authors can specify cgi-bin scripts to assign semantics to links,
and readers can specify MIME type viewers to interpret complex data... this
would be halfway between: authors and readers would share responsibility in
some way for assigning meaning to links. So perhaps all we really need is a
simple naming mechanism for links, a browser facility for invoking
client-side scripts, and some way of sharing control of the whole thing
amongst reader, author, and software implementor. I suggest we think of this
as a MIME type that takes arguments... is this reasonable? The rest of the
infrastructure required (e.g. invocation of client-side programs
approved/installed by the reader, visible signals to users of what the data
type is or what capability will be invoked, etc.) is so similar that there
is no point re-inventing it.

If someone prefers to see 'footnotes' in a small side window alongside the
main text, rather than shifting to another browser window, that's reasonable.
They should be able to assign a program to link type 'footnote' that does it.
Keep in mind that most links will not have *any* processing assigned to them,
and many will continue not to have types at all. Link types will be just
part of the rhetoric, indicating that another document is a 'source' or that
it 'contradicts' the point made, or that it provides a 'precedent', etc....
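
A sketch of the 'footnote' case above (the markup is ordinary HTML;
the side-window behaviour comes entirely from whatever program the
reader has bound to the type):

    <A HREF="notes.html#fn3" REL="footnote">[3]</A>

A reader who has bound a side-window viewer to 'footnote' sees the
note alongside the text; everyone else gets a plain link.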

This falls a little short of specifying an abstract link model for hypertext,
but given the history of such models we shouldn't rush in too quickly.

SGML DTDs provide a good example of industry co-operation to standardize
information interchange. Although it is useful to standardize a DTD for
an entire industry worldwide, the costs in some cases outweigh the benefits.

However, none of this stops someone from hacking up a quick DTD for a little
publishing job. We need a solution that will *let* authors build complex
custom semantics without *forcing* them to.
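
For instance, a one-off DTD for that little publishing job (element
names invented) need be no bigger than:

    <!DOCTYPE memo [
      <!ELEMENT memo  - - (title, para+)>
      <!ELEMENT title - - (#PCDATA)>
      <!ELEMENT para  - - (#PCDATA)>
    ]>

Nobody registers this anywhere; it need only be understood by the
tools at both ends.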

> Then there's the question: do we want _any_ global semantics? For...
> (example omitted)
> ...This seems like it would be useful for back-link services and such.

My question back is: 'can we avoid them?' Even pure object-oriented
languages can't.

> I'd prefer to replace rev by notation in the rel attribute, so
> that instead of rel="next"/rev="next", we'd have rel="next"/rel="next-inv"
> or something like that.

I agree with this. REV doesn't seem to have much use. Maybe we can sneakily
redefine it as something to do with REVisions, which would definitely be
useful.
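
Concretely, where today the second chapter of a manual points back
with REV:

    <LINK REV="next" HREF="chap1.html">

it would instead carry the inverse relation in REL (the spelling of
the suffix is still open, per Dan's 'or something like that'):

    <LINK REL="next-inv" HREF="chap1.html">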

> At lunch today, TimBL suggested that rel values should be interpreted
> as relative URIs with an as-yet-unspecified base. So they're
> first-class objects, just like the terms they relate. We can specify

Absolutely. And there is already a means of embedding complex first-class
objects in HTML... although not one that expects a browser to move to another
URI in showing them... which a 'link' would.

> at some point in the future a way to dereference them, for example,
> to download a specification of them.

Exactly the point above. That specification could be, for instance,
a configuration file for a program that is assigned the job of interpreting
a certain set of links, or even the URIs of programs to interpret each one.
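
In markup that might look like this (the URI is invented for
illustration):

    <LINK REL="http://www.host.dom/linktypes/footnote"
          HREF="notes.html#fn3">

Dereferencing the REL value would yield the configuration file, or
the URI of an interpreting program, as described above.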

> John Mallory's work on knowledge representation (used to answer the
> whitehouse mail, e.g.) builds great big worlds of understanding around
> this nucleus of binary relationships where the relationships are
> first-class objects.

References? If this is already being used in high-volume automatic
free-text information processing, I think we need to see it.

> At the extremely theoretical end of the spectrum, I guess the lambda
> calculus shows all you need is one operator and one argument:
> (lambda args expr).

Careful. You're already getting informal...!

> At a very practical end of the spectrum, the relational calculus (the
> theory behind Sybase, Oracle, etc.) argues for n-ary predicates. I've
> seen a sketch of an argument that says that HyTime linking boils down
> to the relational calculus, so that's another argument in favor of
> n-ary predicates.

I would like to see this argument restated before we go too much further.

Somewhere between HyTime links and MIME types I expect an obvious answer
to fall out that lets us move in either direction.

> * we don't need rel _and_ rev, though the cost of un-standardizing
> REV should be considered.

Agreed. Any objections to changing the semantics of REV?

> * the list of values should act like a human language vocabulary,
> with a somewhat organic evolution, rather than a programming
> language keyword list. But: keep in mind that we may want
> a URI style mechanism of dereferencing these terms.

Proposed above.

> * Don't expect the whole web to be consistent, but allow
> applications that rely on consistency within some administrative
> domain

I *do not believe* that there is *any* relationship between the document
semantics and administrative domains in the internet sense. If you want
to accept someone who wants to read SCO online docs 'into their domain'
for purposes of reading their docs and then immediately turf them out
as soon as they hit the last page, I am 100% in favor of such a facility.
However I would think of this more as a 'session service', such as cgi-bin
scripts etc. already seem to provide, and less as something that must be
pre-arranged.

> * Make sure we at least solve the "print this tree of html documents"
> problem.

Easy enough. But please stop calling it a 'tree', because it isn't, even in
simple applications. It's a directed graph where cycles are allowed... a 'web'.
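
Even a trivial case loops (filenames invented):

    a.html:  <LINK REL="next" HREF="b.html">
    b.html:  <LINK REL="next" HREF="c.html">
    c.html:  <A HREF="a.html" REL="source">back to the start</A>

A printer that blindly follows every link revisits a.html forever; it
has to track visited nodes, as with any directed graph.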

> * Keep an eye out for applications of typed links (and other
> markup, like class and meta) to represent conventional bodies of
> knowledge: schedules (Gantt/PERT charts), org charts, USENET
> threads, case law and precedents, library/journal category
> hierarchies, check registers, address books, bibliographies, ...

We'll list these in our RFC.

> * Leave the unsolved problems unsolved: structured argumentation
> systems, proof checking, constraint-based reasoning, ...
> But allow these applications to be built. Hmmm... I wonder

Absolutely.

> if these applications should be built on HTML typed links
> and class/meta tags, or if they should just be new SGML document
> types, or other document formats altogether. Time will tell.

Hard to say. I think I have changed my mind about something: I don't want
HTML to specify its own abstract link model, but to support arbitrary models
if that is possible. I don't want HTML to be 'just another hypertext system'
but rather the presentation interface to a variety of hypertext systems,
each with possibly wildly different semantics: a base for experimentation,
like the net itself, where 'standard' applications are universally supported
but the mechanisms are left open to do anything.

> Let's leave the option of doing these in HTML, if it doesn't
> cost us too much.

Cutting off development of hypertext semantics too complex to represent in
a markup language is a very high price. I wouldn't mind if folks developed
complex interpretation semantics on top of link names, but HTML itself
should not be aware of this. That's why we agreed to separate this into its
own RFC, right?

> Daniel W. Connolly "We believe in the interconnectedness of all things"
> Research Technical Staff, MIT/W3C
> <connolly@w3.org> http://www.w3.org/hypertext/WWW/People/Connolly

-- 
Craig Hubley                Business that runs on knowledge
Craig Hubley & Associates   needs software that runs on the net
mailto:craig@hubley.com     416-778-6136    416-778-1965 FAX
Seventy Eaton Avenue, Toronto, Ontario, Canada M4J 2Z5