Re: Is this use of BASE kosher?

Daniel W. Connolly (connolly@beach.w3.org)
Sun, 6 Aug 95 16:24:07 EDT

In message <199508060127.TAA06442@macondo.math.utah.edu>, Paul Burchard writes:
>"Daniel W. Connolly" <connolly@beach.w3.org> writes:
>> [This is getting tiring!]
>
>Quite...but so far I have only seen your *assertion* that the base
>URI must identify a resource of which the document in question is a
>representation. On what basis do you make this claim?

This claim is a premise of my arguments, not a conclusion. Hence
I can't defend it. I can only motivate it.

I'll try to spell out the model I used to write the current draft
in detail, and then we can see (1) if this is a model we're happy
with, and (2) if it's sufficiently well explained in the draft.

OK. URIs identify resources. So there's a relation:

identify < URI X Resource

In fact, each URI identifies exactly one resource, so it's functional:

identify : URI -> Resource

But this relation isn't directly observable: you can't necessarily
put a resource on the wire, or touch it or observe it directly in
any way. But you might be able to get representations of a resource
at various times. So there's another relation:

represent < Entity X Resource X Time

For represent(e,r,t), read "e is a representation of r at t."

Since resources aren't directly observable, it's more useful to
use the composition of identify and represent than either of the
two alone. And it's more convenient to talk about intervals of
time than instants:

repr < Entity X URI X Time X Time
repr(e, u, t1, t2) <-> Exist r = identify(u) /\
Forall t1<t<t2, represent(e,r,t)
i.e. "e represents the resource identified by u
during the interval from t1 to t2"
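
To make the definition concrete, here is a toy sketch in Python. It is
only an illustration under assumptions of my own: time is a set of
integer instants, the URI, resource and entity names are invented, and
repr_ is a hypothetical helper (named with a trailing underscore to
avoid Python's built-in repr).

```python
# Toy model of identify / represent / repr. All data here is made up
# for illustration; none of it comes from the draft.

# identify : URI -> Resource (functional, so a dict will do)
identify = {"http://example.org/a": "resource-A"}

# represent < Entity X Resource X Time, as a set of triples.
# "doc-v1" represents resource-A at instants 0, 1, 2, 3, 4.
represent = {("doc-v1", "resource-A", t) for t in range(0, 5)}


def repr_(e, u, t1, t2):
    """True if e represents the resource identified by u at every
    integer instant t with t1 < t < t2."""
    r = identify.get(u)
    if r is None:
        # u identifies nothing in this toy model
        return False
    return all((e, r, t) in represent for t in range(t1 + 1, t2))
```

With this data, repr_ holds for "doc-v1" over the interval (0, 4) but
not over (2, 8), since nothing is recorded for instants 5 and later.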

OK. That's enough definitions to start. So, as the Aug 4 draft
says, navigation begins with a base URI u* and a base document d*
such that:

repr(d*, u*, t1, t2)

The spec doesn't mention it, but to be precise, the interval [t1,t2]
must include the time the document was fetched.

The spec also says that a link is a relationship between two anchors:

Link < Anchor X Anchor

and that each anchor has an address, which is a URI and an optional
fragment identifier:

AnchAddr = URI X FragId
addr : AnchAddr -> Anchor

RFC1808 defines an algorithm for combining a relative URI with
an absolute URI to form another absolute URI:

combine: relURI X URI -> URI

(as in RFC1808, relURI includes URI, by the way)
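
As a rough sketch of combine, Python's urllib.parse.urljoin can stand
in for the RFC1808 algorithm. Two caveats: urljoin takes the base first
and the relative reference second (the reverse of the signature above),
and it is not guaranteed to match RFC1808 in every corner case. The
wrapper name combine and the sample URIs are just for illustration.

```python
from urllib.parse import urljoin


def combine(rel_uri, base_uri):
    """combine: relURI X URI -> URI, sketched with urljoin.

    Note the argument order: urljoin wants (base, relative).
    """
    return urljoin(base_uri, rel_uri)


base = "http://www.w3.org/hypertext/WWW/TheProject.html"

# An ordinary relative URI is resolved against the base.
sibling = combine("Overview.html", base)

# An absolute URI is also a relURI; the base is simply ignored.
elsewhere = combine("http://example.org/x", base)

# The empty relURI combines to the base URI itself.
same = combine("", base)
```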

The HTML spec enumerates the markup constructs which identify links.
Each of those markup constructs identifies the head anchor with
some attribute. The value of that attribute is a relative URI,
a fragment identifier, or both.

linkmarkup : Entity X Element X relURI X FragId
for (d, el, u, f), read "document d contains an element el
that specifies relURI u and FragId f"

So suppose the document d* has an A element eA whose href attribute
specifies relURI r* and FragId f*:

linkmarkup(d*, eA, r*, f*)

This identifies an anchor, given by:

addr(combine(r*, u*), f*)

To access this anchor, the HTML user agent "resolves" the
head URI uh = combine(r*, u*). That is, it finds some document dh
such that

repr(dh, uh, t3, t4)

where [t3,t4] includes the time of access. Now the user agent has a
new base document dh and a new base URI. The URI is uh by default, but
an HTTP redirection, a URI: header or a <BASE> tag in dh may indicate
that the new base URI is something else, uh'.

If f* is not the null FragId, the HTML user agent finds the A element
whose NAME is f*, and begins navigation there. And we start the whole
thing over again.

Whew!

Now: this thread started out asking about the case where r* = "",
asking whether the spec says that the user agent must use dh=d*, or
if it's OK to go ahead and dereference uh to get a new dh.

I initially answered that either behaviour is conforming, but
folks weren't happy with that. So rather than making a special
case:

r* = "" => dh = d* (1)

I've made a more general requirement:

uh = u* => dh = d* (2)

We see that (1) is a particular case of (2), since

r* = "" => uh = combine(r*, u*) = u* => dh = d*

So the case of href="#name" is not an exception, but a particular
case of the general rule. Since "degenerate case" causes such
consternation, I'm happy to change it to "In particular, ...".
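
Here is a minimal sketch of rule (2) in Python, under some assumptions:
the function name follow and the fetch callback are hypothetical,
urljoin stands in for combine, and the base URI is assumed to carry no
fragment of its own. The point is that the test is on the absolute head
URI, not on whether the href was relative or empty.

```python
from urllib.parse import urljoin, urldefrag


def follow(base_uri, base_doc, href, fetch):
    """Follow a link from the base document.

    Returns (head_uri, head_doc, frag_id). `fetch` is a caller-supplied
    function from an absolute URI to a document (an assumption of this
    sketch, standing in for whatever retrieval the user agent does).
    """
    # Split the attribute value into relURI r* and FragId f*.
    rel_uri, frag_id = urldefrag(href)

    # uh = combine(r*, u*)
    head_uri = urljoin(base_uri, rel_uri)

    if head_uri == base_uri:
        # Rule (2): uh = u* => dh = d*, so no new retrieval.
        head_doc = base_doc
    else:
        head_doc = fetch(head_uri)

    return head_uri, head_doc, frag_id
```

Under this sketch, href="#name" and href="TheProject.html#name" (when
the base URI ends in TheProject.html) both reuse the already-fetched
document, while a link to any other absolute URI goes through fetch.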

> So far, I've seen only one well-defined
>procedure for handling relative URLs: RFC 1808. In it, Roy Fielding
>has presented a well-thought-out model in which:
>
> * fragments are handled uniformly with other URL postfixes;
> * resolution of relative URLs is orthogonal to determination of base URL;
> * resolution of relative URLs is orthogonal to resource retrieval.
>
>Within that model, the proposed rule for pure fragments (while desirable)
>would indeed be a exceptional case, breaking all of the above properties.

I'm not sure how to answer this. You use the phrase "resolution
of relative URLs". By that, do you mean fetching stuff via HTTP,
or just combining a relURL with an absolute URL to get another
absolute URL? In my mind, you can only "fetch" or "access" or
"resolve" an absolute URL. The only thing you can do with
a relative URL is combine it with an absolute URL to make it
into an absolute URL.

>"Daniel W. Connolly" <connolly@beach.w3.org> writes:
>> In other words, '#frag' is not the special case: the special
>> case is _any_ link to the base URI; for example, if the
>> base uri is http://www.w3.org/hypertext/WWW/TheProject.html,
>> then the browser is required to use the already-fetched copy
>> of the document for all of these:
>> [examples deleted]
>
>No, now you are just proposing a more general exception to the
>principle that the resolution of relative URIs be orthogonal to
>retrieval issues.

Nope. The exception has *nothing to do* with relative URIs. It
has to do with the head and the tail of a link having the same
absolute URI in their address.

Another question is whether the <base> tag is used just to
combine with relative URLs, or whether it should be used
as the displayed address, and in hotlists, etc. I'll
answer that in another message.

Dan