Re: Is this use of BASE kosher?

Owen Rees (rtor@ansa.co.uk)
Thu, 3 Aug 95 12:08:42 EDT

Larry Masinter <masinter@parc.xerox.com> writes:
> I don't think we can treat HREF="#foo" as an optional optimization.
>
> Consider a URL whose content updates continuously, returning
> completely different HTML text each time you retrieve it. Within a
> document with such a URL as base, HREF="#place" should still refer to
> the _current_ instance, even if following a link to HREF=".#place"
> might retrieve a new instance.

If the BASE is a URL that might have been used to retrieve the resource then
there is no problem here: you go to where you would have gone if the merged
URL had been used in the first place.

Also, given that BASE is a URL that might have been used to retrieve this
resource, reloading the resource (proxy/cache willing) will retrieve the
updated version whichever of the BASE URL or previously used URL is used in
the new request.

Problems and confusion arise when the BASE is a URL that is not, has never
been and will never be a URL for any instance of any version of the resource
containing it. This thread started with an example of such a resource.
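
To make the problem concrete, here is a minimal sketch in Python of how a fragment-only link merges with the base. The URLs and the helper names (merged, same_document) are hypothetical, chosen only for illustration. Python's urljoin follows a later revision of the relative-URL rules than the 1995 drafts, but for a bare "#foo" I would expect the result to be the same: the base with the fragment appended.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical URLs, used only for illustration.
retrieved_from = "http://example.org/docs/page.html"    # URL actually used to fetch the document
unrelated_base = "http://example.org/other/index.html"  # a BASE that names some other resource


def merged(base, href):
    """Resolve href against base using the relative-URL rules."""
    return urljoin(base, href)


def same_document(base, href, document_url):
    """True if the merged URL, with its fragment removed, names document_url."""
    return urldefrag(merged(base, href)).url == document_url


print(merged(retrieved_from, "#foo"))
print(merged(unrelated_base, "#foo"))
print(same_document(retrieved_from, "#foo", retrieved_from))
print(same_document(unrelated_base, "#foo", retrieved_from))
```

With the first base the merged URL still names the document containing the link; with the second it names a different resource, so "#foo" can no longer be both "an anchor in this document" and "the result of merging with BASE".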

The wording in draft-ietf-html-spec-04.txt says that BASE is for resolving relative links:

5.2.2. Base Address: BASE

The optional <BASE> element specifies the base address for
resolving relative links from the document, overriding any
context otherwise known to the user agent. The required HREF
attribute specifies the URI for navigating the document (see 7,
"Hyperlinks"). The value of the HREF attribute must be an
absolute URI.

This does not even hint that BASE may be related to the document containing it. That the BASE URI is to be considered a URI for the resource can be deduced from sections 7.1 and 7.4: for "#fragment" both to refer to an anchor in the same document and to be a degenerate case of the access described in 7.1, the BASE URI must refer to the document that contains it. This is not stated explicitly until you get to the DTD, which contains this comment:

<!-- <BASE HREF="..."> Address for this document -->

Compare this with draft-ietf-html-specv3-00.txt:

BASE

The BASE element allows the URL of the document itself to be
recorded in situations in which the document may be read out of
context. URLs within the document may be in a "partial" form
relative to this base address. The default base address is the URL
used to retrieve the document.

Saying that the URI in BASE is the (implied canonical or best) URL for the document, or that it must be a working URI for the resource, is too strong. I do think that much of the confusion would go away if it were made clear that the base URI is to be considered a name for the resource that contains it.

Authors may choose to make their URIs ambiguous by putting the URI for one resource as the BASE in another. I think that this practice should be strongly discouraged, e.g.:

The optional <BASE> element specifies
a URI for the document,
and this URI must be used as
the base address for
resolving relative links from the document, overriding any
context otherwise known to the user agent. The required HREF
attribute specifies the URI for navigating the document (see 7,
"Hyperlinks"). The value of the HREF attribute must be an
absolute URI.

The URI in the BASE element should not be a URI for any other
resource, as this would make the URI ambiguous.

An alternative would be to abolish BASE altogether! Any document containing BASE can be transformed into one without BASE by applying the relative URL rules, without the need for any other context. BASE adds no power and seems to have created much confusion.
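
As a rough sketch of that transformation, assuming Python and its urljoin as a stand-in for the relative URL rules: the function name strip_base, the regular expressions and the sample document are all hypothetical, and the code only handles double-quoted HREF attributes, so it illustrates the idea and is not a real HTML rewriter.

```python
import re
from urllib.parse import urljoin

# Sketch only: matches <BASE HREF="..."> and double-quoted HREF attributes.
BASE_TAG = re.compile(r'<BASE\s+HREF="([^"]*)"\s*>\s*', re.IGNORECASE)
HREF_ATTR = re.compile(r'HREF="([^"]*)"', re.IGNORECASE)


def strip_base(html):
    """Return html with BASE removed and every HREF resolved against it."""
    match = BASE_TAG.search(html)
    if match is None:
        return html  # no BASE element, nothing to transform
    base = match.group(1)
    # Remove the BASE element first so its own HREF is not rewritten.
    without_base = BASE_TAG.sub("", html, count=1)
    return HREF_ATTR.sub(
        lambda m: 'HREF="%s"' % urljoin(base, m.group(1)),
        without_base,
    )


# Hypothetical document, used only for illustration.
doc = (
    '<HEAD><BASE HREF="http://example.org/docs/page.html"></HEAD>\n'
    '<BODY><A HREF="other.html">x</A> <A HREF="#foo">y</A></BODY>'
)
print(strip_base(doc))
```

Note that the rewrite turns "#foo" into the full BASE URI plus "#foo". Whether a user agent should then treat that link as a reference into the current instance or as something to retrieve afresh is exactly the question raised earlier in this thread.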

Owen Rees
<rtor@ansa.co.uk>, <URL:http://www.ansa.co.uk/Staff/rtor.html>
Information about ANSA is at <URL:http://www.ansa.co.uk/>.