Re: Is this use of BASE kosher?

Dave Hollander (dmh@hpsgml.fc.hp.com)
Thu, 3 Aug 95 14:37:32 EDT

> >
> > "Daniel W. Connolly" <connolly@beach.w3.org>
> > > The point is that you don't _need_ to retrieve a
> > > representation of the resource identified by the base
> > > URL: you've already got it!
> >
> > Not necessarily. Let's take the specific example that started this
> > whole discussion. I will phrase it as an HTTP response so that we
> > can assume we obtained the document as the result of retrieving the
> > URI shown:
> >
> > 200 OK
> > URI: <http://www.hp.com/Misc/Peripherals.html>
> > Content-type: text/html
> >
> > <head>
> > <base href="http://www.hp.com/go/ftp-sites">
> > </head>
> > <a href="#Printers">Printers</a>
> > ...
> > <h3><a name="Printers">Printers</a></h3>
> >
> > It's true that *without* the <base> tag, the relative URL
> > "#Printers" would resolve to
> > "http://www.hp.com/Misc/Peripherals.html#Printers" according to the
> > rules of RFC 1808. The entity corresponding to the URL
> > "http://www.hp.com/Misc/Peripherals.html" would then (luckily!)
> > already be in front of us, and the agent would merely need to scroll
> > down to get "http://www.hp.com/Misc/Peripherals.html#Printers".
> >
> > However, *with* the <base> tag, the relative URL "#Printers"
> > instead resolves to "http://www.hp.com/go/ftp-sites#Printers"
> > according to RFC 1808 (please read it). Thus, a correct user agent
> > must retrieve the entity representing the URL
> > "http://www.hp.com/go/ftp-sites" and find *its* "Printers" fragment
> > -- and we do not already have it in front of us.
>
> Right on.
>

The HTML 2.0 spec says:
For example, if a user agent was processing a document identified as
`http://host/x/y.html' and the user indicated the following anchor:

<p> See: <a href="app1.html#bananas">appendix 1</a>
for more detail on bananas.

then the user agent must access the resource `http://host/x/app1.html'.
Assuming the resource is represented using the `text/html' media type,
the user agent must locate the anchor named `bananas' and begin navigation
there.

The Hypertext Markup Language - 2.0 - Hyperlinks
http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_7.html#SEC65
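
For illustration only, the resolution step in the spec's example can be
sketched with Python's standard urllib.parse module. This is not part of
the spec; the variable names are assumptions made for the sketch.

```python
from urllib.parse import urljoin, urldefrag

# URL of the document being processed (from the spec's example).
document_url = "http://host/x/y.html"
# The href of the anchor the user indicated.
href = "app1.html#bananas"

# Resolve the relative reference against the document's URL.
absolute = urljoin(document_url, href)

# Split off the fragment: the resource is what gets accessed,
# the fragment names the anchor to locate afterwards.
resource_url, fragment = urldefrag(absolute)

print(resource_url)
print(fragment)
```

The first value is the resource to access; the second is the anchor name
to look for once a text/html representation is in hand.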

RFC 1808 does not require that a URL consisting only of a fragment be
fully qualified before it is processed; only its examples suggest that it
may be.

Whether or not you retrieve the document again (it IS the same
document), the browser must *then* locate the anchor named "Printers".
This is the desired result.
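
A minimal sketch of that "locate the anchor" step, assuming Python's
standard html.parser module and an exact string match on the name
attribute. The class and function names (AnchorFinder, has_anchor) are
hypothetical, chosen for this illustration.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Record whether an <a name="..."> matching the target was seen."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names arrive lowercased. Exact match on the value is an
        # assumption of this sketch.
        if tag == "a" and dict(attrs).get("name") == self.target:
            self.found = True


def has_anchor(html_text, name):
    """Return True if html_text contains an anchor with this name."""
    finder = AnchorFinder(name)
    finder.feed(html_text)
    return finder.found


# The document from the example earlier in the thread.
document = """<head>
<base href="http://www.hp.com/go/ftp-sites">
</head>
<a href="#Printers">Printers</a>
<h3><a name="Printers">Printers</a></h3>"""

print(has_anchor(document, "Printers"))
```

The search runs over the text already in hand; nothing in it depends on
how, or whether, the document was retrieved again.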

> Here's another example of the dilemma we are faced with:
> Same document as above, but with a BASE URL in the header that points to
> the current document (http://www.hp.com/Misc/Peripherals.html).
>
> If I have saved this file to my local disk, severed my network
> connection, and then try one of those stand-alone fragment identifier
> links (which are meant by the author to point to the current document,
> not another one), these links don't work, because instead of the browser
> (Netscape in this case) jumping to that named section of the current
> document, it appends that fragment to the BASE URL, discovers that the
> implied BASE URL (the document's location on my disk) is different from
> the header-specified BASE URL, decides the latter is correct, and tries
> to go get *that* document. Since I no longer have access to the network,
> it fails, without ever displaying that part of the document that I
> actually do have access to. It's physically on my machine; why shouldn't
> the browser just display it?
>
> I think it boils down to whether a stand-alone fragment identifier
> should ever be intended or interpreted as referring to anything other
> than the current document.
>
> I say, in both cases, no.

The HTML 2.0 spec is again complete (but perhaps could be clearer by
repeating the import of the statement in the example).

"As a degenerate case, a URI of the form `#fragment' refers to an anchor
in the same document. "

I see no other way of reading this but that #fragment is not a URL, does
not require mustering network resources, and can only be treated as a
reference to the "current document", not to a network resource.
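
The two readings under discussion can be put side by side in a short
sketch. It assumes Python's urllib.parse; the function target_of and its
same_document_rule flag are hypothetical, and neither branch is claimed
to be what any particular browser actually implements.

```python
from urllib.parse import urljoin, urldefrag


def target_of(href, current_url, base_url=None, same_document_rule=True):
    """Return (document_url, fragment, needs_fetch) for a link.

    same_document_rule=True treats a bare "#fragment" as a reference
    into the current document; False resolves it against the base.
    """
    if same_document_rule and href.startswith("#"):
        # Degenerate case: stay in the document already in hand.
        return current_url, href[1:], False
    # Otherwise resolve against the BASE if given, else the document.
    resolved = urljoin(base_url or current_url, href)
    document_url, fragment = urldefrag(resolved)
    # A fetch is needed unless the resolved document is the one we have.
    return document_url, fragment, document_url != current_url


current = "http://www.hp.com/Misc/Peripherals.html"
base = "http://www.hp.com/go/ftp-sites"

# Reading 1: "#Printers" refers to the current document.
print(target_of("#Printers", current, base))
# Reading 2: "#Printers" is resolved against the BASE first.
print(target_of("#Printers", current, base, same_document_rule=False))
```

Under the first reading the link stays in the current document and no
retrieval is needed, which also covers the saved-to-disk case. Under the
second, the same link points at the BASE document and forces a fetch.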

>
> We need to address this because the two biggest browser vendors have
> interpreted this 180 degrees out of synch, and because it affects how all
> of us write HTML documents, period.

Agreed.

Regards,
Dave Hollander