Re: Is this use of BASE kosher?

Dave Hollander (dmh@hpsgml.fc.hp.com)
Thu, 3 Aug 95 11:05:03 EDT

Because I am the one that Peter Sheerin describes in the base note [1],
I have a vested interest in this subject, and there were a lot of
messages, I feel compelled to follow up in detail.

This response is broken into sections (divided by ------ZZ)

Conclusion
Introduction
Running Commentary
Recommendations
Motivation/Purpose
References

------------------------------------------------------------------------ZZ

Conclusion

RFC 1808 [5] is a good draft. There are a few changes needed to clarify
details around fragments. (See recommendations below)

My document does have the correct fragment and relative URL
coding, and it behaves correctly in many versions of many browsers.
That behavior, clearly documented in HTML 2.0 [4], is what I
expected.

We have work to do to clarify the expected behavior regarding the
"known URL" of documents. There should be a means for an author to
identify the URL that is preferred to be displayed and saved when
multiple URLs will resolve to the same document. Imagemaps and
cgi-bin scripts are two common examples of multiple URLs for the
same document. This has been suggested several times in discussions
regarding "bookmarks" [2][3], although it has drawn little discussion.

The current practice of many browsers is to display and hotlist the
URL used to access the document. This is clearly wrong and
harmful to the quality of the web. This issue is in keeping with the goal
of RFC1808, "the long term usability of embedded URLs".

The proper treatment of spaces in a URL is unclear (to me at least)
and perhaps contradictory between applicable specs. 1808 is clear (and
I will assume applicable) therefore I will ask the author to stop using
space characters in URLs (just to be safe).

------------------------------------------------------------------------ZZ

Introduction

Peter starts with a discussion of one of my web pages,
http://www.hp.com/go/ftp-sites. Note that this page has changed, but
many of those on www.hp.com (e.g. /go/computing) use the go path in
the URL. In this URL the path element is not the actual file system path
name; it names a server script which delivers the document using the
HTTP location method. This is the preferred access method because it is
easier to maintain working URLs using the script than the file system.
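
For readers unfamiliar with this arrangement, here is a minimal sketch of
such an alias script in Python. It is not the script running on
www.hp.com: the ALIASES table and the go_response name are hypothetical,
and the one mapping shown is the /go/ftp-sites case discussed in this
message. It assumes the script answers with a CGI-style Location header
and leaves delivery of the target document to the server.

```python
# Hypothetical alias table: stable "go" names mapped to current file paths.
ALIASES = {
    "ftp-sites": "/ftp-sites/Peripherals.html",
}


def go_response(alias):
    """Return the CGI response text for a request to /go/<alias>.

    A known alias yields a Location header naming the real path;
    an unknown alias yields a 404 status line and a short body.
    """
    target = ALIASES.get(alias)
    if target is None:
        return (
            "Status: 404 Not Found\r\n"
            "Content-Type: text/plain\r\n"
            "\r\n"
            "Unknown alias\n"
        )
    return "Location: " + target + "\r\n\r\n"


print(go_response("ftp-sites"), end="")
```

When a file moves, only the table entry changes; the published /go/ URL
stays the same.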

Errors with one browser with this page led to the question:

"The URL used here, though, points to an entirely different page -- one
which doesn't contain the named sections specified in the relative
URLs. ...inclusion of an improper BASE URL reference. Which of us is
right?"

First, the # fragment in the sample refers to the same page. The base
tag in the document has /go/ftp-sites as the path element. This is an
alias for a cgi-bin script that uses the HTTP location process to resolve
to /ftp-sites/Peripherals.html. This is the document included here; it
can be accessed using either URL. The # fragment is within this
document.

I have had no trouble traversing the link using either Netscape, AOL or
Mosaic on either a Mac or HP-UX system.

Is this an "improper" URL? No.

What is the proper behavior for the form: href=#abc? This is a
reference in the current document and should not be treated as an
URL [4]. This implies there is no reason for a browsing agent to
re-access the network resource (document); even if it did, it should
separate the URL from the fragment, access the URL [which would
result in the same page] then locate the element with the name
attribute equal to the fragment name. [4]
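
The procedure just described can be sketched in Python as follows. This is
an illustration only, not any browser's code: the AnchorFinder class, the
locate function and the sample page are hypothetical, and the standard
library's html.parser stands in for a real HTML parser.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorFinder(HTMLParser):
    """Record the line of the first <A NAME=...> matching the wanted name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.line = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attribute values keep
        # their case, so the comparison below is case sensitive.
        if tag == "a" and self.line is None:
            if dict(attrs).get("name") == self.wanted:
                self.line = self.getpos()[0]


def locate(href, html_text):
    """Separate the URL from the fragment, then find the named anchor."""
    url, fragment = urldefrag(href)
    finder = AnchorFinder(fragment)
    finder.feed(html_text)
    return url, fragment, finder.line


page = '<h1>FTP sites</h1>\n<a name="Misc">Miscellaneous</a>\n'
print(locate("#Misc", page))
```

For href=#abc the URL part comes back empty, which is the cue that no
network access is needed before looking for the anchor.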

------------------------------------------------------------------------ZZ
Running Commentary
-----------------------------

Larry Masinter writes:

"I think the relationship of the destination URLs and the base is spelled
out in RFC 1808, and that Section 10, "Appendix - Embedding the Base
URL in HTML documents" is pretty explicit."

But 1808 [5] does not address the behavior of fragments, and it is still in
draft status (?). As it stands, the only passages that bear on the
fragment question are the examples in section 5. The HTML 2.0 spec is
quite specific about the proper procedures to be applied to fragments [4].

-----------------------------

Peter Flynn writes:

"It follows that a BASE url should terminate with a directory name, not
a filename, and that relative urls in the document should therefore
begin with the relevant filename."

and Peter Flynn writes again:

"My fault, then. I could have _sworn_ it
finished with a filename, because I was _specifically_ surprised to see it
there, which is why I wrote what I did. ... "

Given the behavior of Mosaic (using the base in hot lists and the URL
display field), I see no reason not to use a filename (or alias). Only
allowing directories would violate current practice as guided by earlier
versions of the HTML spec and the current 3.0 spec [6]. It is also
unnecessary due to the thoroughness of RFC1808 section 4.

-----------------------------

Daniel W. Connolly writes:

"...now it is conforming..." Yep, some coding errors.

"... 1. Resolve this partial URI into a full URI using the BASE address,
and resolve the resulting absolute URI. (as per 7.2. "Activation of
Hyperlinks" and 7.1. "Accessing Resources") At this point, it should
realize that it's already resolved that URI, and it's got the document
on screen. It need only visit the anchor named "#Misc..." (as per 7.4.
"Fragment Identifiers").

2. It may go and fetch a new copy of the document. If the new copy
doesn't have the named anchor, then this is an error. The
implementation is still conforming -- the fault is the information
provider's for not making the information at the address the same as
the info in the given document.

[ the new copy will have the named anchor, so no error ]

3. Bypass all that and realize that "#xxx" is _always_ a reference to
the current document:

|7.4. Fragment Identifiers
|
| Any characters following a `#' character in a URI constitute a
| fragment identifier. As a degenerate case, a URI of the form
| `#fragment' refers to an anchor in the same document."

Any one of these behaviors would yield my desired results, a jump to a
location within the current document.

-----------------------------

Private mail with Sheerin, Peter:

"I still believe that your use of the BASE URL to intentionally point to a
different document is wrong, since the spec says it shall always refer to
the document's correct URL, and is intended for use when the
document is viewed out of context. But this is separate from the
handling of fragment identifiers. "

I am not using the BASE to point to a different document, just to
identify the preferred URL for addressing the document. I will admit it is a
small stretch, but one well within the standards, today's and yesterday's.

-----------------------------

Larry Masinter writes:

"Neither document actually defines what the fragment identifier
*means*: this is presumably left to the HTML spec.

RFC 1808 seems to disallow spaces in #fragments. I think the choices
are:

1) HTML disallows spaces in anchors
2) RFC 1808 is wrong, or doesn't apply to HTML. Spaces are
allowed in anchor references
3) Spaces aren't allowed in #fragment identifiers, are encoded,
but are allowed as name references. "

So this leads me back away from 1808 to the HTML Specs as to what is
the meaning and behavior associated with a fragment.
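
To make Larry's third choice concrete, here is a small Python sketch in
which the space is percent-encoded in the #fragment but kept as-is in the
NAME attribute. The function names and the anchor name "Section Two" are
made up for illustration; neither RFC 1808 nor the HTML spec prescribes
this code.

```python
from urllib.parse import quote, unquote


def encode_fragment(name):
    """Percent-encode an anchor name for use after '#' in an HREF."""
    return quote(name, safe="")


def matches(fragment, name):
    """Compare an encoded fragment from an HREF with a NAME value.

    The fragment is decoded first; the comparison is case sensitive.
    """
    return unquote(fragment) == name


print("#" + encode_fragment("Section Two"))
print(matches("Section%20Two", "Section Two"))
```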

Larry Masinter writes again:

"The only thing that makes sense to me is
that HREF="#fragment" references should refer to the current
document, even though any other references HREF="../c#fragment"
are relative to the base. "

This is precisely what HTML 2.0 states.
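
The rule can be sketched in a few lines of Python. The resolve function is
hypothetical, and pairing the www.hp.com host with the
/ftp-sites/Peripherals.html path is an assumption made only to give the
example concrete URLs.

```python
from urllib.parse import urldefrag, urljoin


def resolve(href, current_url, base_url):
    """Return (document_url, fragment) for a link in the current document.

    A bare "#fragment" stays in the current document; any other
    reference is resolved against the BASE address.
    """
    if href.startswith("#"):
        return current_url, href[1:]
    absolute, fragment = urldefrag(urljoin(base_url, href))
    return absolute, fragment


# Assumed URLs: the document as retrieved, and the BASE it declares.
current = "http://www.hp.com/ftp-sites/Peripherals.html"
base = "http://www.hp.com/go/ftp-sites"

print(resolve("#abc", current, base))
print(resolve("../c#fragment", current, base))
```

With this rule the BASE address never decides whether a bare #fragment
link leaves the page, which is the behavior I was relying on.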

-----------------------------

Daniel W. Connolly responds:

"...I believe the current specs (RFC1808 and HTML2.0 draft-04) are
consistent and complete on this issue, if not completely clear. "

HTML 2.0 is clear. RFC 1808 needs to rethink the treatment of
fragments. HTML 3.0 needs to catch up (surprise?).

------------------------------------------------------------------------ZZ
Recommendations
-----------------------------

1) RFC1808

Remove the examples regarding fragments. The procedures and
algorithms do not state that a #xxx = anything. They do state that the
fragment must be parsed and how it is parsed, and I believe this is
sufficient.

Review (and fix?) the steps in section 4. I believe that the skip in step 4
should go to part of step 5, and the skip in step 5 to step 6. I did not
study it well enough to be sure, but it seems that a relative URL cannot
inherit the params, query or fragment components.

2) HTML 2.x

Add a bookmark tag. As long as the base tag purpose is overloaded with
partial url expansion and known URL meaning, it will cause problems.

3) Applications

Communicate to client application developers a coherent position on
the issues of fragment handling, relative URLs and URL displays
(hotlist and GUI). Cite references. This cannot be authoritative, but
should help. I will volunteer if desired.

------------------------------------------------------------------------ZZ
Motivation/Purpose
-----------------------------

I ran out of steam. Please look at [2] for an explanation of what was done,
why, and what the needs are around known URLs.

------------------------------------------------------------------------ZZ
References:
-----------------------------

[1] Is this use of BASE kosher? Peter Sheerin; html-wg;
Sat, 29 Jul 95 20:52:04 EDT
http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/0242.html

-----------------------------
[2] WWW Talk Apr 95-present: Browser Displayed URL;
http://gummo.stanford.edu/hypermail/www-talk-1995q2/0435.html

-----------------------------
[3] a BOOKMARK tag ??; Ben Adida (ben@hearstnewmedia.com);
Mon, 10 Jul 1995 14:48:15 -0400
http://gummo.stanford.edu/hypermail/www-talk-1995q3/0021.html

-----------------------------

[4] Hypertext Markup Language - 2.0 - Document Structure; June 16, 1995
http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_5.html#SEC27

"The optional BASE element specifies the base address for resolving relative
links from the document, overriding any context otherwise known to the
user agent. The required HREF attribute specifies the URI for navigating
the document (see section Hyperlinks). The value of the HREF attribute
must be an absolute URI. "

[http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_7.html#SEC65]

An anchor is a resource such as an HTML document, or some
fragment of, i.e. view on or portion of a resource.

Accessing Resources

To access the head anchor of a hyperlink, the user agent determines its
URI from the URI given in the tail anchor, using the base URI of the
document containing the tail anchor if necessary. Any fragment identifier
is discarded, and the result is used to access a resource, for example as
in [URL].

Fragment Identifiers

Any characters following a `#' character in a URI constitute a fragment
identifier. As a degenerate case, a URI of the form `#fragment' refers to
an anchor in the same document.

The meaning of fragment identifiers depends on the media type of the
resource containing the head anchor. For `text/html' resources, it refers
to the A element with a NAME attribute whose value is the same as the
fragment identifier. The matching is case sensitive. The document should
have exactly one such element. The user agent should indicate the anchor
element, for example by scrolling to and/or highlighting the phrase.

For example, if a user agent was processing a document identified as
`http://host/x/y.html' and the user indicated the following anchor:

    <p> See: <a href="app1.html#bananas">appendix 1</a>
    for more detail on bananas.

then the user agent URI must access the resource `http://host/x/app1.html'.
Assuming the resource is represented using the `text/html' media type,
the user agent must locate the anchor named `bananas' and begin navigation
there.

-----------------------------

[5] R. T. Fielding. "Relative Uniform Resource Locators." Work in Progress,
UC Irvine, March 1995.
ftp://ds.internic.net/internet-drafts/draft-ietf-uri-relative-url-06.txt
ftp://ds.internic.net/rfc/rfc1808.txt;

Relative Uniform Resource Locators

"Note that the fragment identifier (and the "#" that precedes it) is
not considered part of the URL. However, since it is commonly used
within the same string context as a URL, a parser must be able to
recognize the fragment when it is present and set it aside as part of
the parsing process."

- and -

10. Appendix - Embedding the Base URL in HTML documents

It is useful to consider an example of how the base URL of a document
can be embedded within the document's content. In this appendix, we
describe how documents written in the Hypertext Markup Language
(HTML) [3] can include an embedded base URL. This appendix does not
form a part of the relative URL specification and should not be
considered as anything more than a descriptive example.

HTML defines a special element "BASE" which, when present in the
"HEAD" portion of a document, signals that the parser should use the
BASE element's "HREF" attribute as the base URL for resolving any
relative URLs. The "HREF" attribute must be an absolute URL. Note
that, in HTML, element and attribute names are case-insensitive. For
example:

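    <!doctype html public "-//IETF//DTD HTML//EN">
    <HTML><HEAD>
    <TITLE>An example HTML document</TITLE>
    <BASE href="http://www.ics.uci.edu/Test/a/b/c">
    </HEAD><BODY>
    ... <A href="../x">a hypertext anchor</A> ...
    </BODY></HTML>

A parser reading the example document should interpret the given
relative URL "../x" as representing the absolute URL

    <URL:http://www.ics.uci.edu/Test/a/x>

regardless of the context in which the example document was obtained.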

-----------------------------

[6] HyperText Markup Language Specification Version 3.0; March 28 draft
http://www.w3.org/hypertext/WWW/MarkUp/html3/CoverPage.html

"BASE

The BASE element allows the URL of the document itself to be
recorded in situations in which the document may be read out of
context. URLs within the document may be in a "partial" form
relative to this base address. The default base address is the URL
used to retrieve the document.

For example:

    <BASE href="http://acme.com/docs/mydoc.html">
    ...
    <IMG SRC="images/me.gif">

which resolves to "http://acme.com/docs/images/me.gif". "