Re: Is this use of BASE kosher?

Daniel W. Connolly (connolly@beach.w3.org)
Wed, 2 Aug 95 18:16:02 EDT

In message <199508022100.OAA25434@shell1.best.com>, "Peter K. Sheerin" writes:
>> > But my question is a little more specific. Should #fragments be counted
>> > as part of the URL for the specific purpose of determining whether they
>> > always refer to the current document, or are appended to whatever the
>> > value of the BASE URL is?
>>
>> The only thing that makes sense to me is that HREF="#fragment"
>> references should refer to the current document, even though any other
>> references HREF="../c#fragment" are relative to the base.
>
>That's the only way it makes sense to me

In a way... but how is treating "#fragment" as a reference
to the current document _different_ from treating it as
"relative to the base"?

On seeing a <base href="http://here/"> element, the user agent treats
http://here/ as the address of that document; hence references to
http://here/#fragment _are_ references to the current document.
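
To make that concrete, here is a minimal Python sketch. It uses the standard library's urllib.parse.urljoin, which is only a stand-in for whatever a user agent does internally. The base http://here/a/b.html and the helper name resolve are assumptions made up for illustration.

```python
from urllib.parse import urljoin

def resolve(base, href):
    """Combine a relative reference with the base URL (illustrative helper)."""
    return urljoin(base, href)

# Hypothetical base, as if the document said <base href="http://here/a/b.html">
BASE = "http://here/a/b.html"

# A bare fragment keeps the whole base and only attaches the fragment.
print(resolve(BASE, "#fragment"))      # http://here/a/b.html#fragment

# Any other relative reference is combined with the base path first.
print(resolve(BASE, "../c#fragment"))  # http://here/c#fragment
```

Since the user agent treats the base as the address of the document, the first result is still a reference into the current document; no special rule is needed.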

> but I don't see that spelled
>out in the spec,

You don't score very high for resourcefulness :-)

Hypertext Markup Language - 2.0 - Hyperlinks
Fri Jun 16 19:56:22 1995
http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_7.html#SEC69

|Fragment Identifiers
|
|Any characters following a `#' character in a URI constitute a
|fragment identifier. As a degenerate case, a URI of the form
|`#fragment' refers to an anchor in the same document.

Note that it's "as a degenerate case" and not "an exception".
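
One way to see why it is a degenerate case: split a reference at the `#' and look at what is left in front. The sketch below assumes Python's urllib.parse.urldefrag as the splitter; the helper name is_same_document_ref is hypothetical and not taken from any spec.

```python
from urllib.parse import urldefrag

def is_same_document_ref(href):
    """True when nothing precedes the '#', i.e. the reference is only a fragment."""
    url_part, fragment = urldefrag(href)
    return url_part == "" and fragment != ""

print(is_same_document_ref("#fragment"))      # True: the part before '#' is empty
print(is_same_document_ref("../c#fragment"))  # False: "../c" must still be resolved
```

The same split applies to both references; the `#fragment' form is just the one where the part before the `#' happens to be empty.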

> and even RFC 1808 appears to conflict itself. It says
>that fragment identifiers are not part of the URL,

There's a lot of political history behind that. Some folks
argued that since the fragment identifier isn't used to resolve
a resource, it's not part of the URL. They got their way in
the wording of the specs, but the world continues to conceive
of URLs as originally implemented in WWW; that is, the fragment
identifier is part of the URL syntax -- look at any primer
on URLs [hmmm... the NCSA URL primer doesn't talk about fragment
identifiers -- they cover it in the HTML primer].

> but then includes an
>example in section 5.1 where a stand-alone fragment identifier is merged
>with the BASE URL (although it's not clear if this is the current
>document's default URL

Can you clarify what you mean by "the document's default URL"? The
specs speak only of "the base URL."

>Since the two biggest browsers handle this situation differently right
>now, I submit that we should produce language defining a sensible
>behaviour, since there seems to be no common behaviour to document.

I think RFC1808 is sufficiently clear on this. The prose could
use a little tweaking, but the example test cases make it quite
clear, I believe.
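
For anyone who wants to check an implementation against those cases, here is a small sketch. It takes the base URL and three of the "normal" examples from section 5.1 of RFC1808 and runs them through Python's urllib.parse.urljoin, which is an assumption on my part and only one possible resolver; the table name RFC1808_CASES and the function check are hypothetical.

```python
from urllib.parse import urljoin

# Base URL used by the examples in RFC1808 section 5
BASE = "http://a/b/c/d;p?q#f"

# (relative reference, expected absolute URL) pairs from section 5.1
RFC1808_CASES = [
    ("g",   "http://a/b/c/g"),
    ("#s",  "http://a/b/c/d;p?q#s"),
    ("g#s", "http://a/b/c/g#s"),
]

def check(cases, base=BASE):
    """Return the cases where the resolver disagrees with the expected result."""
    failures = []
    for rel, expected in cases:
        actual = urljoin(base, rel)
        if actual != expected:
            failures.append((rel, expected, actual))
    return failures

print(check(RFC1808_CASES))  # an empty list means every case matched
```

Note the "#s" case: the fragment replaces the base's own fragment and everything else in the base is kept, which is the merging behaviour in question.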

> But
>does this wait for 3.0, or could it go into 2.1,

I don't believe this is an HTML issue. The HTML spec covers this
issue only to the extent that it explains which parts of HTML markup
are the "arguments" to the parsing and combining algorithms in
RFC1808.

> or am I missing wording
>somewhere in this morass of RFCs, and SGML and HTML specs that defines
>behaviour here?

I believe the current specs (RFC1808 and HTML2.0 draft-04) are
consistent and complete on this issue, if not completely clear.

I've got a student working on an extensive URL parsing test suite,
and I'm reviving my HTML test suite.

I've added a couple items to my HTML test suite on this issue:

http://www.w3.org/hypertext/WWW/MarkUp/html-test/hyperlinks/base-frag.html
and
http://www.w3.org/hypertext/WWW/MarkUp/html-test/hyperlinks/frag-encode.html

The test suite as a whole needs quite a bit of release engineering
to make it usable again.

But have a look at those two test cases and let me know if they
clarify things for you, ok?

Dan