Re: Is this use of BASE kosher?

Owen Rees (rtor@ansa.co.uk)
Tue, 1 Aug 95 07:27:00 EDT

"Daniel W. Connolly" <connolly@beach.w3.org> writes:
> Q2. What is the address of the head of the link whose tail is "Support
> and ..."
>
> A2. http://www.hp.com/go/ftp-sites#Miscellaneous Support
> aka
> http://www.hp.com/go/ftp-sites#Miscellaneous%20Support
>
> (space and %20 have identical semantics. Both are correct.)

I disagree. I think that space must be written as %20 in this context; my reasoning follows from Larry Masinter's message. A browser may be lenient here, but that is a separate matter.

In another message (in response to one of mine):
Larry Masinter <masinter@parc.xerox.com> writes:
> > The key question is whether or not URIs that appear in HTML are in their
> > encoded or unencoded form.
>
> There is no 'unencoded form' for URLs. That is, URLs have a common
> method for encoding octets within certain components, but removing
> the encoding doesn't leave you with a URL. (Consider those URLs that
> encode "/" as "%2F" to avoid having the / treated as a hierarchical
> demarkation, for example.)

OK. RFC1738 says unsafe characters must be encoded, and RFC1808 extends this to cover the 'fragment' following '#'. So in 'href="#Miscellaneous Support"' the quoted text is not a URL within the meaning of RFC1808.
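
To make the point concrete, here is a minimal Python sketch of what a document author would have to do under this reading. The function name encode_fragment is hypothetical, and the sketch assumes the input is written with raw (not yet encoded) characters; applying it to an already-encoded value would encode the '%' a second time.

```python
from urllib.parse import quote

def encode_fragment(href: str) -> str:
    """Percent-encode the fragment of an HREF value written with raw characters.

    Hypothetical helper for illustration only; assumes the fragment
    has not been encoded already.
    """
    base, sep, fragment = href.partition("#")
    # safe="" encodes everything except unreserved characters,
    # so a space becomes %20.
    return base + sep + quote(fragment, safe="")

print(encode_fragment("http://www.hp.com/go/ftp-sites#Miscellaneous Support"))
```

The output of this call is the second form given in A2 above, the one with %20.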

draft-ietf-html-spec-04.txt appears to allow for URIs that are not URLs within the meaning of RFC1738 and RFC1808, but this does not look like a sensible interpretation here.

Oh dear! '#' and fragment identifiers might not be valid in anchors according to the HTML draft. A comment in the DTD says 'The term URI means a CDATA attribute whose value is a Uniform Resource Identifier, as defined by "Universal Resource Identifiers" by Tim Berners-Lee aka RFC 1630', and RFC1630 page 22 makes it clear that the fragmentid is not part of the URI.

A possible solution is to replace this with a reference to 'URL as defined in RFC1808'. This would mean that fragments are permitted and that, being a URL, the value has already had encoding applied (i.e. %20 is required for space, and %25 is unambiguously an encoded percent character, not to be encoded again). This would exclude URIs that are not syntactically URLs according to RFC1808, but it is doubtful whether the current reference to RFC1630 permits any such URI. In this context "URL" should be considered merely a label for the syntax, so as to avoid the debate about URLs and URNs.
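
The %25 point can be illustrated with a short Python sketch. It assumes the rule proposed here (attribute values are always encoded), so a reader decodes exactly once; the name decode_once is hypothetical.

```python
from urllib.parse import quote, unquote

def decode_once(encoded: str) -> str:
    """Decode a URL component exactly once (hypothetical helper).

    Under a mandatory-encoding rule this is the only decoding step,
    so %25 always yields a literal percent sign.
    """
    return unquote(encoded)

# A literal percent sign survives one encode/decode round trip.
literal = "100% pure"
encoded = quote(literal, safe="")
round_trip = decode_once(encoded)

# Encoding text that is already encoded changes what it denotes:
# the result decodes to the characters "a%20b", not to "a b".
double_encoded = quote("a%20b", safe="")
decoded_back = decode_once(double_encoded)
```

If encoding were optional, a reader could not tell whether "a%20b" in an attribute meant the five characters a, %, 2, 0, b or the three characters a, space, b.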

Since anchor names are not URIs, presumably the encoding rules do not apply to them. Therefore <A HREF="#a%20name"> is a tail whose head is <A NAME="a name">: the space must be encoded in the tail and must not be encoded in the head. I think it needs to be made explicit whether or not names are encoded, since there is potential for confusion here. (I am happy with either encoded or not encoded, but absolutely opposed to encoding being optional - that old ambiguity argument again.)
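
As a sketch of the "names are not encoded" option only (the other option would compare without decoding), a browser's matching rule could look like this in Python. The function name fragment_matches_name is hypothetical.

```python
from urllib.parse import unquote

def fragment_matches_name(href: str, name: str) -> bool:
    """Return True if the fragment of HREF refers to the anchor NAME.

    Assumes HREF values are always encoded and NAME values never are,
    which is one of the two explicit rules suggested above.
    """
    _, sep, fragment = href.partition("#")
    if not sep:
        return False  # no fragment, so no named anchor is addressed
    # Decode the fragment once, then compare literally with the name.
    return unquote(fragment) == name

matches_plain = fragment_matches_name("#a%20name", "a name")
matches_encoded = fragment_matches_name("#a%20name", "a%20name")
```

Under this rule the first comparison succeeds and the second fails, because a NAME containing the characters %20 is a different name.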

Owen Rees
<rtor@ansa.co.uk>, <URL:http://www.ansa.co.uk/Staff/rtor.html>
Information about ANSA is at <URL:http://www.ansa.co.uk/>.