> > I did a search on "cgi" and got back a doc with a name I didn't
> > recognise. Now although I have several hundred HTML files,
> > like my children, I know most of them by name :*). I think you got
> > the href from a file that has a Base tag pointing to another server.
> Our spider doesn't follow links to servers other than the one where it
> starts (we trigger each index for each server individually). Documents
> from other servers would have come from distinct indexing sessions.
> Having said that, I'm not sure exactly what you're describing here. Can
> you describe it a bit more?
OK: since I'm not sure what you're unsure of, please excuse me if I
explain the obvious :*). Relative URLs are normally resolved against
the address of the file they appear in, i.e. relative to that file's
own directory. But a Base tag makes them resolve against whatever URL
it gives instead, which can be any other directory, and on any other
server. In the particular instance I had noticed, the file was in fact
adapted from the TOC of Ian Graham's HTML tutorial; I didn't want to
move all the sub-files over, so I just made the Base tag point to the
original TOC, which is not on my server. So if the spider finds a
reference in this file to "server-cgi-bin.html", it should realise I
don't actually *have* that file: it's wherever Base says it is, which
in this case is some other server. If the spider doesn't want to go on
side trips to other sites, I guess it will just have to ignore relative
URLs in files whose Base points to other servers.
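
For what it's worth, here is a rough sketch of a slightly different
approach, assuming a spider written in Python (I don't know what yours
is actually written in). Rather than ignoring every relative URL in a
file with a Base, it resolves each link against the Base, the way a
browser would, and then drops any link that lands on a different host
from the page being indexed. The names LinkCollector and
same_server_links, and the example.org / example.net URLs, are made
up purely for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the first <base href> and every <a href>."""

    def __init__(self):
        super().__init__()
        self.base = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base" and self.base is None:
            # Only the first <base href> counts, as in browsers.
            self.base = href
        elif tag == "a":
            self.links.append(href)


def same_server_links(page_url, html):
    """Hypothetical helper: resolve each link against <base href> if present,
    otherwise against the page's own URL, and keep only those links that end
    up on the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html)
    # A <base href> may itself be relative, so resolve it against the page URL.
    base = urljoin(page_url, parser.base) if parser.base else page_url
    host = urlsplit(page_url).netloc.lower()
    kept = []
    for href in parser.links:
        absolute = urljoin(base, href)
        if urlsplit(absolute).netloc.lower() == host:
            kept.append(absolute)
    return kept


if __name__ == "__main__":
    page = "http://my.example.org/docs/toc.html"
    html = (
        '<html><head><base href="http://other.example.net/tutorial/toc.html">'
        '</head><body><a href="server-cgi-bin.html">CGI</a> '
        '<a href="/local.html">Local</a> '
        '<a href="http://my.example.org/docs/faq.html">FAQ</a></body></html>'
    )
    # "server-cgi-bin.html" resolves to other.example.net, so it is dropped.
    print(same_server_links(page, html))
    # prints ['http://my.example.org/docs/faq.html']
```

Resolving first and filtering second means the same host check also
catches plain absolute links to other sites, in files with or without
a Base.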
