Re: converting URLs in .html files (Curt Tilmes)
Message-id: <>
From: (Curt Tilmes)
Subject: Re: converting URLs in .html files
Date: Tue, 31 Aug 1993 12:53:25 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL22]
Content-Type: text
Content-Length: 1575      
Status: RO
>Has anyone dealt with automatically converting the URLs within HTML files
>so that you could take a set of files like the Library of Congress Vatican
>Exhibit and use them off a local HTTP server rather than across the

I've been thinking about what it would take to create a 'mirror' program
for HTTP, similar to the program of the same name for anonymous ftp sites.

I want to give the program a single URL and a local directory, and have it
retrieve the document for that URL, extract the URLs it links to, and
determine which are "internal" links to more pieces of the same 'exhibit'
and which are "external" links.  It would leave the "external" links
unchanged, rewrite the "internal" links to refer to the local HTTP server,
and retrieve the documents they point to.
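
To make the steps above concrete, here is a rough sketch in Python (not the
Perl script itself).  Everything in it is an assumption for illustration: the
names LinkCollector, extract_links, plan_mirror and to_local are made up, only
<a href> links are considered, and the test for "internal" is passed in as a
function rather than decided here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lower case.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative references so every link is absolute.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return the absolute URLs linked from one document, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


def plan_mirror(html_text, base_url, is_internal):
    """Split a page's links into (to_fetch, to_keep).

    is_internal is a caller-supplied predicate (hypothetical here) that
    decides whether a URL belongs to the same 'exhibit'.
    """
    to_fetch, to_keep = [], []
    for link in extract_links(html_text, base_url):
        (to_fetch if is_internal(link) else to_keep).append(link)
    return to_fetch, to_keep


def to_local(url, remote_root, local_root):
    """Rewrite a URL under remote_root to sit under local_root instead."""
    if not url.startswith(remote_root):
        return url  # an "external" link is left alone
    return local_root + url[len(remote_root):]
```

A real mirror would loop: fetch each URL in to_fetch, run plan_mirror on it
in turn, remember what it has already seen, and write each document to disk
with its internal links passed through to_local.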

The only thing I can think of that would work well for determining whether
a link is "internal" or "external" would be to violate the URL: look
inside the opaque descriptor and determine whether the two documents
are in the same directory.  But then, we see many closely related
'exhibits' that are spread among several directories.  If those directories
are all subdirectories of a single directory, though, the parent directory
could be used in the comparison to determine whether the URL is "internal"
or "external".
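
The parent-directory comparison might look something like this in Python.
It is only a sketch under my own assumptions: the function names are invented,
the directory of the starting URL is taken as the root of the 'exhibit', and
two URLs must share scheme and host before their paths are compared at all.

```python
import posixpath
from urllib.parse import urlsplit


def parent_directory(url):
    """Return (scheme, host, directory) for a URL, directory ending in '/'."""
    parts = urlsplit(url)
    directory = posixpath.dirname(parts.path)
    if not directory.endswith("/"):
        # The trailing slash keeps /exhibit from matching /exhibit2.
        directory += "/"
    return parts.scheme, parts.netloc.lower(), directory


def is_internal(url, start_url):
    """True if url lives in the starting document's directory or below it."""
    scheme, host, root = parent_directory(start_url)
    other_scheme, other_host, other_dir = parent_directory(url)
    return (scheme, host) == (other_scheme, other_host) and other_dir.startswith(root)
```

With a starting URL of http://www.example.org/exhibit/index.html, a link to
/exhibit/rooms/a.html counts as internal, while /exhibit2/a.html or anything
on another host counts as external.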

With such a program (and it doesn't really look too difficult for a Perl
script), we could have major 'cache' sites in different areas of the world
that maintain large 'archives' of great HTML material.

Of course, the URN concept makes this all work even better.