Re: converting URLs in .html files

eostrom@mcs-server.gac.edu (Erik Ostrom)
Date: Tue, 31 Aug 93 12:37:40 CDT
From: eostrom@mcs-server.gac.edu (Erik Ostrom)
Message-id: <9308311737.AA01193@mcs-server.gac.edu>
To: www-talk@nxoc01.cern.ch
In-reply-to: <199308311631.AA20114@library.ucsf.edu> (dcmartin@library.ucsf.edu)
Subject: Re: converting URLs in .html files 
   Has anyone dealt with automatically converting the URLs within HTML files
   so that you could take a set of files like the Library of Congress Vatican
   Exhibit and use them off a local HTTP server rather than across the
   Internet?

This won't help with the Vatican exhibit, but: If a cluster of related
files is written using relative URLs, then the only `conversion' you
need to do is to change the entry point.

That is, if http://sunsite.unc.edu/expo/vatican.exhibit/vatican.exhibit.html
contained a link to HREF="exhibit/Main_Hall.html", then you could just
copy all the files over to your local net, and jump to (say)
file:///my/html/files/vatican.exhibit/vatican.exhibit.html,
and the reference would now point you to the Main Hall file on your
local filesystem.
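
To make that concrete, here is a small sketch of how a relative
reference resolves against whichever entry point you start from.
Python and its standard urllib.parse.urljoin are my choice for
illustration; the variable names are made up, and the two entry points
are the ones mentioned above.

```python
from urllib.parse import urljoin

# The relative link as it appears inside the entry-point document.
RELATIVE_LINK = "exhibit/Main_Hall.html"

# The two entry points from the example: the original server and a local copy.
remote_entry = "http://sunsite.unc.edu/expo/vatican.exhibit/vatican.exhibit.html"
local_entry = "file:///my/html/files/vatican.exhibit/vatican.exhibit.html"

# The same relative link resolves against whichever base document holds it.
remote_target = urljoin(remote_entry, RELATIVE_LINK)
local_target = urljoin(local_entry, RELATIVE_LINK)

print(remote_target)
print(local_target)
```

The link text never changes; only the base it is resolved against does.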

The Vatican exhibit uses absolute URLs, which is a pain for moving or
copying files.  Oh well.  A cluster of related files using relative
URLs should be easily portable.  (That's the point of relative URLs,
as I understand it.)  Yes, you need

   to recreate the folder hierarchy of the source server

but this is something that tar and other archivers already do.
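
As a rough illustration of the archiver doing that work, the sketch
below packs a small tree and unpacks it elsewhere with Python's
standard tarfile module. The helper name copy_tree_via_tar, the
archive name package.tar, and the two stand-in files are all
hypothetical; the real exhibit would simply be a larger tree.

```python
import tarfile
import tempfile
from pathlib import Path


def copy_tree_via_tar(source_dir: Path, dest_dir: Path) -> None:
    """Archive source_dir and unpack it under dest_dir, keeping its layout."""
    archive = dest_dir / "package.tar"
    with tarfile.open(archive, "w") as tar:
        # arcname keeps paths relative to the package's top-level folder.
        tar.add(source_dir, arcname=source_dir.name)
    with tarfile.open(archive) as tar:
        tar.extractall(dest_dir)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Stand-in for the files on the source server (contents are invented).
    src = root / "remote" / "vatican.exhibit"
    (src / "exhibit").mkdir(parents=True)
    (src / "vatican.exhibit.html").write_text(
        '<A HREF="exhibit/Main_Hall.html">Main Hall</A>'
    )
    (src / "exhibit" / "Main_Hall.html").write_text("<H1>Main Hall</H1>")

    # Stand-in for the local net.
    local = root / "local"
    local.mkdir()
    copy_tree_via_tar(src, local)

    # The subfolder structure survives, so the relative link still resolves.
    copied = sorted(p.relative_to(local).as_posix() for p in local.rglob("*.html"))
    print(copied)
```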

For links that _aren't_ relative, it's really questionable whether you
want to convert.  Usually (I hope) an URL inside a document is
absolute because it points to something unrelated, which wouldn't be
part of the package you brought to your local net anyway.  Of course,
this isn't the case with the Vatican exhibit, or, no doubt, with many
other data sets on the web now.  I can dream, though.
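
If you did decide to convert, one approach (a sketch only, not
something I have run against the real exhibit) would be to rewrite
just the absolute links that point back into the package and leave
every other link alone. The function name relativize, the sample
markup, and the unrelated info.cern.ch link are assumptions for
illustration, and a regular expression is a crude stand-in for a real
HTML parser.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Matches HREF="..." attributes; a simplification, not a full HTML parser.
HREF_RE = re.compile(r'(HREF\s*=\s*")([^"]*)(")', re.IGNORECASE)


def relativize(html: str, page_url: str, package_root: str) -> str:
    """Rewrite absolute links under package_root as links relative to page_url."""
    page_dir = posixpath.dirname(urlsplit(page_url).path)

    def fix(match: re.Match) -> str:
        target = match.group(2)
        if not target.startswith(package_root):
            # Points outside the package: leave it absolute.
            return match.group(0)
        parts = urlsplit(target)
        relative = posixpath.relpath(parts.path, start=page_dir)
        if parts.fragment:
            relative += "#" + parts.fragment
        return match.group(1) + relative + match.group(3)

    return HREF_RE.sub(fix, html)


page_url = "http://sunsite.unc.edu/expo/vatican.exhibit/vatican.exhibit.html"
package_root = "http://sunsite.unc.edu/expo/vatican.exhibit/"

# Hypothetical markup: one link inside the package, one unrelated link.
html = (
    '<A HREF="http://sunsite.unc.edu/expo/vatican.exhibit/exhibit/Main_Hall.html">'
    'Main Hall</A> <A HREF="http://info.cern.ch/">CERN</A>'
)

converted = relativize(html, page_url, package_root)
print(converted)
```

Links that point outside the package stay absolute, which matches the
case where an absolute URL really does refer to something unrelated.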