CPSC 547 - Project I

Internet Research Tools: The World Wide Web and Archie



World Wide Web

The WWW is the fastest-growing area of the Internet. New GUI-based browsers such as Netscape have made hypermedia widely available and very popular. Seamless connections between HTML documents (which have made gopher documents obsolete), Archie, FTP sites, telnet, and many other Internet resources have made the Internet much easier to use.

Although text-based browsers such as Lynx are still in use, they are rapidly becoming obsolete as more and more graphics are incorporated into documents, and as "Netscape enhanced" web pages cannot be displayed through a text-only interface.

Powerful search engines such as Digital's Alta-Vista make finding useful material much easier. Alta-Vista boasts the largest index of any search engine, with over 16 million pages. It allows complex keyword searches, including a NEAR clause (the terms must be within 10 words of each other) and date restrictions (for example, limiting the search to pages dated between January 1, 1995 and February 1, 1996). Another promising aspect of Alta-Vista is that it also searches Usenet, with an index that is updated in real time. Despite these capabilities, Alta-Vista boasts a search time of under one second for most of the 2 million queries it processes daily. It truly is the mother of all search engines.
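The NEAR clause can be illustrated with a short Python sketch. This is not Alta-Vista's implementation, which is not described here; it only assumes that "within 10 words" means the word positions of the two terms differ by at most 10. The names tokenize and near are made up for this example.

```python
import re


def tokenize(text):
    """Split text into lowercase words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text, first, second, window=10):
    """Return True if `first` and `second` occur within `window` words.

    Assumption: distance is the difference between word positions.
    """
    words = tokenize(text)
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= window for i in pos_first for j in pos_second)


if __name__ == "__main__":
    close = "Intelligent software agents filter mail"
    far = "intelligent " + "filler " * 15 + "agents"
    print(near(close, "intelligent", "agents"))
    print(near(far, "intelligent", "agents"))
```

A real engine would look the positions up in a prebuilt index rather than rescanning each page, but the matching rule would be similar in spirit.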

In a sample search for material on intelligent agents, a query for the two keywords intelligent agents returned about 40,000 hits. Of the first 10 displayed, few looked promising. A new search for the phrase "intelligent agents" (in quotation marks, so that the two words are treated as a unit) reduced the hits to 4,000, and a final search for title:"intelligent agents" (the phrase occurring in document titles) reduced the number to a very manageable 61 hits. A glance at the first 10 in the list suggests that almost all of them contain valuable research material.
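The narrowing effect of the three query forms can be sketched in Python on a tiny made-up collection of pages. The pages, the function names, and the rule that a plain keyword query matches a page containing either word are all assumptions for illustration; they are not taken from Alta-Vista.

```python
import re


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_any_keyword(text, keywords):
    """Assumed keyword rule: the page contains at least one keyword."""
    words = set(tokenize(text))
    return any(k.lower() in words for k in keywords)


def has_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def search(pages, mode, query):
    """Return titles of matching pages for one of three query modes."""
    hits = []
    for page in pages:
        full_text = page["title"] + " " + page["body"]
        if mode == "keywords":
            matched = has_any_keyword(full_text, query.split())
        elif mode == "phrase":
            matched = has_phrase(full_text, query)
        elif mode == "title":
            matched = has_phrase(page["title"], query)
        else:
            raise ValueError("unknown mode: " + mode)
        if matched:
            hits.append(page["title"])
    return hits


# Hypothetical pages, invented for this example.
PAGES = [
    {"title": "Intelligent Agents Research", "body": "Papers on software agents."},
    {"title": "Agent Survey", "body": "A survey of intelligent agents in use today."},
    {"title": "Travel Agents", "body": "Book a trip with intelligent pricing."},
    {"title": "Gardening Tips", "body": "How to grow tomatoes."},
]

if __name__ == "__main__":
    for mode in ("keywords", "phrase", "title"):
        print(mode, len(search(PAGES, mode, "intelligent agents")))
```

On this toy collection the keyword query matches three pages, the phrase query two, and the title query one, mirroring on a small scale the drop from 40,000 to 4,000 to 61 hits.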

Many other valuable search engines exist, such as MetaCrawler, Lycos, Yahoo, and Webcrawler. For extensive searches they can all be used, with differing levels of speed, usability, and success.

Despite its obvious strengths, namely widespread usage and powerful search engines, the WWW has weaknesses. Frequent connections to remote sites and the transfer of large numbers of graphics both contribute to making the system slow to navigate.

Also, filtering the "garbage" documents from the "gold" is always very time consuming and often frustrating, even with the best of search engines.

Archie

Archie is a keyword search tool for files on anonymous FTP sites. It is most useful when an exact filename is known but its location is not. Traditionally, Archie has been used via a Unix command line. Searches could also be submitted via gopher or email. Recently, however, with the explosion in WWW usage, more user-friendly form-based queries have become available.
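The kind of lookup Archie performs can be sketched in Python as a search over stored directory listings. This is only an illustration of the idea, not Archie's actual code or data format; the host names, paths, the file name interap.zip, and the function find_file are all hypothetical.

```python
import posixpath

# Hypothetical listings: FTP host -> file paths found on that host.
LISTINGS = {
    "ftp.example.org": ["/pub/agents/interap.zip", "/pub/README"],
    "mirror.example.net": ["/mirrors/win/interap.zip", "/mirrors/win/index.txt"],
}


def find_file(listings, name, exact=True):
    """Return (host, path) pairs whose file name matches `name`.

    exact=True  : the file name must equal `name` (case-insensitive)
    exact=False : `name` may appear anywhere in the file name
    """
    needle = name.lower()
    results = []
    for host in sorted(listings):
        for path in listings[host]:
            filename = posixpath.basename(path).lower()
            matched = (filename == needle) if exact else (needle in filename)
            if matched:
                results.append((host, path))
    return results


if __name__ == "__main__":
    for host, path in find_file(LISTINGS, "interap", exact=False):
        print(host, path)
```

The sketch shows why knowing the exact filename helps: an exact match returns only the locations of that file, while a substring match may also return unrelated files.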

The web explosion seems to have had another side effect on Archie. Archie servers had previously become heavily loaded and difficult to access as FTP usage increased. Direct access to common files via web pages is presumably the reason that Archie servers have become quite easy to access again.

Access to files via web pages, together with the faster, more powerful search engines on the web, is making Archie something of a dinosaur. A search for the keyword "InterAp" (the name of an intelligent agent), submitted to Archie via an HTML form, took about 20 seconds and returned several FTP mirrors. The same search via Digital's Alta-Vista search engine took about 3 seconds and placed the developer's own site for the InterAp archives at the top of the search results. That site had installation instructions, uncompressed file sizes, a licensing overview, and direct ordering instructions. The web search was clearly faster and gave a vastly superior description of the files.

Other Topics


Email Addresses

You can mail any comments or suggestions to
marta@cpsc.ucalgary.ca