Accessing Internet Information

An Overview of the Task
Newsgroups
Electronic Mail
FTP, Archie and Gopher
World Wide Web


An Overview of the Task

The Internet is a vast archive of human knowledge. This archive is housed on a multitude of computers of various types, interconnected by digital networks. This interconnection allows anyone with access to a computer on one of these networks to reach virtually any of the information on the Internet. At least, that is how the Internet works in theory.

In fact, the vast amount of information on the Internet is both its greatest advantage and its greatest liability for anyone hoping to research a particular topic. The sheer volume of information, coupled with the fact that any piece of it can be located virtually anywhere on the network, can make finding information on a specific subject quite daunting. When one first goes looking for information on the Internet, one often has no idea whether anyone has put any information about the topic on the Internet at all. Even when one is quite certain that such information exists, one usually has no idea where it might be found among the millions of computers that make up the Internet.

However, if that were the entire picture, the Internet would not have become as popular as it has. In fact, a number of tools have been developed to help locate existing information on the Internet. Other tools allow individuals connected to the Internet to interact with one another and actively exchange information that may never have been on the Internet before their discussion. By using these tools, information on a specific topic can often be found quite readily, although even then it is often difficult to determine whether the information located is exhaustive, or whether it at least contains a sampling of the best data available on the subject at hand.

For example, our group, having been assigned to present an in-class discussion of Electronic Publishing and Digital Libraries, set out to use the Internet to locate information on that topic. In looking for this information, we used various tools both to search for it and then to read it or download it to the computers we were working on. The following is an outline of the tools that we used, the manner in which we used them, the strengths and weaknesses that we perceived in each tool, and an overview of the information we were able to locate or download with them.


Newsgroups

One tool that we used in our attempt to find information was Usenet news. News allows one to post questions and information into a public forum without knowing who the recipients may be. This public forum is divided into special interest group areas, called newsgroups, each of which is named to indicate its subject. Thus, by posting questions or information into one of these newsgroups, one can try to get information from, or disseminate information to, others within that special interest group without having to know any of them personally. These newsgroups are generally rather active and up to date. From them, one can often retrieve news of the latest happenings within the special interest group, URLs of WWW sites that contain further information on the area the group represents, or the text of FAQs (frequently asked questions along with their answers) that deal with the interest area at hand.

These features make newsgroups a powerful tool for explicitly requesting information, or for browsing to find information, about the special interest areas they cover. This is especially true when the information may be too new to be in print, when it requires special hands-on experience that only a few people have and that wouldn't normally be formally documented, or when one simply wants a broad cross-section of opinion on a particular subject. However, newsgroups also have their drawbacks.

First of all, although newsgroups are divided into special interest areas, there is no mechanism in place, other than the outrage of a group's subscribers, to ensure that the articles posted to a group actually have anything to do with it. In using newsgroups to find information on our presentation topic, it did not take us long to see that unrelated articles are indeed sometimes posted to groups, often by inexperienced users who have made an honest mistake, but just as often by individuals who post them on purpose. This problem seems to be less common in groups dedicated to professional or academic pursuits, but it is still present to some extent.

Secondly, the more active the newsgroup, the more information the reader has to filter through to find what they are looking for. Discussions are often organized into threads of related articles, usually consisting of an initial posting of information or request for information followed by any responses to it. This helps in the filtering process, but not greatly: one often has to interpret the meanings of obscure thread titles, or else browse through nearly all of the articles, to get an idea of what information might be available within a given group.
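Threading of this kind can be sketched in Python. The sketch below is a simplification, assuming that threads are recognized purely by subject line, with replies carrying one or more "Re:" prefixes; real newsreaders generally also use an article's References header, which this sketch ignores. The article data and addresses are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample articles as (subject, author) pairs; not real postings.
ARTICLES = [
    ("Digital libraries project list", "alice@example.edu"),
    ("Re: Digital libraries project list", "bob@example.com"),
    ("HTML 3.0 tables?", "carol@example.org"),
    ("Re: Re: Digital libraries project list", "dave@example.net"),
]

# Matches one or more leading "Re:" prefixes, in any letter case.
_RE_PREFIX = re.compile(r"^(?:re:\s*)+", re.IGNORECASE)


def base_subject(subject: str) -> str:
    """Strip leading 'Re:' prefixes so a reply's subject matches the original."""
    return _RE_PREFIX.sub("", subject.strip())


def group_into_threads(articles):
    """Map each base subject to the authors who posted under it, in posting order."""
    threads = defaultdict(list)
    for subject, author in articles:
        threads[base_subject(subject)].append(author)
    return dict(threads)


if __name__ == "__main__":
    for subject, authors in group_into_threads(ARTICLES).items():
        print(f"{subject}: {len(authors)} article(s)")
```

Because threads here are keyed only on subject text, a reply whose poster edited the subject line would start a new thread, which is one reason thread titles alone are a weak filter.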

Thirdly, newsgroups are far from exhaustive in the interest areas they cover. As well, depending on which groups the administrator of one's news server chooses to carry, an existing newsgroup may be unavailable, and the reader may have to take the further step of requesting that it be carried locally, or else find some other way to obtain the information. For example, in our attempts to find information on Digital Libraries, we found only one group that appeared to have any relevance to the topic at all. On the topic of Electronic Publishing, however, especially in the area of HTML, there were a number of relevant groups.

Lastly, anyone requesting information has to recognize that there is often a time lag between posting the request and receiving a response, depending on the difficulty of the question and the amount of activity within the group. Furthermore, responses that might interest other readers often never appear in the newsgroup itself, because posters frequently ask for replies by e-mail rather than saying that they will look for answers in the group.

In summary, newsgroups are generally a good starting point for research into a topic. They give the reader an idea of the current issues in a special interest community and also expose them to much of the jargon and many of the catch-phrases used within that community. This exposure can be of great value when the reader moves on to other methods of obtaining information on the Internet, such as search engines, because by then the reader will have encountered phrases that are unique to, or at least well suited to searching for, their topic of choice.

Newsgroup Log


Electronic Mail


Electronic mail, or e-mail as it is more commonly called, was another tool that we used as we researched our topic. It is a good, but sometimes forgotten, way of requesting information over the Internet. Like newsgroups, e-mail allows one to ask others to send information about specific topics, rather than having to search all over the Internet in the hope that the information sought will be on some Web site. Unlike posting to a newsgroup, however, someone sending an e-mail message knows who the intended recipient is, and that there will likely be only one recipient or, at most, a small number. This means that one has to choose recipients carefully and needs to know something about them before sending the message. For example, one would want some confidence that an intended recipient has, or at least claims to have, the knowledge required to answer an e-mailed request for information.

One way we found to judge whether someone might have the knowledge to answer an e-mailed question was to take e-mail addresses from people who had posted information to newsgroups on aspects of our research topic. We reasoned that if they had demonstrated knowledge of a subject closely related to ours, there was a high probability that they would also know something about our area of research.
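Taking an address from a posting amounts to reading the article's From header. News articles use the same style of header as Internet mail, so, assuming the article is available as raw text, Python's standard `email` package can extract the address. The article below, including its newsgroup name, is hypothetical, and the sketch does not fetch anything from a news server.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical news article; the newsgroup name and address are made up.
RAW_ARTICLE = """\
From: Jane Doe <jane@example.edu>
Newsgroups: example.digital-libraries
Subject: Re: Catalogues of on-line journals

Body text goes here.
"""


def poster_address(raw_article: str) -> str:
    """Return only the e-mail address from an article's From header ('' if absent)."""
    message = message_from_string(raw_article)
    _display_name, address = parseaddr(message.get("From", ""))
    return address


if __name__ == "__main__":
    print(poster_address(RAW_ARTICLE))  # jane@example.edu
```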

A second, equally effective way we discovered to find e-mail addresses was to take them from WWW pages. People often discuss their areas of interest on their Web pages, and they often either embed a Hyperlink to their e-mail address or at least include it in the text of their pages, in the hope of receiving comments or suggestions from the public about their content. Again, as with newsgroups, if the information discussed was similar to, or on the same topic as, the information we were looking for, we felt that the page's author would potentially be a good source of information for our area of research.

These are not, however, guaranteed methods of finding people who have the information one is seeking. Sometimes the person being e-mailed does indeed have related knowledge, but not the specific information being sought. Sometimes the e-mail address on a Web page belongs to an automatic responder that sends back a polite thank-you for the interest shown in the site, along with a generic overview of what its owners think most people will ask about, but nothing on the topic actually asked about. Sometimes the person whose e-mail address is given on a Web page is only a maintainer who designs and formats the page for information supplied by others, and has no in-depth knowledge of the subject.

Another drawback of e-mail is that there can often be a long time lag between when a message is sent and when its recipient responds. In most cases, if the address a message is sent to is invalid, the sender will receive an error message saying that the mail was not delivered, and will obviously not expect a response. If no such error message comes back, however, the sender will likely assume that the message got through, and from that point on it is a waiting game. When we sent e-mail to Time magazine to get more information about Electronic Publishing, it took about three days to receive a response. Other messages we sent have been outstanding for more than three days and have not yet elicited any reply.

In summary, e-mail can be an effective tool for getting very specific information from specific individuals. When the right person is found and that person replies, the information received is second to none, since it is tailor-made to the original request. However, finding the right person can be difficult, and, as we found, even once a potentially good resource person has been sent e-mail, there is no guarantee that they will respond quickly, if they respond at all.

Electronic Mail Log


FTP, Archie and Gopher


FTP, Archie and Gopher are some of the original tools that were developed to transfer files and to search through the archives that make up the Internet. Each of them can be, and has been, used as a standalone program. However, with the advent of the WWW, and in particular of HTML, Multipurpose Internet Mail Extensions (MIME) and the HTTP protocol, these tools are quickly becoming outmoded, and their protocols are more likely to be invoked from within a WWW browser. In fact, in our research we found no reason to use any of these tools directly. Rather, these tools tended to be referenced within a Web page, and the Web browsers we were using would then issue whatever FTP, Archie, or Gopher commands were required to access the information pointed to.

That is not to say that these tools are no longer useful or no longer used. We discovered that Microsoft, for example, uses an FTP site as a repository for software such as its Internet Assistant for Word, an HTML authoring tool. Maclean's Magazine uses a gopher site to archive its back issues, and many Digital Libraries on the Internet are also stored on gopher sites. The functions these tools once performed on their own have now been encapsulated into the higher-level protocols in use on the WWW. One reason for this, we suspect, is that these tools are relatively difficult to use compared with HTTP on the WWW. Each of them has its own interface and commands, and moving between them is far from seamless. In contrast, when HTML tags within Web pages are used to reach these same tools, the user needs to know only how to click on a Hyperlink to invoke all of the required commands; the rest of the process is left to the HTML author and the Web browser. Indeed, in the FTP and Gopher examples above, the files are accessed via Hyperlinks on corporate Web pages.
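The hand-off from a Hyperlink to the right protocol rests on the scheme prefix of the link's URL (`http:`, `ftp:`, `gopher:`). The Python sketch below shows only that first step, deciding which protocol, host, and path a browser would use for a link; it opens no connections, covers only these three schemes, and uses hypothetical host names.

```python
from urllib.parse import urlparse

# Schemes this sketch recognizes, mapped to the protocol a browser would speak.
PROTOCOLS = {
    "http": "HTTP",
    "ftp": "FTP",
    "gopher": "Gopher",
}


def describe_link(url: str) -> str:
    """Say which protocol, host and path a browser would use to follow a link."""
    parts = urlparse(url)
    protocol = PROTOCOLS.get(parts.scheme.lower())
    if protocol is None:
        return f"unsupported scheme: {parts.scheme!r}"
    return f"{protocol} request to {parts.hostname}, path {parts.path or '/'}"


if __name__ == "__main__":
    for link in (
        "ftp://ftp.example.com/pub/readme.txt",  # hypothetical FTP archive
        "gopher://gopher.example.ca/1/archive",  # hypothetical gopher menu
        "http://www.example.org/",               # hypothetical Web page
    ):
        print(describe_link(link))
```

The user never sees this step: clicking the link is enough, which is the seamlessness the paragraph above describes.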

It is perhaps just as well that these sites are now accessed this way, as they are not easy to locate on their own. Unless one knows the exact Internet address of a site, finding it can be nearly impossible, short of stumbling upon existing links between such sites, or upon text within a known site that points to other similar sites. Once one has located one of these sites and is viewing its listing of files available to view or download, one quickly discovers that the files often come with little or no description of what they are or what they might be used for. Thus, one often has to download or view many of the files on a site in order to find something useful, which often translates into long periods of time spent for no measurable gain.

In summary, tools like FTP, Archie and Gopher are no longer the most commonly used tools on the Internet. They continue to survive, but they are now often accessed through Hyperlinks on Web pages, which allow more descriptive information to be included for each file. This approach seems to work well for all involved: existing sites can continue to be accessed without forcing their maintainers to migrate them to new formats, and the growing numbers of non-technical and casual users on the Internet can use Web browsers to find information in all of its forms without having to be aware that they are using these tools.

FTP, Archie, and Gopher Log


World Wide Web


The World Wide Web (WWW) is built primarily on the idea of using the HyperText Markup Language (HTML) to prepare Web pages that can be stored on the Internet and viewed with Web browsers. These browsers retrieve the files using the HTTP protocol, read them in, interpret the markup embedded within them, and then display the resulting Web pages to the end user. Along with these features of the WWW came WWW search engines, which maintain large databases of Web pages and their contents. These engines allow an individual to submit a keyword search and return the Uniform Resource Locators (URLs) of those pages whose contents or titles satisfy the search criteria. In our research, we found that the WWW, and in particular the Yahoo, Alta Vista, and Lycos search engines, was especially useful in helping us locate information.
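As a rough illustration of how such an engine can answer a keyword query, the Python sketch below builds an inverted index (a map from each word to the pages containing it) and returns the pages that contain every keyword. The pages, URLs, and the all-keywords (boolean AND) rule are assumptions for illustration only; real engines each used their own cataloguing and matching methods, none of which is reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical pages: URL -> page text. A real engine gathers these by scanning the Web.
PAGES = {
    "http://www.example.edu/dl/": "Digital Libraries Initiative projects and papers",
    "http://www.example.com/html/": "HTML tags reference for electronic publishing",
    "http://www.example.org/epub/": "Electronic publishing and digital libraries overview",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: each word maps to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query: str) -> list[str]:
    """Return, sorted, the URLs whose text contains every query keyword."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(PAGES)
    print(search(index, "digital libraries"))
```

Under the AND rule, each added keyword can only shrink the result set, while a single common word returns every page that mentions it; this is one way the too-broad and too-narrow searches discussed in this section can arise.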

Each of these search engines has its own method of scanning the WWW to register the URLs of existing pages, each accesses a different database, each uses different search methods to find the items matching the submitted criteria, and each presents the results of a search in a different way. These differences are both a drawback and a strength of WWW search engines.

One drawback is that one cannot be certain of receiving complete results covering all of the pages that exist on the subject searched for, because the chosen search engine may not have catalogued all of those pages. As well, unless the search uses keywords drawn from jargon almost exclusive to the subject area, the number of matching Web pages returned may be far too large to look through in a reasonable amount of time. If too specialized a term is used, or if the combination or order of keywords is poorly suited to the topic, the search may return anything from no results at all to a huge number of results drawn from many topics other than the one sought. At either extreme, the searcher may be left with the impression that there is no information available on the topic being searched for.

One method we found for circumventing some of these problems was to use two or three of the largest search engines at the same time. By entering similar, or even identical, keyword searches and comparing the results, one can often find the best sites for a given topic, or stumble upon words or phrases to use in subsequent searches that narrow the results and make them more useful. While using WWW search engines to locate sites of interest is a good starting point for using the WWW as a research tool, however, it is far from the end of the process.
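The comparison we did by eye can be approximated in code. The Python sketch below is a hypothetical illustration, not a feature of any of the engines named above: given result lists from several engines (placeholder URLs; no real engine is queried), it ranks each URL by how many engines returned it, on the assumption that sites found by several engines are more likely to be among the best for the topic.

```python
from collections import Counter


def merge_results(result_lists):
    """Rank URLs by how many engines returned them (most agreement first).

    Ties are broken by the best position any engine gave the URL, then by the
    URL itself, so the output order is deterministic.
    """
    counts = Counter()
    best_position = {}
    for results in result_lists:
        # dict.fromkeys drops repeats within one engine's list, keeping order.
        for position, url in enumerate(dict.fromkeys(results)):
            counts[url] += 1
            best_position[url] = min(position, best_position.get(url, position))
    ranked = sorted(counts, key=lambda url: (-counts[url], best_position[url], url))
    return [(url, counts[url]) for url in ranked]


if __name__ == "__main__":
    # Hypothetical result lists from three unnamed engines.
    engine_a = ["http://a.example/dl", "http://b.example/epub", "http://c.example/html"]
    engine_b = ["http://b.example/epub", "http://d.example/sgml"]
    engine_c = ["http://b.example/epub", "http://a.example/dl"]
    for url, votes in merge_results([engine_a, engine_b, engine_c]):
        print(votes, url)
```

URLs returned by only one engine still appear at the bottom of the list; scanning their titles is where new search words or phrases are most likely to turn up.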

Once a good site is found, viewing it often yields links to other related sites containing information that one would never have thought to look for, or did not even realize existed. For example, in looking for information about HTML tags, we found the HTML Writers Guild page. From this page, we found links to style sheets and even to pages giving advice on rates to charge for creating HTML pages commercially. In fact, we discovered that a search on the WWW would quite often begin at a page returned by a WWW search engine and then proceed by following links from page to page, locating information related to the topic. By the time the thread of the topic was exhausted or, more likely, the time available for research had run out, one would find oneself twenty or thirty links away from the initial site.

Because the WWW uses HTML as a standard for its interface, it is also very easy to traverse, which makes it user-friendly and almost transparent to the user. This made it very easy for us to concentrate on finding the information we were looking for, and allowed us to virtually forget about the tools we were using to get it. As well, as was mentioned earlier, because a Web browser can reach sites that store information using FTP, Gopher, and Archie without leaving this interface, all areas of the Internet are accessible to users of the WWW, which only increases its power.

In summary, the WWW is by far the most powerful tool we found on the Internet for accessing information pertinent to our research. The vastness of its resources, the ease of searching through them, and its ubiquity on the Internet all contribute to its power and usefulness. The wide availability of tools for navigating it also makes it a desirable tool for research. It is no wonder that it has replaced the earlier tools as the primary means by which information and files are presented and moved around on the Internet.

World Wide Web Log

Chris Kliewer - Kliewerc@cpsc.ucalgary.ca
Merlin Griscowsky - Merlin@cpsc.ucalgary.ca
Simon Yung - Yung@cpsc.ucalgary.ca