An Overview of the Task
Newsgroups
Electronic Mail
FTP, Archie and Gopher
World Wide Web
The Internet is a vast archive of human knowledge. This archive
is housed on a multitude of computers of various types which are
interconnected by digital networks. This interconnection allows
people that have access to a computer on one of these digital
networks to be able to access virtually any of the information
that exists on the Internet. Or at least that is how the Internet
works in theory.
In fact, the vast amount of information on the Internet is both
its greatest advantage and its greatest liability for anyone
hoping to research a particular topic. The sheer volume of
information, coupled with the fact that any piece of it can be
located virtually anywhere on the Internet, can make the task of
finding information on a specific subject quite daunting. When one
first goes looking for information on the Internet, one often has
no idea whether anyone has put any information about the topic
there at all. Even if one is quite certain that the information
exists, one will almost certainly have no idea where it might be
found among the millions of computers that make up the Internet.
However, if that were the entire picture, the Internet would
not have become as popular as it has. In actuality, a number of
tools have been developed to help locate existing information on
the Internet. Other tools allow an individual to interact with
other people connected to the Internet and to exchange information
that may never have been on the Internet before their discussion.
By using these tools, information on a specific topic can often be
found quite readily, although it is often difficult even then to
determine whether the information located is exhaustive, or
whether it at least contains a sampling of the best data available
on the subject at hand.
For example, our group, having been assigned to present an in-class
discussion of Electronic Publishing and Digital Libraries,
set about using the Internet to locate information on that topic.
In looking for this information, we used various tools both to
search for the information and then to either read it or download
it to the computers we were working on. The following is an outline
of the tools that we used, the manner in which we used them, the
strengths and weaknesses that we perceived in each tool, and an
overview of what information we were able to either locate or
download by using these tools.
Newsgroups
One tool on the Internet that we used in our attempt to find information
was network news. This is a tool that allows one to post questions
and information into a public forum without knowing who the recipients
may be. Furthermore, this public forum is broken down into various
special interest group areas, which are given names intended to
convey that fact. Thus, by posting questions or information into
one of these special interest group areas, one can attempt to
get information from, or disseminate information to others within
that special interest group without having to know even one of
them personally. These newsgroups are generally rather active
and up to date. One will often be able to retrieve information
from them on the latest happenings within that special interest
group, URLs of WWW sites that contain further information on the
special interest area that the group represents, or the text of
FAQs (frequently asked questions along with their answers) that
deal with the interest area at hand.
These facts make newsgroups a powerful tool for explicitly requesting
information, or for browsing to find information that one might require
about the special interest areas covered in the newsgroups. This
is especially true when the information may be too new to be in
print, may require special hands-on experience that only a few
people might have but which wouldn't normally be formally documented,
or when one just wants to get a broad cross-section of opinion
on a particular subject. However, newsgroups also have their drawbacks.
First of all, although newsgroups are broken down into special
interest areas, there is no mechanism in place, other than the
outrage of those who subscribe to it, to ensure that all articles
that are posted to a group actually have anything to do with that
group. In our use of newsgroups to find information on our presentation
topic, it did not take long for us to see that unrelated articles
are indeed sometimes posted to groups, often by inexperienced
users who made an honest mistake, but just as often by individuals
who have made the postings purposely. This problem seems to be
less prevalent in groups that are dedicated to professional or academic
pursuits, but it is still present to some extent.
Secondly, the more active the newsgroup is, the more information
there will be for the individual to filter through in finding
what they are looking for. The fact that discussions are often
broken down into threads of related articles, usually consisting
of the initial posting of information or the request for information
and then any responses to that posting, does help in the filtering
process, but not greatly. One often has to interpret the meanings
of obscure thread titles, or else browse through nearly all of
the articles in order to get an idea of what information might
be available within a given group.
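The threading described above can be illustrated with a minimal Python
sketch. It assumes that a reply repeats the subject line of the article
it answers with a "Re:" prefix. The subject lines and the helper names
thread_key and group_into_threads are hypothetical, invented for this
illustration, and an actual newsreader may use other information to
build its threads.

```python
import re
from collections import defaultdict

# Hypothetical subject lines, invented for illustration only.
subjects = [
    "HTML table tags",
    "Re: HTML table tags",
    "Digital library archives?",
    "RE: Re: HTML table tags",
    "Re: Digital library archives?",
]


def thread_key(subject):
    """Strip leading 'Re:' prefixes, then normalize case and spacing."""
    stripped = re.sub(r"^(\s*re:\s*)+", "", subject, flags=re.IGNORECASE)
    return " ".join(stripped.lower().split())


def group_into_threads(subject_lines):
    """Map each normalized thread title to the subjects filed under it."""
    threads = defaultdict(list)
    for subject in subject_lines:
        threads[thread_key(subject)].append(subject)
    return dict(threads)


threads = group_into_threads(subjects)
for title, articles in threads.items():
    print(f"{title}: {len(articles)} article(s)")
```

Even with such grouping, the reader still depends on the thread title
being meaningful, which is the limitation noted above.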
Thirdly, newsgroups are far from exhaustive in the interest areas
that they cover. As well, depending on who maintains the list
of groups available on the news server that one has access to,
existing newsgroups may be unavailable to an individual, who must
then either request that they be carried locally or find some
alternate method of obtaining the information. For example,
in our attempts to find information on Digital Libraries, only
one group was found that appeared to have any relevance at all
to the topic. However, on the topic of Electronic Publishing,
especially in the area of HTML, there were a number of relevant
groups.
Lastly, if one is requesting information, then one has to recognize
the fact that there is often a time lag between when one posts
the request and when a response is issued, depending on the difficulty
of the question and the amount of activity within the group. Furthermore,
information given in response to a posting that may be of interest
to other individuals is often not seen in the newsgroup itself,
as posters often give their e-mail address as the place to respond
to, rather than indicating that they will look for the answer
within the newsgroup itself.
In summary then, newsgroups are generally a good starting point
for research into a topic. They give the reader an idea of the
current issues at hand in that special interest community and
also expose the reader to many of the catch-phrases and jargon that
exist within that community. This exposure can be of great value
when the reader moves on to other methods of obtaining information
on the Internet, such as search engines, because the reader will
already know many of the phrases that are unique to, or at least
appropriate for, searching on the topic of choice.
Newsgroup Log
Electronic Mail
Electronic mail, or e-mail as it is more commonly called, was
another tool that we used as we researched our topic. It is a
good, but sometimes forgotten, way of requesting information over
the Internet. Like newsgroups, e-mail allows one to request that
others send information about specific topics, rather than having
to search all over the Internet in the hope that the information
sought will be contained on a Web site. However, unlike posting to
newsgroups, an individual sending an e-mail message knows who the
intended recipient will be, and also that there will likely be only
one, or at most a small number, of recipients. This means that one
has to be selective about whom one sends messages to, and one needs
to know something about the intended recipients before sending the
e-mail. For example, one would want to have some confidence that an
intended recipient has, or at least purports to have, the knowledge
required to answer an e-mailed request for information.
One way that we found to determine whether an individual may have
the knowledge to answer an e-mailed question was to collect the
e-mail addresses of people who had posted relevant information to
newsgroups covering aspects of the research topic. We reasoned
that, if they demonstrated knowledge of a subject closely related
to the one that we were researching, then there was a high
probability that they would have knowledge of our area of research
as well.
A second, equally effective way that we discovered to find e-mail
addresses was to retrieve them from WWW pages. Often, people
discuss areas of interest on their Web pages, and they also often
either embed Hyperlinks to their Internet e-mail address, or at
least make it part of the content of their pages, in the hope that
they will receive comments or suggestions from the general public
on the contents of their Web pages. Again, as in the case of
newsgroups, if the information discussed was similar to, or on the
same topic as, the information that we were looking for, we felt
that the author would potentially be a good source of information
for the area we were researching.
These are not, however, guaranteed methods of finding individuals
who have the information that one is seeking. Sometimes the individual
that one is e-mailing does indeed have related knowledge, but
doesn't have the information that is being sought. Sometimes the
e-mail address on the Web page is for an automatic responder that
sends back a polite message of thanks for the interest shown in
the site, along with a generic overview of what its owners think
most people would be asking about, but no information on the topic
of the request. Sometimes the individual whose e-mail address is
given on the Web page is only a maintainer who designs and formats
the Web page from information supplied by others, and who has no
in-depth knowledge of the subject in question.
Another drawback in the use of e-mail is the fact that there can
often be a long time lag between when the message is sent and
when the individual that received it responds. In most cases,
if the e-mail address that the message is sent to is invalid,
the sender will receive back an error message telling them that
the mail was not delivered. In this case, the sender will obviously
not expect a response. However, if no such error message is received
back, then the sender will likely assume that the message got
through. From that point, however, it is a waiting game. In one
case, e-mail that we sent to Time magazine in order to get more
information about Electronic Publishing took about three days to
receive a response. Other e-mail messages that we sent have been
outstanding for over three days and have, as of yet, still not
elicited any reply from the recipient.
In summary, e-mail can be an effective tool for getting very specific
information from specific individuals. When the correct individual
is found to request information from, and that person replies,
the information that is received is second to none, as it is tailor-made
to the request initially sent. However, finding the correct individual
can be a difficult process, and, as we found, even once a potentially
good resource individual is sent e-mail, there is no guarantee
that they will respond quickly to a request, if they respond at
all.
Electronic Mail Log
FTP, Archie and Gopher
FTP, Archie and Gopher are some of the original tools that were
developed to transfer files and search through the archives that
make up the Internet. Each of them can be, and has been, used as
a standalone product. However, with the advent of the WWW, and
in particular the use of HTML, Multipurpose Internet Mail Extensions
(MIME) and the HTTP protocol, these tools are quickly becoming
outmoded, and their protocols are more likely to be referenced
from within a WWW browser application. In fact, in our research,
we found no reason to access any of these tools directly. Rather,
it seemed to be the case that these tools would be referenced
within a Web page, and that the Web browsers that we were using
would then handle the FTP, Archie, or Gopher commands required
to access the information pointed to.
That is not to say that these tools are not useful, or that they
are no longer used. We discovered that Microsoft, for example,
uses an FTP site as a repository for software like their Internet
Assistant for Word, which is an HTML authoring tool. Maclean's
Magazine uses a gopher site to archive its back issues, and large
numbers of Digital Libraries on the Internet are also stored on
gopher sites. The functions that these tools formerly performed
on their own have now been encapsulated into other, higher level
protocols that are now in use on the WWW. One reason for this,
we suspect, is the relative difficulty of using these tools when
compared to using the HTTP protocol on the WWW. Each of these
tools has its own interface and commands, and moving between them
is in no way seamless. In contrast, using HTML tags within Web
pages to access these same tools requires the user to know only
how to click on a Hyperlink in order to invoke all of the required
commands. The rest of the process is left up to the author of the
Web page and the Web browser. In fact, in the above examples of
FTP and Gopher use, files are accessed via Hyperlinks on corporate
Web pages.
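As a rough illustration of why a single click is enough, the address
behind a Hyperlink names the protocol to be used, so the browser can
choose the right tool on the user's behalf. The short Python sketch
below splits three addresses into their parts. The addresses are
hypothetical, invented for this illustration, and are not the actual
sites mentioned above.

```python
from urllib.parse import urlsplit

# Hypothetical addresses, invented for illustration only.
links = [
    "http://www.example.com/index.html",
    "ftp://ftp.example.com/pub/tools/readme.txt",
    "gopher://gopher.example.com/1/back-issues",
]


def describe(url):
    """Return the protocol name, host name and path held in an address."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


for link in links:
    scheme, host, path = describe(link)
    print(f"{scheme:7} host={host} path={path}")
```

The first field of each address is all that a browser needs in order
to decide whether to speak HTTP, FTP, or Gopher to the named host.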
It is perhaps best that this is the manner in which these sites
are now being accessed, as they are not easy to locate on their
own. Unless one knows the exact Internet address where the site
can be located, finding one can be nearly impossible unless one
stumbles upon pre-existing links between these various sites,
or textual information within an existing site that directs one
to other similar sites. Once one has located one of these sites
and is viewing the listings of files available to view or download,
one quickly discovers that the files often come with little or no
description of what they are or what they might be used for. Thus,
one often has to download or view many of the files on one of
these sites in order to find something that is useful. This often
translates into long periods of time expended for no measurable
gain.
In summary, tools like FTP, Archie and Gopher are no longer the
most commonly used tools on the Internet. They continue to
survive, but they are now often accessed through Hyperlinks on
Web pages, which allow for the inclusion of more descriptive data
for each file that can be accessed. This approach seems to work
well for all involved, as it allows existing sites to remain
accessible without forcing their maintainers to migrate them to
new formats. As well, the increasing numbers of non-technical and
casual users on the Internet are able to use Web browsers
seamlessly to find information in all of its forms without having
to be aware that they are utilizing these tools.
FTP, Archie, and Gopher Log
World Wide Web
The World Wide Web (WWW) is built primarily on the idea of using
the HyperText Markup Language (HTML) to prepare Web pages that
can be stored on the Internet and viewed with Web browsers.
These browsers retrieve these files by means of the HTTP protocol,
read them in, interpret the markup embedded within them, and
then display those Web pages to the end user. Along with these
features of the WWW came the advent of WWW search engines, which
maintain large databases of Web pages and their contents. These
engines allow an individual to submit a search based on keywords
and return Uniform Resource Locators (URLs) for those pages whose
contents or titles satisfy the search criteria. In our research,
we found that the WWW, and in particular the Yahoo, Alta Vista,
and Lycos search engines, were especially useful in helping us
locate information.
Each of the above search engines has a different method of searching
the WWW to register the URLs of existing pages, each of them accesses
a different database, each of them uses different search methods
to find those items matching the criteria submitted, and each
of them presents the results of a search to the individual that
submitted it in a different way. These facts are both a
drawback and a strength of the WWW search engines.
One drawback is the fact that one cannot be certain of receiving
a complete list of the pages that exist on the subject searched
for. This is because the search engine chosen may not have
catalogued all of the pages that exist on the subject. As well,
if the keywords submitted are not jargon almost exclusively unique
to the subject area, the number of Web pages returned may be
impossible to search through in a reasonable amount of time. If
too specialized a term is used, or if the combination or order of
keywords does not suit the topic, the search may return anything
from no results at all to a very large number of results covering
many more topics than the one sought. In both of these extremes,
however, the final result is that the individual searching is left
with the impression that there is no information available on the
area being searched.
One method that we found for circumventing some of these problems
was to use two or three of the largest search engines simultaneously.
By entering similar, or even identical keyword searches and then
comparing the results, one can often find the best sites for a
given topic, or else one can stumble upon words or phrases to
use in subsequent searches that will narrow down the results to
make them more useful. And, while using WWW search engines to
locate sites of interest is a good starting point in using the
WWW as a research tool, it is far from the end of the process.
Once a good site is found, going to view it often yields links
to other related sites with information on them that one either
would never have thought of looking for, or that one did not even
realize existed. For example, in looking for information about
HTML tags, the HTML Writer's Guild page was found. From this page,
links to style sheets and even pages giving advice on rates to
charge for creating HTML pages commercially were found. In fact,
we discovered that quite often we would begin a search for data
on the WWW by going to a page returned by a WWW search engine,
and would then follow links from page to page, locating
information related to the topic. When either the thread of
development along the topic line was exhausted or, more likely,
the time available to research the topic had expired, we would
find ourselves twenty or thirty links away from the initial site.
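The comparison of results from several search engines described above
can be sketched in a few lines of Python. This is only an illustration
of the idea, not a record of how we actually worked: the result lists
and addresses are hypothetical, the function name rank_by_agreement is
invented here, and no search engine is actually contacted.

```python
from collections import Counter

# Hypothetical result lists, invented for illustration only.
# The engine names come from the text; the addresses do not.
results = {
    "Yahoo": ["http://a.example/", "http://b.example/", "http://c.example/"],
    "Alta Vista": ["http://b.example/", "http://d.example/", "http://a.example/"],
    "Lycos": ["http://b.example/", "http://e.example/"],
}


def rank_by_agreement(results_by_engine):
    """Order addresses by how many engines returned them, most first."""
    counts = Counter()
    for urls in results_by_engine.values():
        # Use a set so that one engine counts each address only once.
        counts.update(set(urls))
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for url, engines in rank_by_agreement(results):
    print(f"{engines} engine(s): {url}")
```

Addresses returned by every engine are reasonable candidates for the
best sites on a topic, while those returned by only one engine may
suggest new words or phrases for a later search.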
The fact that the WWW uses HTML as a standard for its interface
also makes the WWW very user-friendly and almost transparent to
the user because of the ease with which one can traverse it. This
made it very easy for us to concentrate on finding the information
that we were looking for, and allowed us to virtually forget about
the tools that we were using to get the information. As well, as
was mentioned earlier, the fact that Web browsers allow one to
reach information stored on FTP, Gopher, and Archie sites without
leaving this interface means that all of these areas of the
Internet are accessible to users of the WWW, a fact that only
increases its power.
In summary, the WWW is by far the most powerful tool that we found
on the Internet for accessing information pertinent to our research.
The vastness of its resources, the ease of searching through those
resources, and the ubiquitous nature of the WWW on the Internet
all contribute to its power and usefulness. The wide availability
of tools for navigating through it also makes it a desirable tool
for research. It is no wonder that it has replaced the earlier
tools as the primary means by which information and files are
presented and moved around on the Internet.
World Wide Web Log
Chris Kliewer - Kliewerc@cpsc.ucalgary.ca
Merlin Griscowsky - Merlin@cpsc.ucalgary.ca
Simon Yung - Yung@cpsc.ucalgary.ca