Library System Evaluation

How the Testing Plan Worked

As outlined in the Detailed Design Document, the six non-coders were assigned to the testing team, consisting of a Head Tester (Andrew Tang) and five test members. Since there were three major sections, this subgroup decided to assign two testers to each section: Book Administration, Borrower Administration, and Transactions. Andrew and Andy handled the latter, Geoff and Dan took up the former, and Joel and Pei-Ling assumed the Borrower Administration functions. Each of these three pairs was to receive the latest copy of the program and go through its assigned modules, checking judiciously for bugs, errors, omissions, or anything that did not seem right. More importantly, they were to compare the working modules against the customer's specifications. Any inconsistencies or discrepancies were written down in a report, which was emailed to the Head Coder, Laura, who then rounded up her coders to re-implement the code to account for these errors.

To allow as much time as possible for the coders to fix the program, the testers' schedule was severely constrained. From the Sunday the code was shipped to the testers until the Thursday all bug reports were due, the testing schedule was hurried. This situation was already discussed in the Testing Plan of the Detailed Design Document, and the bug reports on some modules could not be submitted on time. However, when the testers discussed this with the coders, the coders were already aware of some of the bugs and said it was not crucial that every report be handed in.

Although there were no set hard-and-fast rules rigidly specifying what each tester should use to test each module -- testing is primarily a subjective exercise in judgement and thus depends on the many traits of the individual tester -- some informal guidelines were specified as to which methods should be used. Specifically, five layers of testing were utilized, chosen according to their appropriateness to the module.

Unit testing, the first layer of testing performed, is the most readily discernible way to see whether a given module works properly. Obviously, if a module such as "Add New Borrower" does not fulfill its duties, then all subsequent layers of functionality built upon it are futile. This is emphasized even more strongly by the nature of the Detailed Design's division of units -- most of the units correspond directly to a single transaction or course of action that the librarian would normally perform. For instance, a librarian would add a new book to circulation, delete an existing book, or handle a book return from a patron, each typically done as a single action; the system has a single module for each of these actions ("Add New Book", "Delete Book", and "Return A Book").
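
As a concrete illustration of this one-action-per-unit structure, a unit test for an "Add New Book"-style module might look like the following Python sketch. The BookCatalogue class and add_new_book function are hypothetical stand-ins for the actual module, not the system's real code:

    import unittest

    class BookCatalogue:
        """Hypothetical stand-in for the books data store."""
        def __init__(self):
            self.books = {}                      # call number -> title

        def add_new_book(self, call_number, title):
            if not call_number or call_number in self.books:
                raise ValueError("invalid or duplicate call number")
            self.books[call_number] = title

    class TestAddNewBook(unittest.TestCase):
        def test_add_then_lookup(self):
            catalogue = BookCatalogue()
            catalogue.add_new_book("QA76.76", "Software Engineering")
            self.assertIn("QA76.76", catalogue.books)

        def test_duplicate_rejected(self):
            catalogue = BookCatalogue()
            catalogue.add_new_book("QA76.76", "Software Engineering")
            with self.assertRaises(ValueError):
                catalogue.add_new_book("QA76.76", "Another Title")

    if __name__ == "__main__":
        unittest.main()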

Since unit testing is predicated on the unit's code, it lends itself almost entirely to "white-box" testing of the module. That is, program statements, loop/branch conditions, and variables are traced thoroughly (through path testing/logic testing) to ensure all conditions are dealt with. (Of course, exhaustively covering every condition is impossible, so the aim is simply to be as rigorous as one can.) Moreover, error conditions, boundary conditions, and data structure integrity are kept in mind during coding development ("on the fly" testing). Thus, when the coder creates a dialogue box for the user to enter information, the module must make sure that all syntax errors, such as bad formatting and illegal character ranges, are caught before the data is sent to a database file that the module might modify (to preserve data integrity). Another example is that all sequences of user-selectable buttons should be explored to satisfy loop and path conditions.
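
A minimal sketch of the kind of syntax check described above, in Python; the call-number format rule and the 20-character field limit are assumptions for illustration, and the real module's rules may well differ:

    import re

    # Hypothetical format rule: one to three capital letters, one to four digits,
    # and an optional decimal part (e.g. "QA76.9").
    CALL_NUMBER_PATTERN = re.compile(r"^[A-Z]{1,3}\d{1,4}(\.\d+)?$")

    def validate_call_number(text):
        """Return an error message, or None if the call number is acceptable.

        Mirrors the kind of syntax check a dialogue box should run before the
        value is written to the books database file.
        """
        if not text:
            return "Call number cannot be empty."
        if len(text) > 20:   # boundary condition on field length (assumed limit)
            return "Call number is too long (20 characters maximum)."
        if not CALL_NUMBER_PATTERN.match(text):
            return "Call number must look like 'QA76.9'."
        return None

    print(validate_call_number("QA76.9"))   # None: accepted
    print(validate_call_number("76QA"))     # rejected with a format message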

In addition, tests conducted at this level but independently of the coder would cover the module interface, the nature of error messages, and ranges of input. Since each of our system's modules is rather large and complex, as well as requiring a great deal of user interaction, the flow of the interface must be smooth. For example, the "Search for a Book" module requires the user to navigate and choose among one of the four windows presented -- call number, title, author, subject. These windows are search queries: each character the user types in one of them causes another window, the results window, to move to the next item in the database that matches the substring entered. For example, if the user hits the letter 'r' in the subjects field, the output box immediately jumps down to the first subject that begins with 'r' (a rough sketch of this jump behaviour is given below). That in itself is fraught with interface concerns (i.e. making sure the proper window becomes active as the user mouses over or tabs to it, that the windows cannot be moved or deleted, checking whether the output box can be resized for a better view, etc.). Thus, the module's interface must be tested against all these issues and must interact with the user smoothly.

Error messages can be tested by an independent tester to see whether they help clear up a misguided action and whether they are contextually appropriate. For example, if the librarian updates a book, then all syntax errors should be followed by a short display of the correct format, and if the call number entered does not match any number currently in the system, the tester should check whether a more appropriate message than "Invalid call number" is displayed (such as "Call number entered does not exist on our system!"). Lastly, the independent tester can feed all sorts of unexpected input to the system (e.g. "Ctrl-Alt-Shift-F7-Numlock-Numlock-", "Ctrl-C") to see how it copes.
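
The incremental "jump to the first match" behaviour of the search windows mentioned above can be sketched in Python as follows, assuming the subject list is kept sorted. The function name, and what happens when no subject begins with the typed letter, are illustrative assumptions rather than the system's actual behaviour:

    import bisect

    def first_match_index(sorted_subjects, typed_prefix):
        """Index of the first subject at or after the typed prefix, mimicking
        how the results window jumps as the user types; None if the prefix
        falls past the end of the list."""
        i = bisect.bisect_left(sorted_subjects, typed_prefix)
        return i if i < len(sorted_subjects) else None

    subjects = sorted(["art", "biology", "chemistry", "railways", "robotics", "zoology"])
    print(subjects[first_match_index(subjects, "r")])   # -> "railways"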

Next, we have integration testing. Whereas in unit testing a single coder, responsible for the module, ensured the consistency and coherency of the module's design, integration testing exercises the interaction of all the modules, coded by different coders whose ideas, strategies, carefulness, etc. are not always on the same wavelength. This is most evident in the fact that most, if not all, of the modules read and modify the three data stores. Although proper design of the system should reduce these interfacing issues, testing each module as it is integrated into a unified system is of paramount importance, and all three types of testing strategies must be used. Top-level walkthroughs -- where the tester goes through one module, entering input as needed, then minimizes it, puts it in the background, or exits it, and then invokes another module and runs through it -- are necessary to see whether one module affects another inappropriately. Since the modules in our system are divided into three main categories ("Book Administration", "Borrower Administration", and "Transactions"), the modules in each category should be tested extensively in relation to each other. Black-box testing, or input/output testing, will be used most often to ascertain how the modules affect each other. For example, given a set of database files of borrowers, books, and fines, the tester will run "Lend a book", which will associate the call number of the book with the ID number of a borrower and remove the book from circulation, then exit the module. The tester will then run the "Update a Book" function on that call number and check whether it is treated as an illegal action. In short, modifications to a global structure by one module should be reflected in another module.
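
The "Lend a book"/"Update a Book" scenario above can be expressed as a small black-box integration check. The data structures and function names below are hypothetical stand-ins for the shared data stores and modules, and the assumed rule that a lent-out book cannot be updated is for illustration only:

    # Hypothetical shared data stores and two modules that touch them.
    books = {"QA76.76": {"title": "Software Engineering", "on_loan_to": None}}
    borrowers = {"B001": {"name": "Pei-Ling", "loans": []}}

    def lend_a_book(call_number, borrower_id):
        books[call_number]["on_loan_to"] = borrower_id       # remove from circulation
        borrowers[borrower_id]["loans"].append(call_number)

    def update_a_book(call_number, new_title):
        if books[call_number]["on_loan_to"] is not None:
            raise RuntimeError("Book is out on loan; update not allowed.")
        books[call_number]["title"] = new_title

    # Black-box check: the effect of one module must be visible to the other.
    lend_a_book("QA76.76", "B001")
    try:
        update_a_book("QA76.76", "Software Engineering, 2nd ed.")
        print("FAIL: update of a lent-out book was allowed")
    except RuntimeError:
        print("PASS: update rejected while the book is on loan")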

Functional testing, or validation testing, compares the completed system, as coded by the developers, against the actual requirements of the customers. Needless to say, this is always a difficult challenge to meet, as even the customers are not entirely sure of what they want until they actually sit down with the system and put it through its paces. This is very true in our situation, where librarians are used to a certain routine on a certain system, and often expect the same from a new one. Walkthroughs are therefore the dominant testing strategy, as this type of testing is generally not done by members of the supplier party; rather, the decision of acceptability is at the whim of the customer, which introduces more uncontrollable factors into the mix, such as the experience of the librarians, their individual personalities, their personal ways of doing things, and even how they feel that day. On the other hand, testers on the supplier side (and not necessarily the coders) can do what they can by holding cognitive walkthroughs of the system and trying to spot as many empirical errors as possible before handing it over for the customers to test. Some things to be cognizant of: testers should reread the final specifications that the customers have agreed to, as well as the requirements they wanted, go through the system module by module, meticulously compare each item they see on-screen with the specifications/requirements document, and note any discrepancies they catch. If this sounds general and vague, it is because there is no set "tried and true" strategy for getting into the librarian's frame of mind (although some research by the testers prior to testing, such as talking to librarians and experiencing library routines, would help). Thus this type of testing, as perhaps the essence of walkthroughs, is governed by the individual's personality, knowledge, intelligence, shrewdness, and so on. Of course, as mentioned, testing each module's conformance to a requirement is the way to go about it.
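
One lightweight way to make this spec-by-spec comparison systematic is a requirements-traceability checklist that the tester ticks off during the walkthrough; the requirement IDs and wordings below are hypothetical examples, not the agreed specification:

    # Each agreed requirement mapped to the module that should satisfy it,
    # ticked off as the walkthrough confirms it.
    checklist = {
        "R1: librarian can add a new book": ("Add New Book", False),
        "R2: librarian can register a new borrower": ("Add New Borrower", False),
        "R3: librarian can lend a book to a borrower": ("Lend a book", False),
        "R4: search by call number, title, author, or subject": ("Search for a Book", False),
    }

    def mark_verified(req_id):
        for req, (module, _) in list(checklist.items()):
            if req.startswith(req_id + ":"):
                checklist[req] = (module, True)

    mark_verified("R1")
    for req, (module, done) in checklist.items():
        print("[x]" if done else "[ ]", req, "->", module)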

Acceptance testing/system testing/performance testing will be the last step and will be performed in conjunction with the supplier group, as it is impossible to accurately duplicate the real-life conditions of a library system with actual librarians using it. However, as part of the Training Package we specified we would provide, Nexus will help ease the integration of this new system in a number of ways. First, we will introduce the system to a small domain of users (a small set of librarians, books, borrowers, etc., to make error detection easier to manage) and gradually expand to incorporate a larger sphere of the actual environment. Thus, we will initially have a database of, say, 100 books and 10 borrowers, and see how that works. This makes it easy to stress-test the system, for example by overloading it with more users and more books (to test the real-time performance of the searching algorithms), and to derive acceptability standards (which will be a combination of a librarian saying "yes, that is OK" and the constraints of the hardware the system runs on). There will be little in the way of security testing, as this system, as previously defined, offers minimal security against "corrupt end-users". Recovery tests will also be performed: input will be entered to break, or crash, the system, to see how error messages, memory faults, etc. are handled and how the librarians respond. A criterion we can use to judge that the system recovered gracefully from a failure is that if it does not obliterate the database it writes to or corrupt its files, and it displays error messages acknowledging that the system is down, then it will be considered "repairable".
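
A rough sketch of the scaling stress test described here, in Python: populate the database at the initial size, then grow it and time a search. The linear title scan stands in for whatever search routine the real system uses, and the sizes beyond 100 books are assumptions:

    import random
    import string
    import time

    def make_books(n):
        """Generate n fake book records with random titles."""
        return [{"call_number": "QA%d" % i,
                 "title": "".join(random.choices(string.ascii_lowercase, k=12))}
                for i in range(n)]

    def search_by_title(books, substring):
        """Linear scan standing in for the real search routine."""
        return [b for b in books if substring in b["title"]]

    for size in (100, 1000, 10000):      # start small, then scale up
        books = make_books(size)
        start = time.perf_counter()
        search_by_title(books, "ab")
        elapsed = (time.perf_counter() - start) * 1000
        print("%6d books: title search took %.2f ms" % (size, elapsed))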


