Practical Software Engineering

Software Quality Assurance

Software quality assurance (SQA) should be applied throughout the development process.

Quality is often measured in errors per thousand lines of code or in the mean time to make a change.

Roger Pressman quotes Philip Crosby:

The problem of quality management is not what people don't know about it. The problem is what they think they do know.... In this regard, quality has much in common with sex. Everybody is for it. (Under certain conditions, of course.) Everybody feels they understand it. (Even though they wouldn't want to explain it.) Everybody thinks execution is only a matter of following natural inclinations. (After all, we do get along somehow.) And, of course, most people feel that problems in these areas are caused by other people. (If only they would take the time to do things right.)

Software Reliability

Reliability means

issues related to the design of a product that will operate well for a substantial length of time.
a metric which is the probability of operational success of software.

Probabilistic Models

can refer to random events, or to deterministic events (e.g. a motor burning out) whose time of occurrence cannot be predicted.

The probability space -- the space of all possible occurrences -- must first be defined, e.g. in a probability model for program errors it is all possible paths through a program. Then the rules for selection are specified, e.g. for each path, combinations of initial conditions and input values. A software failure occurs when an execution sequence containing an error is processed.

Reliability Theory

is the application of probability theory to the modeling of failures and the prediction of success probability.

A definition commonly accepted is:

Software reliability is the probability that the program performs successfully, according to specifications, for a given time period.

Specifications -- precise statements of:

the host machine
the operating system and support software
the operating environment
the definition of success
details of hardware interfaces with the machine
details of ranges and rates of I/O data
the operational procedures.

Errors are discovered through system failures, and may be classified as: hardware, software, operator, or unresolved.

Time may be divided into:

operating time,
calendar time during operation,
calendar time during development,
man-hours of coding,
development, testing,
debugging,
computer test times.

Variables other than time may need to be considered, e.g.

load -- on a timesharing system
input cycles -- if data only arrives occasionally.

Software is repairable if it can be debugged and the errors corrected. This may not be possible without inconveniencing the user, e.g. in an air-traffic control system.

Software availability is the probability that the program is performing successfully, according to specifications, at a given point in time.

Availability is defined as:

the ratio of systems up at some instant to the size of the population studied (no. of systems).
the ratio of observed uptime to the sum of the uptime and downtime:

A = T(up) / (T(up) + T(down)) (for a single system).

These measurements are used to:

quantify A and compare with other systems or goals.
track A over time to see if it increases as errors are found.
plan for repair personnel, facilities (e.g. test time) and alternative service.

If the system is still in the design and development phase then a third definition is used:

the ratio of the mean time to failure (uptimes) and the sum of the mean time to failure and the mean time to repair (downtime):

A = MTTF / (MTTF + MTTR)
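The two ratio definitions of availability can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the sample figures are hypothetical and do not come from any measured system.

```python
def availability_from_times(uptime: float, downtime: float) -> float:
    """Observed availability of a single system: T(up) / (T(up) + T(down))."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime / total


def availability_from_means(mttf: float, mttr: float) -> float:
    """Availability during design and development: MTTF / (MTTF + MTTR)."""
    return availability_from_times(mttf, mttr)


# Hypothetical figures: 990 hours up and 10 hours down while observed.
observed = availability_from_times(990.0, 10.0)
# Hypothetical design estimates: 200 hours to failure, 2 hours to repair.
predicted = availability_from_means(200.0, 2.0)
print(f"observed A = {observed:.4f}, predicted A = {predicted:.4f}")
```

Both functions use the same ratio; they differ only in whether the inputs are observed totals or estimated means.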

Various hypotheses exist about program errors, and seem to be true, but no controlled tests have been run to prove or disprove them:

Bugs per line are constant. There are fewer errors per line in a high-level language. Many types of errors in machine code do not exist in a high-order language (HOL).
Memory shortage encourages bugs. Mainly due to programming "tricks" used to squeeze code.
Heavy load causes errors to occur. Very difficult to document and test heavy loads.
Tuning reduces the error occurrence rate. This involves removing errors for a class of input data. If new inputs are needed, new errors could occur, and the system (hardware and software) must be retuned.

Further hypotheses about errors:

The normalized number of errors is constant. Normalization is the total number of errors divided by the number of machine language instructions.
The normalized error-removal rate is constant. These two hypotheses apply over similar programs.
Bug characteristics remain unchanged as debugging proceeds. Those found in the first few weeks are representative of the total bug population.
Independent debugging results in similar programs. When two independent debuggers work on a large program, the evolution of the program is such that the differences between their versions are negligible.

Many researchers have put forward models of reliability based on measures of the hardware, the software, and the operator; and used them for prediction, comparative analysis, and development control. Error reliability and availability models provide a quantitative measure of the goodness of the software. There are still many unanswered questions.

Software Quality Evaluation

This is still in its development phase.

Boehm, Brown and Lipow identify key issues, and say measures should show where a program is deficient. Managers must decide on the relative importance of:

on-time delivery
efficient use of resources such as:
processing units
memory
peripheral devices
maintainable code issues such as:
comprehensibility
modifiability
portability

They define a hierarchical tree of software characteristics, in which an arrow indicates logical implication. The lowest-level characteristics are combined into medium-level characteristics, and the lowest-level ones are recommended as quantitative metrics. They define each one, then evaluate each by its correlation with program quality, its potential benefits in terms of insights and decision inputs for the developer and user, its quantifiability, and the feasibility of automating its evaluation. The list is more useful as a check for programmers than as a guide to program construction.

Gilb also devised a set of software metrics:

reliability -- the probability that a given program operates for a certain time without a logical error
1 - (inputs causing execution failures / total inputs)
maintainability -- the probability that a failed system will be restored to operable condition within a specified time.
repairability -- as maintainability but all resources assumed to be immediately available.
availability
accuracy
precision
flexibility -- with many sub-categories
efficiency
effectiveness

and many more. He defines each in detail. The reality of applying these measures is disheartening. Many are difficult to obtain, and no expected range is given. They are not all independent.

However, this is still a developing field and he has pioneered some software quality measurements.
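Gilb's reliability ratio, 1 - (inputs causing execution failures / total inputs), is simple to compute once the inputs have been counted. The sketch below assumes those counts are already available from a test run; the function name and the sample counts are hypothetical.

```python
def gilb_reliability(failing_inputs: int, total_inputs: int) -> float:
    """Return 1 - (inputs causing execution failures / total inputs)."""
    if total_inputs <= 0:
        raise ValueError("total_inputs must be positive")
    if not 0 <= failing_inputs <= total_inputs:
        raise ValueError("failing_inputs must lie between 0 and total_inputs")
    return 1 - failing_inputs / total_inputs


# Hypothetical test run: 3 of 1000 inputs caused an execution failure.
print(f"reliability = {gilb_reliability(3, 1000):.3f}")
```

The difficulty noted above lies in obtaining the counts, not in the arithmetic: the result depends entirely on how representative the chosen inputs are.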

Halstead used 'methods and principles of classical experimental science'. He counted:

the number of unique operators (IF, DO, =, PRINT), n(1)
the number of unique operands (variables or constants), n(2)
the total usage of the operators, N(1)
the total usage of the operands, N(2)
the number of times each operator occurred, F(1,j) (j = 1 ... n(1))
the number of times each operand occurred, F(2,j) (j = 1 ... n(2))

He defined vocabulary n as n(1) + n(2) and implementation length N as N(1) + N(2). From these he devised equations for: length, volume, potential volume, boundary volume, program level, intelligence, programming effort, and so on. For example, the length equation estimates the implementation length as N = n(1) log2 n(1) + n(2) log2 n(2).

His length equation was tested on 14 algorithms and found to be very close to actual length. Other experimental evidence is also convincing. However, it ignores the issues of variable names, comments, choice of algorithms or data structures. It also ignores the general issues of portability, flexibility, efficiency.
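As a rough sketch of how Halstead's counts and length equation fit together, the Python below takes a program that has already been split into operator and operand tokens and compares the actual length N(1) + N(2) with the estimate n(1) log2 n(1) + n(2) log2 n(2). The token lists are a hypothetical hand-tokenized fragment, and the function names are illustrative; deciding what counts as an operator is language-dependent and is not addressed here.

```python
import math
from collections import Counter


def halstead_counts(operators, operands):
    """Return Halstead's basic counts for pre-tokenized operators and operands."""
    operator_freq = Counter(operators)  # F(1,j): occurrences of each operator
    operand_freq = Counter(operands)    # F(2,j): occurrences of each operand
    n1, n2 = len(operator_freq), len(operand_freq)  # unique operators, operands
    N1, N2 = len(operators), len(operands)          # total usages
    return {"n1": n1, "n2": n2, "N1": N1, "N2": N2,
            "vocabulary": n1 + n2, "length": N1 + N2}


def estimated_length(n1: int, n2: int) -> float:
    """Length equation: n1 * log2(n1) + n2 * log2(n2); empty terms count as 0."""
    return sum(n * math.log2(n) for n in (n1, n2) if n > 0)


# Hypothetical fragment:  a = 1;  b = 2;  IF a ... b;  c = a;  PRINT c
operators = ["=", "=", "IF", "=", "PRINT"]
operands = ["a", "1", "b", "2", "a", "b", "c", "a", "c"]

counts = halstead_counts(operators, operands)
print(counts)
print(f"estimated length = {estimated_length(counts['n1'], counts['n2']):.2f}")
```

For a fragment this small the estimate and the actual length need not agree closely; the close agreement reported above was for complete algorithms.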

Zak lists five productivity attributes:

correctness
ability to achieve schedules
adaptability
efficiency
freedom from bugs

From a survey of managers and technicians:

quality of external documentation
programming language
availability of tools
programmer experience in data processing
programmer experience in the functional area
effect of project communication
independent modules for individual assignment
well-defined programming practices

In an experiment, five programming teams were given a different objective each:

minimum internal memory
output clarity
program clarity
minimum source statements
minimum hours

When productivity was evaluated each team ranked first in its primary objective. This shows that programmers respond to a goal.

Maintainability

This is the main programming cost in most installations, and is affected by data structures, logical structure, documentation, diagnostic tools, and by personnel attributes such as specialization, experience, training, intelligence, motivation.

Methods for improving maintainability are:

inspections
automated audits of comments
test path analysis programs
use of pseudocode documentation
dual maintenance of source code
modularity
structured program logic flow.

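Bugs are sometimes seeded to establish a maintainability measure. For example, a program has 100 seeded bugs. During debugging 550 bugs are found, 50 of which were seeded. Since half of the seeded bugs were found, the 500 real bugs found are assumed to be about half of all the real bugs, so it can be estimated that 500 real bugs remain.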
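The seeding estimate can be written out as a short calculation. The sketch below rests on one assumption, which is what makes seeding usable as a measure at all: real bugs are found at the same rate as seeded ones. The function name is hypothetical.

```python
def estimate_remaining_bugs(seeded_total: int, seeded_found: int,
                            total_found: int) -> float:
    """Estimate how many real (unseeded) bugs remain after debugging.

    Assumes real bugs are detected at the same rate as seeded bugs.
    """
    if not 0 < seeded_found <= seeded_total:
        raise ValueError("seeded_found must be between 1 and seeded_total")
    if total_found < seeded_found:
        raise ValueError("total_found cannot be less than seeded_found")
    real_found = total_found - seeded_found
    detection_rate = seeded_found / seeded_total   # fraction of seeded bugs found
    estimated_real_total = real_found / detection_rate
    return estimated_real_total - real_found


# Figures from the example: 100 seeded, 550 found, 50 of them seeded.
print(estimate_remaining_bugs(100, 50, 550))  # 500.0
```

If no seeded bugs are found the detection rate is zero and no estimate is possible, which is why the function rejects that case.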

Software maintenance has a very high cost. Gansler (1976) quotes Air Force avionics software at $75/instruction to develop, and $4000/instruction to maintain.

Maintenance includes the cost of rewriting, testing, debugging and integrating new features.

Documentation is one of the items which is said to lead to high maintenance costs. It is not just the program listing with comments. A program librarian must be responsible for the system documentation, but programmers are responsible for the technical writing.

Other aids include text editors and the Source Code Control System (SCCS), a tool for producing records. Some companies insist that programmers dictate any tests or changes onto a tape every day.

Problem areas in software maintenance reported by respondents

Rank  Problem area

1   User demands for enhancements, extensions
2   Quality of system documentation
3   Competing demands on maintenance personnel time
4   Quality of original programs
5   Meeting scheduled commitments
6   Lack of user understanding of system
7   Availability of maintenance program personnel
8   Adequacy of system design specifications
9   Turnover of maintenance personnel
10  Unrealistic user expectations
11  Processing time of system
12  Forecasting personnel requirements
13  Skills of maintenance personnel
14  Changes to hardware and software
15  Budgetary pressures
16  Adherence to programming standards in maintenance
17  Data integrity
18  Motivation of maintenance personnel
19  Application failures
20  Maintenance programming productivity
21  Hardware and software reliability
22  Storage requirements
23  Management support of system
24  Lack of user interest in system

(Lientz et al. (1976), Table V.)

Complexity/Comprehension

Program complexity can be logical, psychological or structural.

Logical Complexity

can make proofs of correctness difficult, long, or impossible, e.g. the increase in the number of distinct program paths.

Psychological Complexity

makes it difficult for people to understand software. This is usually known as comprehensibility.

Structural Complexity

involves the number of modules in the program.

Logical complexity has been measured by a graph-theoretic measure.

McCabe represents the paths through a program as a graph: each node corresponds to a block of code, each arc to a branch. He then calculates the cyclomatic number (the maximum number of linearly independent circuits) and controls the size of a program by limiting this number.
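A minimal sketch of the calculation follows. It uses the commonly cited formula v(G) = e - n + 2p (e arcs, n nodes, p connected components), which these notes do not spell out, so treat the formula and the function name as assumptions of the example. The control-flow graph is a hypothetical one: an if/else followed by a while loop.

```python
def cyclomatic_number(edges, components: int = 1) -> int:
    """Cyclomatic number v(G) = e - n + 2p of a control-flow graph.

    `edges` is a list of (from_node, to_node) arcs; nodes are inferred
    from the arcs, so isolated nodes are not counted.
    """
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


# Hypothetical graph: an if/else followed by a while loop.
edges = [
    ("entry", "then"), ("entry", "else"),          # the if/else decision
    ("then", "loop_test"), ("else", "loop_test"),  # both branches rejoin
    ("loop_test", "loop_body"),                    # loop taken
    ("loop_body", "loop_test"),                    # back edge
    ("loop_test", "exit"),                         # loop finished
]
print(cyclomatic_number(edges))  # 3
```

The result agrees with the usual shortcut of counting decisions and adding one: two decisions (the if and the loop test) give 3.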

He found that he could recognize an individual's programming style by the patterns in the graphs. He also looked at structure, and found that only 4 patterns of 'non-structure' occur in graphs, which are:

branching out of a loop
branching into a loop
branching into a decision
branching out of a decision

In summary, this is a helpful tool in preparing test data, and provides useful information about program complexity. However, it ignores choice of data structures, algorithms, mnemonic variable names, comments, and issues such as portability, flexibility, efficiency.

Structural complexity may be absolute or relative. Absolute structural complexity measures the number of modules; relative structural complexity is the ratio of module linkages to modules. The goal is to minimize the connections between modules over which errors could propagate.
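Both measures can be read directly off a list of modules and the linkages between them. The sketch below is illustrative only: the module names and linkages are hypothetical, and it assumes each linkage is recorded once as a pair of module names.

```python
def structural_complexity(modules, linkages):
    """Return (absolute, relative) structural complexity.

    absolute = number of modules
    relative = number of module linkages / number of modules
    """
    if not modules:
        raise ValueError("at least one module is required")
    absolute = len(modules)
    relative = len(linkages) / absolute
    return absolute, relative


# Hypothetical system of four modules with five linkages between them.
modules = ["main", "parser", "report", "storage"]
linkages = [("main", "parser"), ("main", "report"), ("main", "storage"),
            ("parser", "storage"), ("report", "storage")]

absolute, relative = structural_complexity(modules, linkages)
print(f"absolute = {absolute}, relative = {relative}")
```

Removing a linkage, for instance by routing all storage access through one module, lowers the relative figure without changing the absolute one.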

Cohesiveness refers to the relationships among pieces of a module.

Binding is a measure of cohesiveness; the goal is high binding. It could be coincidental, logical, temporal, communicative, sequential or functional.

Comprehensibility

This includes such issues as:

high and low-level comments
mnemonic variable names
complexity of control flow
general program type

Sheppard did experiments with professional programmers, types of program (engineering, statistical, nonnumeric), levels of structure, and levels of mnemonic variable names. He found that the least structured program was the most difficult to reconstruct (after studying for 25 minutes) and the partially structured one was the easiest. No differences were found for mnemonic variable names, or for the order of presentation of programs.

Shneiderman suggests a 90-10 rule, i.e. a competent programmer should be able to functionally reconstruct 90% of a program (or of a module, if the program is large) after 10 minutes of study. A more casual approach would be an 80-20 rule; a more rigorous approach would be a 95-5 rule. This is based on memorization/reconstruction experiments.

Quality Assurance

For hardware, this covers inspection and test of materials, maintenance of standards for workmanship, calibration of equipment, acceptance testing.

For software, there is no prototyping (except as phase one of a two-phase design), no incoming parts to be inspected, no standards for measuring software quality.

Rules to follow in software contracting:

Get legal advice from the beginning.
Negotiate with a senior person.
Negotiate with only one person.
Document all verbal agreements.
Make sure the contract specifies everything you will get: the prices, the terms, the conditions.
Do not announce the final decision until the contract is signed.
Remember that no matter what the contract says, success with software depends first of all on a good business relationship between buyer and seller.

A table of contents for a typical requirements document.

Source: Heninger (1979, p. 3). Three additional sections are suggested for this outline: 1(a), "Software Characteristics"; 2(a), "Software Interfaces"; and 6(a), "Defensive Programming Techniques."

It is desirable to add sections:

2A. Software Characteristics which includes design philosophy, language, algorithms, data structures.

3A. Software Interfaces which should discuss decisions on: existing operating systems, compilers, interpreters, assemblers, existing software development tools, existing code modules, subroutines or data bases.

7A. Defensive Programming Techniques which includes expected range of input variables, key intermediate variables, output variables, range checking, parallel computation and checking, rollback, error-recovery techniques.

Specifications

This is a description of how each item in the "requirements" is to be realized. The computer, language, and operating system are chosen before the specifications are written. It contains details of what must be done and how: major algorithms, equations, and a preliminary design in the form of HIPO diagrams (or equivalent). It must be numbered and dated.

Details of metrics for performance, reliability, and quality will be omitted, as will techniques for cost estimation.

Practical Software Engineering, Department of Computer Science

mildred@cpsc.ucalgary.ca 12-Jan-96