Rob Kremer

UofC

Practical Software Engineering


Cost and Effort Estimation

Project Costs

Software project managers are responsible for controlling project budgets, so they must be able to estimate how much a software development will cost.

The principal components of project costs are hardware and software costs, travel and training costs, and effort costs (the salaries of the engineers working on the project, together with associated overheads).

The dominant cost is the effort cost.

This is the most difficult to estimate and control, and has the most significant effect on overall costs.

Software costing should be carried out objectively with the aim of accurately predicting the cost to the contractor of developing the software.

Software cost estimation is a continuing activity which starts at the proposal stage and continues throughout the lifetime of a project. Projects normally have a budget, and continual cost estimation is necessary to ensure that spending is in line with the budget.

Effort can be measured in staff-hours or staff-months (formerly known as man-hours or man-months).

Boehm (1981) discusses seven techniques of software cost estimation:

(1) Algorithmic cost modeling: A model is developed using historical cost information which relates some software metric (usually its size) to the project cost. An estimate is made of that metric and the model predicts the effort required.
(2) Expert judgement: One or more experts on the software development techniques to be used and on the application domain are consulted. They each estimate the project cost, and the final cost estimate is arrived at by consensus.
(3) Estimation by analogy: This technique is applicable when other projects in the same application domain have been completed. The cost of a new project is estimated by analogy with these completed projects.
(4) Parkinson's Law: Parkinson's Law states that work expands to fill the time available. In software costing, this means that the cost is determined by available resources rather than by objective assessment. If the software has to be delivered in 12 months and 5 people are available, the effort required is estimated to be 60 person-months.
(5) Pricing to win: The software cost is estimated to be whatever the customer has available to spend on the project. The estimated effort depends on the customer's budget and not on the software functionality.
(6) Top-down estimation: A cost estimate is established by considering the overall functionality of the product and how that functionality is provided by interacting sub-functions. Cost estimates are made on the basis of the logical function rather than the components implementing that function.
(7) Bottom-up estimation: The cost of each component is estimated. All these costs are added to produce a final cost estimate.

Each technique has advantages and disadvantages.

For large projects, several cost estimation techniques should be used in parallel and their results compared.

If these predict radically different costs, more information should be sought and the costing process repeated. The process should continue until the estimates converge.

Cost models are based on the assumption that a firm set of requirements has been drawn up, and costing is carried out using these requirements as a basis.

However, sometimes the requirements may be changed so that a fixed cost is not exceeded.

Algorithmic Cost Modeling

Costs are analyzed using mathematical formulae linking costs with metrics.

The most commonly used metric for cost estimation is the number of lines of source code (LOC) in the finished system (which of course is not known).

Size estimation may involve estimation by:

Code size estimates are uncertain because they depend on hardware and software choices, the use of a commercial database management system, and so on.

An alternative to using code size as the estimated product attribute is the use of "function points", which are related to the functionality of the software rather than to its size.

Function points are computed by counting the following software characteristics: external inputs, external outputs, external inquiries, logical internal files, and external interface files.

Each of these is then individually assessed for complexity and given a weighting value which varies from 3 (for simple external inputs) to 15 (for complex internal files).

The unadjusted function point count is computed by multiplying each raw count by its weight and summing all the values. This total is then multiplied by a project complexity factor, which reflects the overall complexity of the project according to a range of characteristics such as the degree of distributed processing, the amount of reuse, performance requirements, and so on (see the sketch following the tables below).

                             1-5 Data        6-19 Data       20+ Data
                             Element Types   Element Types   Element Types
0-1 File Types Referenced    Low             Low             Average
2-3 File Types Referenced    Low             Average         High
4+  File Types Referenced    Average         High            High

                             Low     Average     High
External Input               x3      x4          x6
External Output              x4      x5          x7
Logical Internal File        x7      x10         x15
External Interface File      x5      x7          x10
External Inquiry             x3      x4          x6
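
As a sketch of this computation: the raw counts and their complexity ratings below are hypothetical, while the weights are the ones tabulated above; the project complexity factor is left at a placeholder value of 1.0.

    # A minimal sketch of the unadjusted function point computation.
    WEIGHTS = {
        "external input":          {"low": 3, "average": 4,  "high": 6},
        "external output":         {"low": 4, "average": 5,  "high": 7},
        "logical internal file":   {"low": 7, "average": 10, "high": 15},
        "external interface file": {"low": 5, "average": 7,  "high": 10},
        "external inquiry":        {"low": 3, "average": 4,  "high": 6},
    }

    # Hypothetical raw counts, each already assessed for complexity using
    # the matrix of file types referenced vs. data element types above.
    counts = {
        "external input":          {"low": 10, "average": 5, "high": 2},
        "external output":         {"low": 6,  "average": 4, "high": 1},
        "logical internal file":   {"low": 3,  "average": 2, "high": 0},
        "external interface file": {"low": 2,  "average": 1, "high": 0},
        "external inquiry":        {"low": 5,  "average": 2, "high": 1},
    }

    unadjusted = sum(counts[kind][level] * WEIGHTS[kind][level]
                     for kind in WEIGHTS
                     for level in WEIGHTS[kind])

    # The unadjusted count is then scaled by an overall project complexity
    # factor; 1.0 is a placeholder for a nominal project.
    complexity_factor = 1.0
    function_points = unadjusted * complexity_factor
    print(function_points)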

Function point counts can be used in conjunction with lines of code estimation techniques.

The number of function points is used to estimate the final code size.

Based on historical data analysis, the average number of lines of code in a particular language required to implement a function point can be estimated (AVC). The estimated code size for a new application is computed as follows:

Code size = AVC x Number of function points

The advantage of this approach is that the number of function points can often be estimated from the requirements specification so an early code size prediction can be made.
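
A minimal continuation of the sketch: both the AVC figure and the function point count below are hypothetical values chosen only to show the arithmetic.

    # Estimate code size from a function point count using an assumed AVC
    # (average lines of code per function point for the chosen language,
    # normally taken from historical data).
    AVC = 100                      # hypothetical figure for some language
    function_points = 250          # estimated from the requirements specification

    code_size = AVC * function_points   # Code size = AVC x Number of function points
    print(f"estimated size: {code_size} LOC")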

Levels of selected software languages relative to Assembler language


Mathematical Estimation Models

The Rayleigh-Putnam Curve

Uses a negative exponential curve as an indicator of cumulative staff-power distribution over time during a project.

The technology constant, C, combines the effects of using tools, languages, methodology, quality assurance procedures, standards, etc. It is determined on the basis of historical data (past projects): C is computed from project size, the area under the effort curve, and the project duration.

Rating: C = 2000 is poor, C = 8000 is good, and C = 11000 is excellent.

e.g. Assume C=4000; size estimate = 200,000 LOC.

Effort and productivity change when development time varies between 2 and 3 years:
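
The original figures are not reproduced here. As an illustration, the sketch below assumes the commonly cited form of Putnam's software equation, Size = C x K^(1/3) x td^(4/3), with effort K in staff-years and development time td in years, under which effort varies as 1/td^4.

    # A sketch assuming Putnam's software equation in the form
    #   Size = C * K**(1/3) * td**(4/3)
    # so that K = (Size / C)**3 / td**4 (K in staff-years, td in years).
    C = 4000           # technology constant from the example above
    SIZE = 200_000     # estimated size in LOC

    def effort(td_years):
        """Total effort (staff-years) implied by the software equation."""
        return (SIZE / C) ** 3 / td_years ** 4

    e2, e3 = effort(2.0), effort(3.0)
    print(f"td = 2 years: effort = {e2:.0f} staff-years")
    print(f"td = 3 years: effort = {e3:.0f} staff-years")
    # Stretching the schedule from 2 to 3 years cuts effort by (3/2)**4 = 5.06x,
    # illustrating the model's extreme sensitivity to development time.
    # (Absolute values depend heavily on how C is calibrated.)
    print(f"ratio = {e2 / e3:.2f}")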


Regression Models

COCOMO

Most widely used model for effort and cost estimation.

Considers a wide variety of factors.

Projects fall into three categories (organic, semidetached, and embedded), characterized by their size, degree of innovation, deadline/constraints, and development environment:

Project Type    Size       Innovation   Deadline/Constraints   Dev. Environment
Organic         Smallish   Little       Not tight              Stable
Embedded        Large      Greater      Tight                  Complex hardware/custom interfaces
Semidetached    Medium     Medium       Medium                 Medium

In the basic model, which uses only source size, effort is estimated as E = a x (KLOC)^b staff-months, where the coefficients a and b depend on the project type:

e.g.
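
The original worked example is not reproduced above. The sketch below uses the standard published basic COCOMO equations and coefficients (Boehm 1981), E = a x (KLOC)^b staff-months and D = c x E^d months; these coefficient values are an assumption here and differ from the values tabulated later for the intermediate model. The 32 KLOC organic project is hypothetical.

    # Basic COCOMO: effort E = a * KLOC**b (staff-months),
    # schedule D = c * E**d (months), using the standard published
    # basic-model coefficients.
    BASIC = {
        # mode:         (a,   b,    c,   d)
        "organic":      (2.4, 1.05, 2.5, 0.38),
        "semidetached": (3.0, 1.12, 2.5, 0.35),
        "embedded":     (3.6, 1.20, 2.5, 0.32),
    }

    def basic_cocomo(kloc, mode):
        """Return (effort in staff-months, duration in months)."""
        a, b, c, d = BASIC[mode]
        effort = a * kloc ** b
        duration = c * effort ** d
        return effort, duration

    # Hypothetical 32 KLOC organic project:
    e, d = basic_cocomo(32, "organic")
    print(f"effort = {e:.0f} staff-months, duration = {d:.1f} months")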

There is also an intermediate model which, in addition to size, uses 15 other cost drivers.

Cost Drivers for the COCOMO Model.

       Organic    Semidetached    Embedded
a      3.2        3.0             2.8
b      1.05       1.12            1.20

e.g.
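
Again, the original worked example is not reproduced. The sketch below applies the intermediate model in the form E = a x (KLOC)^b x EAF, using the a and b values tabulated above; EAF (the effort adjustment factor) is the product of the 15 cost-driver multipliers, and the driver names and multiplier values shown are purely illustrative.

    # Intermediate COCOMO sketch: E = a * KLOC**b * EAF (staff-months).
    INTERMEDIATE = {
        "organic":      (3.2, 1.05),
        "semidetached": (3.0, 1.12),
        "embedded":     (2.8, 1.20),
    }

    def intermediate_cocomo(kloc, mode, drivers):
        """Return effort in staff-months given a dict of driver multipliers."""
        a, b = INTERMEDIATE[mode]
        eaf = 1.0
        for multiplier in drivers.values():
            eaf *= multiplier          # multipliers > 1 increase effort, < 1 reduce it
        return a * kloc ** b * eaf

    # Illustrative cost-driver multipliers for a hypothetical 32 KLOC embedded project:
    drivers = {"required_reliability": 1.15,
               "product_complexity":   1.15,
               "analyst_capability":   0.86}
    print(f"{intermediate_cocomo(32, 'embedded', drivers):.0f} staff-months")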

The intermediate model is more accurate than the basic model.

Comparison:

Automated Estimation Tools

Automated estimation tools allow the planner to estimate cost and effort and to perform "what if" analyses for important project variables such as delivery date or staffing.

All have the same general characteristics and require:

  1. A quantitative estimate of project size (e.g., LOC) or functionality
    (function point data)
  2. Qualitative project characteristics such as complexity, required reliability, or business criticality
  3. Some description of the development staff and/or development environment
From these data, the model implemented by the automated estimation tool provides estimates of the effort required to complete the project, costs, staff loading, and, in some cases, development schedule and associated risk.

BYL (Before You Leap), developed by the Gordon Group, WICOMO (Wang Institute Cost Model), developed at the Wang Institute, and DECPlan, developed by Digital Equipment Corporation, are automated estimation tools that are based on COCOMO.

Each of the tools requires the user to provide preliminary LOC estimates.

These estimates are categorized by programming language and type (i.e., adapted code, reused code, new code).

The user also specifies values for the cost driver attributes.

Each of the tools produces estimated elapsed project duration (in months), effort in staff-months, average staffing per month, average productivity in LOC/pm, and cost per month.

This data can be developed for each phase in the software engineering process individually or for the entire project.

SLIM is an automated costing system based on the Rayleigh-Putnam Model.

SLIM applies the Putnam software model, linear programming, statistical simulation, and PERT (the program evaluation and review technique, a scheduling method) to derive software project estimates.

The system enables a software planner to perform the following functions in an interactive session:

(1) calibrate the local software development environment by interpreting historical data supplied by the planner;

(2) create an information model of the software to be developed by eliciting basic software characteristics, personal attributes, and environmental considerations; and

(3) conduct software sizing--the approach used in SLIM is a more sophisticated, automated version of the LOC costing technique.

Once software size (i.e., LOC for each software function) has been established, SLIM computes size deviation (an indication of estimation uncertainty), a sensitivity profile that indicates potential deviation of cost and effort, and a consistency check with data collected for software systems of similar size.

The planner can invoke a linear programming analysis that considers development constraints on both cost and effort and provides a month-by-month distribution of effort.

ESTIMACS is a "macro-estimation model" that uses a function point estimation method enhanced to accommodate a variety of project and personnel factors.

The ESTIMACS tool contains a set of models that enable the planner to estimate

  1. system development effort,
  2. staff and cost,
  3. hardware configuration,
  4. risk,
  5. the effects of "development portfolio."
The system development effort model combines data about the user, the developer, the project geography (i.e., the proximity of developer and customer), and the number of "major business functions" to be implemented with information domain data required for function point computation, the application complexity, performance, and reliability.

ESTIMACS can develop staffing and costs using a life cycle data base to provide work distribution and deployment information.

The target hardware configuration is sized (i.e., processor power and storage capacity are estimated) using answers to a series of questions that help the planner evaluate transaction volume, windows of application, and other data.

The level of risk associated with the successful implementation of the proposed system is determined based on responses to a questionnaire that examines project factors such as size, structure, and technology.

SPQR/20, developed by Software Productivity Research, Inc., has the user complete a simple set of multiple-choice questions that address:

In addition to the output data described for the other tools, SPQR/20 produces a number of further estimates.

Each of the automated estimating tools conducts a dialog with the planner, obtaining appropriate project and supporting information and producing both tabular and (in some cases) graphical output.

All these tools have been implemented on personal computers or engineering workstations.

Martin compared these tools by applying each to the same project.

A large variation in estimated results was encountered, and the predicted values sometimes were significantly different from actual values.

This reinforces the fact that the output of estimation tools should be used as one "data point" from which estimates are derived--not as the only source for an estimate.


References

Boehm, B. W. (1981). Software Engineering Economics. Englewood Cliffs, N.J., Prentice-Hall.

Pressman, R. S. (1997). Software Engineering: A Practitioner's Approach (4th edition). New York, McGraw-Hill. (chapter 7).

