Rob Kremer

UofC

Practical Software Engineering


Cost and Effort Estimation

Project Costs

Software project managers are responsible for controlling project budgets, so they must be able to estimate how much a software development will cost.

The principal components of project costs are hardware and software costs, travel and training costs, and effort costs (the salaries of the engineers working on the project, together with associated overheads).

The dominant cost is the effort cost.

This is the most difficult to estimate and control, and has the most significant effect on overall costs.

Software costing should be carried out objectively with the aim of accurately predicting the cost to the contractor of developing the software.

Software cost estimation is a continuing activity which starts at the proposal stage and continues throughout the lifetime of a project. Projects normally have a budget, and continual cost estimation is necessary to ensure that spending is in line with the budget.

Effort can be measured in staff-hours or staff-months (formerly known as man-hours or man-months).

Boehm (1981) discusses seven techniques of software cost estimation:

(1) Algorithmic cost modeling: A model is developed using historical cost information which relates some software metric (usually its size) to the project cost. An estimate is made of that metric and the model predicts the effort required.
(2) Expert judgement: One or more experts on the software development techniques to be used and on the application domain are consulted. They each estimate the project cost, and the final cost estimate is arrived at by consensus.
(3) Estimation by analogy: This technique is applicable when other projects in the same application domain have been completed. The cost of a new project is estimated by analogy with these completed projects.
(4) Parkinson's Law: Parkinson's Law states that work expands to fill the time available. In software costing, this means that the cost is determined by available resources rather than by objective assessment. If the software has to be delivered in 12 months and 5 people are available, the effort required is estimated to be 60 person-months.
(5) Pricing to win: The software cost is estimated to be whatever the customer has available to spend on the project. The estimated effort depends on the customer's budget and not on the software functionality.
(6) Top-down estimation: A cost estimate is established by considering the overall functionality of the product and how that functionality is provided by interacting sub-functions. Cost estimates are made on the basis of the logical function rather than the components implementing that function.
(7) Bottom-up estimation: The cost of each component is estimated. All these costs are added to produce a final cost estimate.

Each technique has advantages and disadvantages.

For large projects, several cost estimation techniques should be used in parallel and their results compared.

If these predict radically different costs, more information should be sought and the costing process repeated. The process should continue until the estimates converge.

Cost models are based on the assumption that a firm set of requirements has been drawn up, and costing is carried out using these requirements as a basis.

However, sometimes the requirements may be changed so that a fixed cost is not exceeded.

Algorithmic Cost Modeling

Costs are analyzed using mathematical formulae linking costs with metrics.

The most commonly used metric for cost estimation is the number of lines of source code (LOC) in the finished system (which of course is not known).

Size estimation may involve estimation by:

Code size estimates are uncertain because they depend on hardware and software choices, the use of a commercial database management system, and so on.

An alternative to using code size as the estimated product attribute is the use of "function points", which are related to the functionality of the software rather than to its size.

Function points are computed by counting the following software characteristics: external inputs, external outputs, external inquiries, logical internal files, and external interface files.

Each of these is then individually assessed for complexity and given a weighting value which varies from 3 (for simple external inputs) to 15 (for complex internal files).

The unadjusted function point count is computed by multiplying each raw count by its weight and summing all the values. This total is then multiplied by a project complexity factor, which reflects the overall complexity of the project according to a range of characteristics such as the degree of distributed processing, the amount of reuse, performance requirements, and so on (see the sketch following the tables below).

                             1-5 Data        6-19 Data       20+ Data
                             Element Types   Element Types   Element Types
0-1 File Types Referenced    Low             Low             Average
2-3 File Types Referenced    Low             Average         High
4+  File Types Referenced    Average         High            High

                             Low     Average     High
External Input               x3      x4          x6
External Output              x4      x5          x7
Logical Internal File        x7      x10         x15
External Interface File      x5      x7          x10
External Inquiry             x3      x4          x6
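
As a sketch of this computation: the raw counts and their complexity ratings below are hypothetical, while the weights are the ones tabulated above; the project complexity factor is left at a placeholder value of 1.0.

    # A minimal sketch of the unadjusted function point computation.
    WEIGHTS = {
        "external input":          {"low": 3, "average": 4,  "high": 6},
        "external output":         {"low": 4, "average": 5,  "high": 7},
        "logical internal file":   {"low": 7, "average": 10, "high": 15},
        "external interface file": {"low": 5, "average": 7,  "high": 10},
        "external inquiry":        {"low": 3, "average": 4,  "high": 6},
    }

    # Hypothetical raw counts, each already assessed for complexity using
    # the matrix of file types referenced vs. data element types above.
    counts = {
        "external input":          {"low": 10, "average": 5, "high": 2},
        "external output":         {"low": 6,  "average": 4, "high": 1},
        "logical internal file":   {"low": 3,  "average": 2, "high": 0},
        "external interface file": {"low": 2,  "average": 1, "high": 0},
        "external inquiry":        {"low": 5,  "average": 2, "high": 1},
    }

    unadjusted = sum(counts[kind][level] * WEIGHTS[kind][level]
                     for kind in WEIGHTS
                     for level in WEIGHTS[kind])

    # The unadjusted count is then scaled by an overall project complexity
    # factor; 1.0 is a placeholder for a nominal project.
    complexity_factor = 1.0
    function_points = unadjusted * complexity_factor
    print(function_points)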

Function point counts can be used in conjunction with lines of code estimation techniques.

The number of function points is used to estimate the final code size.

Based on historical data analysis, the average number of lines of code in a particular language required to implement a function point can be estimated (AVC). The estimated code size for a new application is computed as follows:

Code size = AVC x Number of function points

The advantage of this approach is that the number of function points can often be estimated from the requirements specification so an early code size prediction can be made.
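
A minimal continuation of the sketch: both the AVC figure and the function point count below are hypothetical values chosen only to show the arithmetic.

    # Estimate code size from a function point count using an assumed AVC
    # (average lines of code per function point for the chosen language,
    # normally taken from historical data).
    AVC = 100                      # hypothetical figure for some language
    function_points = 250          # estimated from the requirements specification

    code_size = AVC * function_points   # Code size = AVC x Number of function points
    print(f"estimated size: {code_size} LOC")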

Levels of selected software languages relative to Assembler language


Mathematical Estimation Models

The Rayleigh-Putnam Curve

Uses a negative exponential curve as an indicator of cumulative staff-power distribution over time during a project.

The technology constant, C, combines the effects of using tools, languages, methodology, quality assurance procedures, standards, etc. It is determined on the basis of historical data (past projects): C is computed from project size, the area under the effort curve, and the project duration.

Rating: C = 2000 is poor, C = 8000 is good, and C = 11000 is excellent.

e.g. Assume C=4000; size estimate = 200,000 LOC.

Effort and productivity change when development time varies between 2 and 3 years:
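
The original figures are not reproduced here. As an illustration, the sketch below assumes the commonly cited form of Putnam's software equation, Size = C x K^(1/3) x td^(4/3), with effort K in staff-years and development time td in years, under which effort varies as 1/td^4.

    # A sketch assuming Putnam's software equation in the form
    #   Size = C * K**(1/3) * td**(4/3)
    # so that K = (Size / C)**3 / td**4 (K in staff-years, td in years).
    C = 4000           # technology constant from the example above
    SIZE = 200_000     # estimated size in LOC

    def effort(td_years):
        """Total effort (staff-years) implied by the software equation."""
        return (SIZE / C) ** 3 / td_years ** 4

    e2, e3 = effort(2.0), effort(3.0)
    print(f"td = 2 years: effort = {e2:.0f} staff-years")
    print(f"td = 3 years: effort = {e3:.0f} staff-years")
    # Stretching the schedule from 2 to 3 years cuts effort by (3/2)**4 = 5.06x,
    # illustrating the model's extreme sensitivity to development time.
    # (Absolute values depend heavily on how C is calibrated.)
    print(f"ratio = {e2 / e3:.2f}")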


Regression Models

COCOMO

Most widely used model for effort and cost estimation.

Considers a wide variety of factors.

Projects fall into three categories (organic, semidetached, and embedded), characterized by their size, degree of innovation, deadline/constraints, and development environment:

Project Type    Size       Innovation   Deadline/Constraints   Dev. Environment
Organic         Smallish   Little       Not tight              Stable
Embedded        Large      Greater      Tight                  Complex hardware/custom interfaces
Semidetached    Medium     Medium       Medium                 Medium

In the basic model, which uses only source size, effort is estimated as E = a x (KLOC)^b staff-months, where the coefficients a and b depend on the project type:

e.g.
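
The original worked example is not reproduced above. The sketch below uses the standard published basic COCOMO equations and coefficients (Boehm 1981), E = a x (KLOC)^b staff-months and D = c x E^d months; these coefficient values are an assumption here and differ from the values tabulated later for the intermediate model. The 32 KLOC organic project is hypothetical.

    # Basic COCOMO: effort E = a * KLOC**b (staff-months),
    # schedule D = c * E**d (months), using the standard published
    # basic-model coefficients.
    BASIC = {
        # mode:         (a,   b,    c,   d)
        "organic":      (2.4, 1.05, 2.5, 0.38),
        "semidetached": (3.0, 1.12, 2.5, 0.35),
        "embedded":     (3.6, 1.20, 2.5, 0.32),
    }

    def basic_cocomo(kloc, mode):
        """Return (effort in staff-months, duration in months)."""
        a, b, c, d = BASIC[mode]
        effort = a * kloc ** b
        duration = c * effort ** d
        return effort, duration

    # Hypothetical 32 KLOC organic project:
    e, d = basic_cocomo(32, "organic")
    print(f"effort = {e:.0f} staff-months, duration = {d:.1f} months")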

There is also an intermediate model which, in addition to size, uses 15 other cost drivers.

Cost Drivers for the COCOMO Model.

       Organic    Semidetached    Embedded
a      3.2        3.0             2.8
b      1.05       1.12            1.20

e.g.
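
Again, the original worked example is not reproduced. The sketch below applies the intermediate model in the form E = a x (KLOC)^b x EAF, using the a and b values tabulated above; EAF (the effort adjustment factor) is the product of the 15 cost-driver multipliers, and the driver names and multiplier values shown are purely illustrative.

    # Intermediate COCOMO sketch: E = a * KLOC**b * EAF (staff-months).
    INTERMEDIATE = {
        "organic":      (3.2, 1.05),
        "semidetached": (3.0, 1.12),
        "embedded":     (2.8, 1.20),
    }

    def intermediate_cocomo(kloc, mode, drivers):
        """Return effort in staff-months given a dict of driver multipliers."""
        a, b = INTERMEDIATE[mode]
        eaf = 1.0
        for multiplier in drivers.values():
            eaf *= multiplier          # multipliers > 1 increase effort, < 1 reduce it
        return a * kloc ** b * eaf

    # Illustrative cost-driver multipliers for a hypothetical 32 KLOC embedded project:
    drivers = {"required_reliability": 1.15,
               "product_complexity":   1.15,
               "analyst_capability":   0.86}
    print(f"{intermediate_cocomo(32, 'embedded', drivers):.0f} staff-months")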

The intermediate model is more accurate than the basic model.

Comparison:

Automated Estimation Tools

Automated estimation tools allow the planner to estimate cost and effort and to perform "what if" analyses for important project variables such as delivery date or staffing.

All have the same general characteristics and require:

  1. A quantitative estimate of project size (e.g., LOC) or functionality
    (function point data)
  2. Qualitative project characteristics such as complexity, required reliability, or business criticality
  3. Some description of the development staff and/or development environment
From these data, the model implemented by the automated estimation tool provides estimates of the effort required to complete the project, costs, staff loading, and, in some cases, development schedule and associated risk.

BYL (Before You Leap), developed by the Gordon Group, WICOMO (Wang Institute Cost Model), developed at the Wang Institute, and DECPlan, developed by Digital Equipment Corporation, are automated estimation tools that are based on COCOMO.

Each of the tools requires the user to provide preliminary LOC estimates.

These estimates are categorized by programming language and type (i.e., adapted code, reused code, new code).

The user also specifies values for the cost driver attributes.

Each of the tools produces estimated elapsed project duration (in months), effort in staff-months, average staffing per month, average productivity in LOC/pm, and cost per month.

This data can be developed for each phase in the software engineering process individually or for the entire project.

SLIM is an automated costing system based on the Rayleigh-Putnam Model.

SLIM applies the Putnam software model, linear programming, statistical simulation, and PERT (the program evaluation and review technique, a scheduling method) to derive software project estimates.

The system enables a software planner to perform the following functions in an interactive session:

(1) calibrate the local software development environment by interpreting historical data supplied by the planner;

(2) create an information model of the software to be developed by eliciting basic software characteristics, personal attributes, and environmental considerations; and

(3) conduct software sizing--the approach used in SLIM is a more sophisticated, automated version of the LOC costing technique.

Once software size (i.e., LOC for each software function) has been established, SLIM computes size deviation (an indication of estimation uncertainty), a sensitivity profile that indicates potential deviation of cost and effort, and a consistency check with data collected for software systems of similar size.

The planner can invoke a linear programming analysis that considers development constraints on both cost and effort and provides a month-by-month distribution of effort.

ESTIMACS is a "macro-estimation model" that uses a function point estimation method enhanced to accommodate a variety of project and personnel factors.

The ESTIMACS tool contains a set of models that enable the planner to estimate

  1. system development effort,
  2. staff and cost,
  3. hardware configuration,
  4. risk,
  5. the effects of "development portfolio."
The system development effort model combines data about the user, the developer, the project geography (i.e., the proximity of developer and customer), and the number of "major business functions" to be implemented with information domain data required for function point computation, the application complexity, performance, and reliability.

ESTIMACS can develop staffing and costs using a life cycle data base to provide work distribution and deployment information.

The target hardware configuration is sized (i.e., processor power and storage capacity are estimated) using answers to a series of questions that help the planner evaluate transaction volume, windows of application, and other data.

The level of risk associated with the successful implementation of the proposed system is determined based on responses to a questionnaire that examines project factors such as size, structure, and technology.

SPQR/20, developed by Software Productivity Research, Inc., has the user complete a simple set of multiple-choice questions that address:

In addition to the output data described for the other tools, SPQR/20 produces a number of further estimates.

Each of the automated estimating tools conducts a dialog with the planner, obtaining appropriate project and supporting information and producing both tabular and (in some cases) graphical output.

All these tools have been implemented on personal computers or engineering workstations.

Martin compared these tools by applying each to the same project.

A large variation in estimated results was encountered, and the predicted values sometimes were significantly different from actual values.

This reinforces the fact that the output of estimation tools should be used as one "data point" from which estimates are derived--not as the only source for an estimate.


References

Boehm, B. W. (1981). Software Engineering Economics. Englewood Cliffs, N.J., Prentice-Hall.

Pressman, R. S. (1997). Software Engineering: A Practitioner's Approach (4th edition). New York, McGraw-Hill. (chapter 7).

