Velocity vs Throughput: What Actually Predicts Delivery?

Teams have been using a capacity metric to answer a prediction question. Here is what to use instead — and when.

Not which metric is better. Which question are you asking?

"When will this be delivered?" sounds like a simple question. Teams have been reaching for velocity to answer it — but velocity was never designed to carry that weight.

Velocity measures completed estimates. Throughput measures completed work. Cycle time measures elapsed time. Monte Carlo turns historical data into probabilistic forecasts. Each answers something different — and the mismatch between the question and the metric is where most delivery conversations go wrong.

That mismatch costs teams credibility. This article names five approaches, shows what each one actually answers, and gives you explicit rules for when to use which.

Why velocity alone is not enough

Velocity is usually calculated by adding up the story-point estimates for completed user stories in an iteration. Agile Alliance describes it this way and notes it can help a team estimate remaining duration — assuming velocity remains approximately stable.

That assumption is doing significant work. Velocity depends on estimates. Estimates depend on local team judgement. Calibration shifts when team composition, work type, quality expectations, or estimation habits change. Agile Alliance also warns that velocity is not a budget, not a forecast, and should not be compared across teams.
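The calculation described above can be sketched in a few lines of Python. This is an illustration only: the sprint data, the backlog size, and the function name sprint_velocity are invented for the example, and the duration estimate only holds if velocity stays roughly stable.

```python
from statistics import mean

# Hypothetical history: story-point estimates of the stories completed in each sprint.
completed_points_by_sprint = [
    [3, 5, 8, 2],     # sprint 1
    [5, 5, 3, 8, 1],  # sprint 2
    [8, 3, 5],        # sprint 3
]


def sprint_velocity(completed_estimates):
    """Velocity for one sprint: the sum of estimates for completed stories."""
    return sum(completed_estimates)


velocities = [sprint_velocity(sprint) for sprint in completed_points_by_sprint]
average_velocity = mean(velocities)

# Assumed: the remaining backlog was estimated by the same team in the same style.
remaining_backlog_points = 90
sprints_remaining = remaining_backlog_points / average_velocity

print(f"Velocities: {velocities}")
print(f"Average velocity: {average_velocity:.1f} points per sprint")
print(f"Rough remaining duration: {sprints_remaining:.1f} sprints")
```

With this made-up data the team averages about 18.7 points per sprint, so a 90-point backlog works out to roughly 4.8 sprints. Every input to that number is an estimate, which is why the result is only as good as the team's calibration.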

Throughput is different: it counts the work items finished per unit of time. Kanban and flow sources treat it as an observed delivery-system metric rather than a derived estimate-based one. The problem is not that velocity is wrong. It is that teams often use a capacity metric to answer a delivery predictability question — and those are not the same thing.

The core claim stated plainly

Throughput is generally a stronger predictor of delivery than velocity when the question is about completed work over time. Velocity is useful for local team planning when estimates are stable — but delivery forecasting should be based on observed flow.

Velocity tells you how much estimated work a stable team may take on next sprint
Throughput tells you how many items the delivery system actually finishes over time
Cycle time tells you how long individual work items take from start to finish
Monte Carlo forecasting turns that history into confidence-based delivery date ranges
Delivery-system metrics check whether completed work actually reaches production

[Figure: throughput, cycle time, and cumulative flow over time, representing delivery-system health]

What each option actually delivers

Option A

Velocity-led forecasting

Velocity gives teams a shared planning signal. It connects estimation, sprint planning, and backlog size — and can force useful conversations about complexity, risk, and missing information before work starts.

Best question answered: "How much estimated work might this team take on in the next sprint?"

Where it works

Local capacity planning when team composition and estimate style are stable
Prompting conversations about complexity, risk, and missing acceptance criteria
Supporting rough duration estimates over a known, consistently estimated backlog

Where it breaks down

Estimate-based: a completed 8-point story reflects a subjective estimate, not an objective unit of delivery
Team-local — velocity cannot be meaningfully compared across teams
Gameable — when velocity becomes a target, point inflation follows reliably
Blind to wait time, queues, dependencies, and production delivery flow

Velocity works when the team is stable, the backlog is consistently estimated, and the Definition of Done is steady. Move away from any of those conditions and it weakens quickly.

Option B

Throughput-led forecasting

Throughput counts finished work items over time — no estimates required. It shifts the question from 'how many points did we complete?' to 'how many items actually reached Done?'

Best question answered: "How many items does this delivery system typically finish over time?"

Where it works

Removes one abstraction layer — based on observed completions, not estimated size
Useful for release forecasting when backlog items are reasonably comparable in scope
Scales better across teams because it does not depend on shared estimate calibration
Natural base data for Monte Carlo simulations

Where it breaks down

Misleading when item sizes vary wildly — a one-day fix and a six-week migration both count as one
Ignores work mix — defects, features, risk items, and debt all look the same in raw count
Requires consistent workflow boundary definitions to produce meaningful numbers

Throughput is usually the stronger delivery predictor because it measures completed work rather than completed estimates. It needs work-item discipline and enough comparable historical data to be trustworthy.
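As a rough sketch of what counting throughput looks like in practice, the snippet below groups finished items by ISO week and also tallies the work mix that a raw count hides. The item list, the work-type labels, and the function names are assumptions made for illustration; a real export from an issue tracker would look different.

```python
from collections import Counter
from datetime import date

# Hypothetical completed items: (date the item reached Done, work type).
completed_items = [
    (date(2024, 3, 4), "feature"),
    (date(2024, 3, 6), "defect"),
    (date(2024, 3, 7), "feature"),
    (date(2024, 3, 12), "debt"),
    (date(2024, 3, 14), "feature"),
    (date(2024, 3, 20), "defect"),
]


def weekly_throughput(items):
    """Count finished items per ISO (year, week).

    Weeks with no completions do not appear in the result; a real
    forecast should fill those in as zero.
    """
    counts = Counter()
    for done_on, _work_type in items:
        iso = done_on.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


def work_mix(items):
    """Count finished items per work type, to show what the raw total hides."""
    return dict(Counter(work_type for _done_on, work_type in items))


print(weekly_throughput(completed_items))
print(work_mix(completed_items))
```

Nothing here depends on estimates, only on a consistent definition of when an item counts as Done.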

Option C

Cycle-time and SLE forecasting

Cycle time measures elapsed time from when an item starts to when it finishes. A Service Level Expectation turns that history into a probabilistic statement: '85% of comparable items finish within eight days of being started.'

Best question answered: "When is this specific item likely to be done?"

Where it works

Directly answers 'when will my item be done?' using historical evidence rather than guesses
Exposes hidden wait time — blocked items, review queues, deployment delays, and handoffs
Turns stakeholder expectations from optimistic dates into probability-based statements
Complements throughput: high throughput with poor cycle time reveals WIP or flow problems

Where it breaks down

Depends on clearly and consistently applied workflow state boundaries
Does not communicate total team capacity — only elapsed time per individual item
Becomes noisy if 'started' and 'done' state definitions change frequently

Cycle time is the right metric when the delivery question is about elapsed time for individual items. It is especially powerful for support work, operational delivery, and stakeholder expectation management.
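To make the Service Level Expectation idea concrete, here is a minimal sketch that derives a percentile from historical cycle times. The history is invented, the nearest-rank percentile method is one of several reasonable choices, and counting cycle time as whole calendar days between start and finish is an assumption rather than a standard.

```python
import math
from datetime import date, timedelta

# Hypothetical history of (started, finished) dates for comparable items.
start = date(2024, 4, 1)
history = [
    (start, start + timedelta(days=d))
    for d in (2, 3, 3, 4, 5, 5, 6, 7, 8, 12)
]


def cycle_time_days(started, finished):
    """Elapsed calendar days from start to finish (assumed convention)."""
    return (finished - started).days


def percentile_nearest_rank(values, pct):
    """Smallest observed value such that at least pct% of values are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


cycle_times = [cycle_time_days(s, f) for s, f in history]
p50 = percentile_nearest_rank(cycle_times, 50)
p85 = percentile_nearest_rank(cycle_times, 85)

print(f"50% of items finished within {p50} days")
print(f"SLE: 85% of comparable items finish within {p85} days of being started")
```

With this sample the 85th percentile is eight days, which is how a statement like the SLE quoted above would be produced. Ten items is far too small a sample for a real commitment; it is used here only to keep the arithmetic visible.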

Option D

Monte Carlo forecasting

Monte Carlo forecasting uses historical delivery data to simulate many possible futures — producing a probability distribution rather than a single date. It can answer 'when will we finish this backlog?' or 'how much can we deliver by this date?'

Best question answered: "What is the realistic range of completion dates, and how confident should we be?"

Where it works

Turns throughput variation into honest confidence levels: P50, P85, P95 delivery scenarios
Changes the stakeholder conversation from a single date assertion to a probability range
Enables active trade-offs: reduce scope, reduce WIP, remove blockers, or shift the date
Avoids false precision — explicitly models delivery variation rather than hiding it

Where it breaks down

Requires enough relevant historical throughput or cycle-time data to be meaningful
Produces false confidence if the delivery system changed significantly since data was collected
Does not explain why the forecast is what it is — needs qualitative context alongside numbers

Monte Carlo is the strongest single forecasting tool when historical data is stable and relevant. One important edge case: if the team completed a major platform migration or changed its Definition of Done in the last quarter, that historical data is no longer representative — caveat your simulations accordingly. Show P50 for team planning and P85 for external commitments. A date without a confidence level is an assertion, not a forecast.
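A minimal sketch of the simulation, assuming weekly throughput samples are the only input: each run repeatedly draws a random week from history until the backlog is empty, and the spread of results gives the confidence levels. The history, the backlog size, and the function names are hypothetical, and real tools typically also model scope growth and changes in team capacity.

```python
import math
import random


def simulate_weeks_to_finish(weekly_throughput, backlog_size, runs=10_000, seed=42):
    """Return the sorted number of weeks needed in each simulated future."""
    if max(weekly_throughput) <= 0:
        raise ValueError("history needs at least one week with completed items")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = []
    for _ in range(runs):
        remaining, weeks = backlog_size, 0
        while remaining > 0:
            # Resample one past week at random and apply it to the backlog.
            remaining -= rng.choice(weekly_throughput)
            weeks += 1
        outcomes.append(weeks)
    return sorted(outcomes)


def percentile(sorted_values, pct):
    """Nearest-rank percentile over an already sorted list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


# Hypothetical history: items finished in each of the last ten weeks.
history = [3, 5, 2, 4, 6, 3, 4, 1, 5, 4]
outcomes = simulate_weeks_to_finish(history, backlog_size=40)

p50, p85, p95 = (percentile(outcomes, p) for p in (50, 85, 95))
print(f"P50: {p50} weeks, P85: {p85} weeks, P95: {p95} weeks")
```

Reading the output as "85% of simulated futures finished within P85 weeks" is what turns a single date into a forecast with a confidence level. The simulation assumes future weeks will resemble past weeks, so the caveat about recent system changes applies directly to the history list.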

Option E

Delivery-system forecasting

Velocity and throughput can both look healthy on a dashboard while users still wait too long for value. Delivery-system metrics ask whether completed work is actually reaching production — safely, quickly, and repeatedly.

Best question answered: "Is our delivery system converting completed work into production value?"

Where it works

Reveals whether completed tickets are converting into actual production deployments
Tracks the DORA metrics: change lead time, deployment frequency, recovery time, and change fail rate
Flow distribution separates investment in features, defects, risk, and technical debt
The right instrument for engineering leaders managing delivery performance across multiple teams or services

Where it breaks down

Requires integrated data across issue tracking, CI/CD pipelines, production, and incident tools
Overkill for a small team focused on planning a single sprint or release
Does not provide item-level planning detail needed for day-to-day backlog management

A team can complete many tickets while suffering from slow reviews, brittle deployments, large release batches, or high failure rates. In my experience, this level of instrumentation starts paying off once a team is running four or more releases per week. Below that, the signal is too infrequent to act on. Delivery-system metrics provide the final check, catching what velocity, throughput, and cycle time all miss.
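For a sense of the data involved, the sketch below computes simplified versions of the four DORA-style measures from a list of deployment records. The Deployment record shape, the field names, and the sample data are all assumptions for illustration, and the formulas are deliberately simplified compared with the official DORA definitions.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime            # when the change was committed
    deployed_at: datetime             # when it reached production
    failed: bool = False              # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored


def hours_between(earlier, later):
    return (later - earlier).total_seconds() / 3600


def deployment_frequency_per_week(deploys, window_days):
    return len(deploys) / (window_days / 7)


def change_fail_rate(deploys):
    return sum(d.failed for d in deploys) / len(deploys)


def median_lead_time_hours(deploys):
    return median([hours_between(d.committed_at, d.deployed_at) for d in deploys])


def median_recovery_hours(deploys):
    times = [hours_between(d.deployed_at, d.restored_at)
             for d in deploys if d.failed and d.restored_at]
    return median(times) if times else None


# Hypothetical two-week window of production deployments.
deploys = [
    Deployment(datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 15)),
    Deployment(datetime(2024, 5, 2, 10), datetime(2024, 5, 3, 10),
               failed=True, restored_at=datetime(2024, 5, 3, 12)),
    Deployment(datetime(2024, 5, 6, 9), datetime(2024, 5, 6, 21)),
    Deployment(datetime(2024, 5, 8, 8), datetime(2024, 5, 10, 8)),
]

print(deployment_frequency_per_week(deploys, window_days=14))
print(change_fail_rate(deploys))
print(median_lead_time_hours(deploys))
print(median_recovery_hours(deploys))
```

The hard part in practice is not the arithmetic but joining commit, pipeline, and incident data into records like these, which is the integration cost noted above.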

Metric comparison at a glance

How each approach performs across the decisions that matter most to delivery teams.

Criteria	Velocity	Throughput	Cycle time	Monte Carlo	Delivery system
Core measurement	Completed estimates	Completed items	Elapsed time per item	Simulated outcome ranges	Production flow and safety
Best question answered	How much can this team take on?	How many items do we finish?	When will this item be done?	What date range is likely?	Are we delivering safely?
Predictive strength	Moderate — team-local only	Strong for release forecasting	Strong for item-level timing	Strong when data is stable	Strong for production health
Main weakness	Subjective estimate basis	Variable item sizes distort	Needs clear workflow edges	Requires relevant history	Needs integrated tooling
Cross-team use	Weak — always team-specific	Better with item size policies	Better with shared workflow states	Better with comparable data sets	Strongest at value-stream level
Best paired with	Definition of Done and capacity	Cycle time and WIP limits	Throughput and WIP visibility	Throughput and scope control	Flow metrics and product outcomes

Decision rules for your situation

Match your situation to the right metric. These are rules, not guidelines.

If: you are planning the next sprint for one stable Scrum team
Then: Use velocity as supporting evidence only, not as a commitment target or team performance measure

If: someone is comparing velocity across different teams
Then: Stop. Velocity is team-specific and depends entirely on local estimation practice. Cross-team comparison produces noise, not insight

If: velocity has become a management performance target
Then: Stop. Point inflation is the inevitable response. The metric has stopped measuring delivery

If: stakeholders ask 'when will this feature or release be delivered?'
Then: Do not answer with velocity alone. Use throughput-based or Monte Carlo forecasting

If: your work items are reasonably sliced and historically comparable in scope
Then: Use throughput as the primary delivery-rate signal for release forecasting

If: work item sizes vary wildly across the backlog
Then: Split oversized items, classify work types, and inspect cycle-time variation before trusting raw throughput

If: a stakeholder asks 'when will my specific item be done?'
Then: Use cycle-time percentiles and a Service Level Expectation based on historical data for comparable items

If: a deadline has commercial, contractual, roadmap, or executive consequences
Then: Use Monte Carlo forecasting. Show P50, P85, and P95 scenarios. Never give a single date without a confidence level

If: throughput is high but cycle time is poor
Then: Inspect WIP, blocked work, review delays, testing bottlenecks, and deployment queues

If: throughput is high but production change lead time is poor
Then: Completed tickets are not converting to user value. Inspect delivery-system health using DORA-style metrics

If: the team has no reliable historical delivery data
Then: Start collecting throughput, cycle time, and WIP now. Use current forecasts cautiously and revisit in four to six weeks

If: the organisation needs cross-team predictability or portfolio planning
Then: Prefer flow metrics and delivery-system metrics over comparing team velocities

The practical conclusion

Velocity and throughput are not rivals in a metric war. They answer different questions for different audiences at different decision horizons.

Use velocity to support team planning. Use throughput and flow metrics to predict delivery. Use DORA and delivery-system metrics to verify that completed work is actually becoming customer value — not just ticket count.

The common mistake is using a capacity metric to answer a delivery predictability question. Once you separate those two concerns, the right signal for each situation becomes straightforward.

The number at the end of an estimation session is the least important thing it produces. The delivery metric at the end of a sprint is the least important thing that sprint produces. What matters is whether the system is moving work to users efficiently, predictably, and safely.
