Delivery forecasting

Velocity vs Throughput: What Actually Predicts Delivery?

Teams have been using a capacity metric to answer a prediction question. Here is what to use instead — and when.

Not which metric is better. Which question are you asking?

"When will this be delivered?" sounds like a simple question. Teams have been reaching for velocity to answer it — but velocity was never designed to carry that weight.

Velocity measures completed estimates. Throughput measures completed work. Cycle time measures elapsed time. Monte Carlo turns historical data into probabilistic forecasts. Delivery-system metrics check whether completed work reaches production. Each answers something different, and the mismatch between the question and the metric is where most delivery conversations go wrong.

That mismatch costs teams credibility. This article names the five tools available, shows what each one actually answers, and gives you explicit rules for when to use which.

Why velocity alone is not enough

Velocity is usually calculated by adding up the story-point estimates for completed user stories in an iteration. Agile Alliance describes it this way and notes it can help a team estimate remaining duration — assuming velocity remains approximately stable.

That assumption is doing significant work. Velocity depends on estimates. Estimates depend on local team judgement. Calibration shifts when team composition, work type, quality expectations, or estimation habits change. Agile Alliance also warns that velocity is not a budget, not a forecast, and should not be compared across teams.
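To make the calculation concrete, here is a minimal Python sketch of velocity as the sum of story-point estimates for completed stories, plus the rough remaining-duration estimate it supports. The sprint data, the 120-point backlog, and the function names are hypothetical, and the duration estimate is only as good as the assumption that velocity stays roughly stable.

```python
from math import ceil
from statistics import mean

# Hypothetical data: story-point estimates of the stories completed in each sprint.
completed_points_by_sprint = [
    [3, 5, 8, 2],  # sprint 1
    [5, 5, 3, 8],  # sprint 2
    [2, 8, 5, 5],  # sprint 3
]


def sprint_velocity(completed_points):
    """Velocity for one sprint: the sum of estimates for completed stories."""
    return sum(completed_points)


def sprints_remaining(backlog_points, velocities):
    """Rough duration estimate, rounded up to whole sprints.

    Assumes future velocity resembles the historical average, which is
    exactly the stability assumption discussed above.
    """
    return ceil(backlog_points / mean(velocities))


velocities = [sprint_velocity(sprint) for sprint in completed_points_by_sprint]
print(velocities)                          # [18, 21, 20]
print(sprints_remaining(120, velocities))  # 7
```

Note that every input here is an estimate: if the team re-calibrates how it sizes stories, the same code produces a different answer for the same amount of real work.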

Throughput is different: it counts the work items finished per unit of time. Kanban and flow sources treat it as an observed delivery-system metric rather than a derived estimate-based one. The problem is not that velocity is wrong. It is that teams often use a capacity metric to answer a delivery predictability question — and those are not the same thing.

The core claim stated plainly

Throughput is generally a stronger predictor of delivery than velocity when the question is about completed work over time. Velocity is useful for local team planning when estimates are stable — but delivery forecasting should be based on observed flow.

  • Velocity tells you how much estimated work a stable team may take on next sprint
  • Throughput tells you how many items the delivery system actually finishes over time
  • Cycle time tells you how long individual work items take from start to finish
  • Monte Carlo forecasting turns that history into confidence-based delivery date ranges
  • Delivery-system metrics check whether completed work actually reaches production

[Figure: throughput, cycle time, and cumulative flow over time, representing delivery system health]

What each option actually delivers

Option A

Velocity-led forecasting

Velocity gives teams a shared planning signal. It connects estimation, sprint planning, and backlog size — and can force useful conversations about complexity, risk, and missing information before work starts.

Best question answered: "How much estimated work might this team take on in the next sprint?"

Where it works
  • Local capacity planning when team composition and estimate style are stable
  • Prompting conversations about complexity, risk, and missing acceptance criteria
  • Supporting rough duration estimates over a known, consistently estimated backlog
Where it breaks down
  • Estimate-based — a completed 8-point story is subjective, not an objective delivery unit
  • Team-local — velocity cannot be meaningfully compared across teams
  • Gameable — when velocity becomes a target, point inflation follows reliably
  • Blind to wait time, queues, dependencies, and production delivery flow

Velocity works when the team is stable, the backlog is consistently estimated, and the Definition of Done is steady. Move away from any of those conditions and it weakens quickly.

Option B

Throughput-led forecasting

Throughput counts finished work items over time — no estimates required. It shifts the question from 'how many points did we complete?' to 'how many items actually reached Done?'

Best question answered: "How many items does this delivery system typically finish over time?"

Where it works
  • Removes one abstraction layer — based on observed completions, not estimated size
  • Useful for release forecasting when backlog items are reasonably comparable in scope
  • Scales better across teams because it does not depend on shared estimate calibration
  • Natural base data for Monte Carlo simulations
Where it breaks down
  • Misleading when item sizes vary wildly — a one-day fix and a six-week migration both count as one
  • Ignores work mix — defects, features, risk items, and debt all look the same in raw count
  • Requires consistent workflow boundary definitions to produce meaningful numbers

Throughput is usually the stronger delivery predictor because it measures completed work rather than completed estimates. It needs work-item discipline and enough comparable historical data to be trustworthy.
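As an illustration, throughput can be derived from nothing more than the dates on which items reached Done. The sketch below counts completions per ISO week; the dates are hypothetical, and grouping by week is an assumption (a team could equally count per sprint or per day).

```python
from collections import Counter
from datetime import date

# Hypothetical completion dates, one entry per finished work item.
done_dates = [
    date(2024, 3, 4), date(2024, 3, 6), date(2024, 3, 7),
    date(2024, 3, 12), date(2024, 3, 14),
    date(2024, 3, 18), date(2024, 3, 19), date(2024, 3, 21), date(2024, 3, 22),
]


def weekly_throughput(dates):
    """Count finished items per (ISO year, ISO week).

    Item size is ignored: a one-day fix and a six-week migration both
    count as one. Weeks with no completions do not appear in the result,
    so fill them in as zeros before using the counts for forecasting.
    """
    counts = Counter(tuple(d.isocalendar())[:2] for d in dates)
    return dict(sorted(counts.items()))


print(weekly_throughput(done_dates))
# {(2024, 10): 3, (2024, 11): 2, (2024, 12): 4}
```

The only judgement call is what counts as "Done", which is why consistent workflow boundaries matter more here than estimation skill.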

Option C

Cycle-time and SLE forecasting

Cycle time measures elapsed time from when an item starts to when it finishes. A Service Level Expectation turns that history into a probabilistic statement: '85% of comparable items finish within eight days of being started.'

Best question answered: "When is this specific item likely to be done?"

Where it works
  • Directly answers 'when will my item be done?' using historical evidence rather than guesses
  • Exposes hidden wait time — blocked items, review queues, deployment delays, and handoffs
  • Turns stakeholder expectations from optimistic dates into probability-based statements
  • Complements throughput: high throughput with poor cycle time reveals WIP or flow problems
Where it breaks down
  • Depends on clearly and consistently applied workflow state boundaries
  • Does not communicate total team capacity — only elapsed time per individual item
  • Becomes noisy if 'started' and 'done' state definitions change frequently

Cycle time is the right metric when the delivery question is about elapsed time for individual items. It is especially powerful for support work, operational delivery, and stakeholder expectation management.
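A Service Level Expectation of this kind is a percentile of historical cycle times. The following sketch uses the nearest-rank percentile method on hypothetical cycle times; the data, the function names, and the choice to measure elapsed calendar days as a plain date difference are all assumptions rather than a prescribed method.

```python
from datetime import date
from math import ceil


def cycle_time_days(started, finished):
    """Elapsed calendar days between the 'started' and 'done' states."""
    return (finished - started).days


def percentile_nearest_rank(values, pct):
    """Smallest observed value such that at least pct% of values are <= it."""
    ordered = sorted(values)
    rank = max(1, ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical cycle times (in days) for ten comparable, finished items.
cycle_times = [2, 3, 3, 4, 5, 5, 6, 7, 8, 13]

print(cycle_time_days(date(2024, 3, 4), date(2024, 3, 8)))  # 4
print(percentile_nearest_rank(cycle_times, 50))             # 5
print(percentile_nearest_rank(cycle_times, 85))             # 8
```

With this data the SLE would read "85% of comparable items finish within eight days of being started". The single 13-day item is visible in the data but does not drive the statement, which is the point of quoting a percentile rather than an average.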

Option D

Monte Carlo forecasting

Monte Carlo forecasting uses historical delivery data to simulate many possible futures — producing a probability distribution rather than a single date. It can answer 'when will we finish this backlog?' or 'how much can we deliver by this date?'

Best question answered: "What is the realistic range of completion dates, and how confident should we be?"

Where it works
  • Turns throughput variation into honest confidence levels: P50, P85, P95 delivery scenarios
  • Changes the stakeholder conversation from a single date assertion to a probability range
  • Enables active trade-offs: reduce scope, reduce WIP, remove blockers, or shift the date
  • Avoids false precision — explicitly models delivery variation rather than hiding it
Where it breaks down
  • Requires enough relevant historical throughput or cycle-time data to be meaningful
  • Produces false confidence if the delivery system changed significantly since data was collected
  • Does not explain why the forecast is what it is — needs qualitative context alongside numbers

Monte Carlo is the strongest single forecasting tool when historical data is stable and relevant. One important edge case: if the team completed a major platform migration or changed its Definition of Done in the last quarter, that historical data is no longer representative — caveat your simulations accordingly. Show P50 for team planning and P85 for external commitments. A date without a confidence level is an assertion, not a forecast.
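One simple way to run such a simulation is to resample historical weekly throughput until the backlog is exhausted, repeat that many times, and read percentiles off the results. The sketch below does this under stated assumptions: the throughput history and the 30-item backlog are hypothetical, past weeks are treated as equally likely futures, and scope is assumed not to grow during delivery.

```python
import random
from math import ceil


def simulate_weeks_to_finish(weekly_throughput, backlog_size, runs=10_000, seed=1):
    """Return a sorted list of simulated 'weeks to finish the backlog'.

    Each run draws one historical week at random (with replacement) per
    simulated week until the remaining item count reaches zero.
    """
    if max(weekly_throughput) <= 0:
        raise ValueError("history must contain at least one week with completions")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    outcomes = []
    for _ in range(runs):
        remaining, weeks = backlog_size, 0
        while remaining > 0:
            remaining -= rng.choice(weekly_throughput)
            weeks += 1
        outcomes.append(weeks)
    return sorted(outcomes)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


# Hypothetical history: items finished in each of the last eight weeks.
history = [3, 2, 4, 0, 5, 3, 1, 4]
outcomes = simulate_weeks_to_finish(history, backlog_size=30)

for pct in (50, 85, 95):
    print(f"P{pct}: finished within {percentile(outcomes, pct)} weeks")
```

The gap between the P50 and P95 results is the useful part of the output: it shows how much of the forecast is variation rather than rate. If the delivery system changed recently, trim `history` to the weeks that still represent how the team works.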

Option E

Delivery-system forecasting

Velocity and throughput can both look healthy on a dashboard while users still wait too long for value. Delivery-system metrics ask whether completed work is actually reaching production — safely, quickly, and repeatedly.

Best question answered: "Is our delivery system converting completed work into production value?"

Where it works
  • Reveals whether completed tickets are converting into actual production deployments
  • DORA metrics: change lead time, deployment frequency, recovery time, and change fail rate
  • Flow distribution separates investment in features, defects, risk, and technical debt
  • The right instrument for engineering leaders managing delivery performance across multiple teams or services
Where it breaks down
  • Requires integrated data across issue tracking, CI/CD pipelines, production, and incident tools
  • Overkill for a small team focused on planning a single sprint or release
  • Does not provide item-level planning detail needed for day-to-day backlog management

A team can complete many tickets while suffering from slow reviews, brittle deployments, large release batches, or high failure rates. In my experience, this level of instrumentation starts paying off once a team is running four or more releases per week — below that, the signal is too infrequent to act on. Delivery-system metrics are the final check that velocity, throughput, and cycle time all miss.
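For illustration, three of the DORA-style measures can be computed from a plain list of deployment records. This is a sketch under assumptions: the `Deployment` record, its field names, and the sample data are hypothetical, and real figures would come from joined CI/CD and incident data. Recovery time is omitted because it needs incident timestamps that this record does not carry.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it reached production
    caused_failure: bool     # whether it led to a production failure


def deployment_frequency_per_week(deployments, period_days):
    """Average number of production deployments per week over the period."""
    return len(deployments) / (period_days / 7)


def median_change_lead_time_hours(deployments):
    """Median hours from commit to production."""
    hours = [(d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments]
    return median(hours)


def change_fail_rate(deployments):
    """Share of deployments that caused a production failure."""
    return sum(d.caused_failure for d in deployments) / len(deployments)


# Hypothetical two-week sample.
deployments = [
    Deployment(datetime(2024, 3, 4, 9), datetime(2024, 3, 4, 15), False),
    Deployment(datetime(2024, 3, 5, 10), datetime(2024, 3, 6, 10), True),
    Deployment(datetime(2024, 3, 11, 8), datetime(2024, 3, 11, 20), False),
    Deployment(datetime(2024, 3, 13, 9), datetime(2024, 3, 15, 9), False),
]

print(deployment_frequency_per_week(deployments, period_days=14))  # 2.0
print(median_change_lead_time_hours(deployments))                  # 18.0
print(change_fail_rate(deployments))                               # 0.25
```

None of these numbers can be derived from ticket status alone, which is why this option depends on integrated tooling.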

Metric comparison at a glance

How each approach performs across the decisions that matter most to delivery teams.

Criteria | Velocity | Throughput | Cycle time | Monte Carlo | Delivery system
Core measurement | Completed estimates | Completed items | Elapsed time per item | Simulated outcome ranges | Production flow and safety
Best question answered | How much can this team take on? | How many items do we finish? | When will this item be done? | What date range is likely? | Are we delivering safely?
Predictive strength | Moderate (team-local only) | Strong for release forecasting | Strong for item-level timing | Strong when data is stable | Strong for production health
Main weakness | Subjective estimate basis | Variable item sizes distort | Needs clear workflow edges | Requires relevant history | Needs integrated tooling
Cross-team use | Weak (always team-specific) | Better with item size policies | Better with shared workflow states | Better with comparable data sets | Strongest at value-stream level
Best paired with | Definition of Done and capacity | Cycle time and WIP limits | Throughput and WIP visibility | Throughput and scope control | Flow metrics and product outcomes

Decision rules for your situation

Match your situation to the right metric. These are rules, not guidelines.

1. If: you are planning the next sprint for one stable Scrum team
   Then: Use velocity as supporting evidence only, not as a commitment target or team performance measure

2. If: someone is comparing velocity across different teams
   Then: Stop. Velocity is team-specific and depends entirely on local estimation practice. Cross-team comparison produces noise, not insight

3. If: velocity has become a management performance target
   Then: Stop. Point inflation is the inevitable response. The metric has stopped measuring delivery

4. If: stakeholders ask 'when will this feature or release be delivered?'
   Then: Do not answer with velocity alone. Use throughput-based or Monte Carlo forecasting

5. If: your work items are reasonably sliced and historically comparable in scope
   Then: Use throughput as the primary delivery-rate signal for release forecasting

6. If: work item sizes vary wildly across the backlog
   Then: Split oversized items, classify work types, and inspect cycle-time variation before trusting raw throughput

7. If: a stakeholder asks 'when will my specific item be done?'
   Then: Use cycle-time percentiles and a Service Level Expectation based on historical data for comparable items

8. If: a deadline has commercial, contractual, roadmap, or executive consequences
   Then: Use Monte Carlo forecasting. Show P50, P85, and P95 scenarios. Never give a single date without a confidence level

9. If: throughput is high but cycle time is poor
   Then: Inspect WIP, blocked work, review delays, testing bottlenecks, and deployment queues

10. If: throughput is high but production change lead time is poor
    Then: Completed tickets are not converting to user value. Inspect delivery-system health using DORA-style metrics

11. If: the team has no reliable historical delivery data
    Then: Start collecting throughput, cycle time, and WIP now. Use current forecasts cautiously and revisit in four to six weeks

12. If: the organisation needs cross-team predictability or portfolio planning
    Then: Prefer flow metrics and delivery-system metrics over comparing team velocities

The practical conclusion

Velocity and throughput are not rivals in a metric war. They answer different questions for different audiences at different decision horizons.

Use velocity to support team planning. Use throughput and flow metrics to predict delivery. Use DORA and delivery-system metrics to verify that completed work is actually becoming customer value — not just ticket count.

The common mistake is using a capacity metric to answer a delivery predictability question. Once you separate those two concerns, the right signal for each situation becomes straightforward.

The number at the end of an estimation session is the least important thing it produces. The delivery metric at the end of a sprint is the least important thing that sprint produces. What matters is whether the system is moving work to users efficiently, predictably, and safely.
