## Example data table

| Scenario | Records/hour | Avg record size | Workers | Buffered peak (rec/s) | Current capacity (rec/s) | Outcome |
|---|---|---|---|---|---|---|
| Streaming baseline | 180,000 | 6 KB | 16 | 111.24 | 174.60 | Healthy headroom for bursts and growth. |
| Peak marketing run | 240,000 | 8 KB | 16 | 197.76 | 174.60 | Scale workers or reduce latency. |
| Scaled worker pool | 240,000 | 8 KB | 20 | 197.76 | 218.25 | Buffered peak is covered safely. |
## Formulas used
- Normal required rate = (records per hour ÷ 3600) × (1 + retry rate)
- Peak required rate = normal required rate × peak traffic multiplier
- Buffered peak rate = peak required rate × (1 + growth buffer)
- Latency-limited worker rate = 1000 ÷ average stage latency in milliseconds
- Effective worker rate = min(configured worker throughput, latency-limited worker rate)
- Current capacity = effective worker rate × workers × parallel stage slots × utilization × availability ÷ stages
- Per-worker effective contribution = current capacity ÷ workers
- Recommended workers = ceiling of (buffered peak rate ÷ per-worker effective contribution)
- Buffered daily storage = records per hour × 24 × average record size × (1 + retry rate) × (1 + growth buffer)
- Burst backlog = max(0, buffered peak rate − current capacity) × burst duration in seconds
This model estimates sustainable throughput, not exact hardware limits. Replace assumed inputs with benchmarked values from production or realistic load tests.
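The formulas above can be collected into a short sketch. All of the worker, latency, slot, utilization, and traffic inputs below are assumed illustrative values (not the calculator's defaults) chosen so the output matches the streaming-baseline row; substitute your own benchmarked numbers.

```python
import math

def pipeline_plan(records_per_hour, size_kb, stages, stage_latency_ms,
                  worker_throughput, workers, parallel_slots,
                  utilization, retry_rate, availability,
                  peak_multiplier, growth_buffer, burst_seconds):
    # Demand side: steady load inflated by retries, peaks, and growth buffer
    normal = records_per_hour / 3600 * (1 + retry_rate)
    buffered_peak = normal * peak_multiplier * (1 + growth_buffer)

    # Supply side: the slower of configured throughput and stage latency wins
    latency_limited = 1000 / stage_latency_ms
    effective = min(worker_throughput, latency_limited)
    per_worker = effective * parallel_slots * utilization * availability / stages
    capacity = per_worker * workers

    return {
        "buffered_peak": buffered_peak,                        # rec/s to cover
        "capacity": capacity,                                  # rec/s sustained
        "recommended_workers": math.ceil(buffered_peak / per_worker),
        "storage_gb": records_per_hour * 24 * size_kb
                      * (1 + retry_rate) * (1 + growth_buffer) / 1e6,
        "burst_backlog": max(0.0, buffered_peak - capacity) * burst_seconds,
    }

# Assumed inputs that reproduce the streaming-baseline row:
# buffered peak 111.24 rec/s vs capacity 174.60 rec/s, so no burst backlog.
plan = pipeline_plan(records_per_hour=180_000, size_kb=6, stages=5,
                     stage_latency_ms=40, worker_throughput=30, workers=16,
                     parallel_slots=3, utilization=0.75, retry_rate=0.03,
                     availability=0.97, peak_multiplier=1.8,
                     growth_buffer=0.20, burst_seconds=60)
print({k: round(v, 2) for k, v in plan.items()})
```

With these inputs the per-worker contribution works out to 10.9125 rec/s, which is why only 11 workers would be strictly required while 16 leave healthy headroom.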
## How to use this calculator
- Enter your current hourly record arrival rate.
- Add the average record size, the number of stages, and the average stage latency.
- Enter worker throughput, worker count, and parallel slots.
- Set utilization, retry rate, availability, and growth buffer.
- Choose retention and burst duration for backlog planning.
- Press “Calculate capacity” to see results above the form.
- Review the chart, worker recommendation, storage, and backlog risk.
- Export the results in CSV or PDF for planning notes.
## Frequently asked questions
1. What does this calculator estimate?
It estimates throughput, worker demand, bandwidth, storage footprint, peak coverage, and burst backlog risk for a multi-stage data pipeline.
2. Why is average stage latency important?
Latency limits how many records one worker can finish each second. Even with many workers, slow stages can cap total capacity.
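A minimal illustration of that cap, using assumed numbers (50 rec/s configured throughput, a 200 ms slowest stage):

```python
configured = 50           # rec/s a worker could push with no latency limit
latency_ms = 200          # average latency of the slowest stage
effective = min(configured, 1000 / latency_ms)  # latency wins: 5.0 rec/s
print(effective * 100)    # even 100 workers top out at 500.0 rec/s
```

Cutting the stage latency, not adding workers, is what raises the ceiling here.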
3. Why include retry rate?
Retries increase total work. A small retry percentage can materially raise throughput demand, storage usage, and backlog risk during spikes.
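The compounding is easy to see with assumed numbers: a 5% retry rate on a 100 rec/s stream of 4 KB records adds 5 rec/s of effective work, and the same factor multiplies daily storage.

```python
base_rate = 100                        # rec/s arriving
retry_rate = 0.05                      # 5% of records are reprocessed
demand = base_rate * (1 + retry_rate)  # 105.0 rec/s of effective work
daily_gb = base_rate * 3600 * 24 * 4 * (1 + retry_rate) / 1e6  # 4 KB each
print(demand, round(daily_gb, 2))
```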
4. What is the growth buffer for?
Growth buffer reserves extra capacity beyond current forecasts. It helps teams plan for new customers, higher event volume, or delayed scaling decisions.
5. Does this work for batch pipelines too?
Yes. Treat batch arrivals as an equivalent hourly rate, then use burst duration to estimate queue buildup and recovery time.
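One way to apply that, with assumed numbers: a 540,000-record batch streaming in over one hour against 120 rec/s of sustained capacity.

```python
batch_records = 540_000
window_s = 3600                        # the batch arrives over one hour
capacity = 120                         # rec/s of sustained pipeline capacity

arrival = batch_records / window_s     # 150.0 rec/s equivalent hourly rate
backlog = max(0, arrival - capacity) * window_s  # 108000.0 records queued
recovery_s = backlog / capacity        # 900.0 s to drain once arrivals stop
print(backlog, recovery_s)
```

The queue grows while the batch is in flight, then drains on full capacity once arrivals stop.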
6. What does “parallel stage slots” mean?
It represents how many records can be processed at once across your stages. More slots usually raise throughput when no other bottleneck blocks progress.
7. Why might backlog clear time be unavailable?
If normal incoming work already matches or exceeds current capacity, the queue cannot drain without extra workers, lower latency, or lower load.
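That condition can be stated directly. The function below is a sketch, assuming backlog in records and both rates in records per second:

```python
def backlog_clear_seconds(backlog, capacity, normal_rate):
    # The queue drains only on spare capacity above steady-state inflow;
    # with no spare capacity, clear time is undefined (the queue grows).
    spare = capacity - normal_rate
    return backlog / spare if spare > 0 else None

print(backlog_clear_seconds(9_000, 174.6, 51.5))   # ~73 s of drain time
print(backlog_clear_seconds(9_000, 174.6, 180.0))  # None: cannot drain
```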
8. Should I trust the result for procurement decisions?
Use it as a planning baseline. Final decisions should also include benchmark tests, failure scenarios, memory limits, and downstream service constraints.