Plan partitions before traffic surges break latency budgets. Balance producer speed and consumer parallelism quickly. See storage, replicas, and safe limits at a glance.
| Item | Example value |
|---|---|
| Topic | orders.events |
| Ingress (msgs/s) | 50,000 |
| Avg message size | 1 KB |
| Peak factor | 1.5× |
| Brokers | 6 |
| Replication | 3 |
| Retention | 7 days |
| Suggested partitions | 24 |
1) Throughput-driven partitions
- Effective throughput = Base throughput × Peak factor
- Partitions needed (write) = ceil( Effective MB/s ÷ (Write MB/s per partition × Utilization) )
- Partitions needed (read) = ceil( Effective MB/s ÷ (Read MB/s per partition × Utilization) )
2) Parallelism-driven partitions
To run N consumers in one group concurrently, plan at least N partitions; with fewer, some consumers sit idle. Producers with many threads also parallelize better when at least that many partitions exist.
3) Storage sizing
- On-disk MB/s ≈ Effective MB/s × Compression factor × Overhead factor
- Daily GB ≈ On-disk MB/s × 86,400 ÷ 1,024
- Cluster GB ≈ Daily GB × Retention days × Replication factor
- Per-broker raw GB ≈ (Cluster GB ÷ Brokers) ÷ Disk utilization target
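The three formulas above can be sketched as small helper functions. The names are illustrative, not taken from the calculator itself:

```python
import math

def partitions_for_throughput(effective_mbps, per_partition_mbps, utilization):
    """Partitions needed so each partition stays within its derated budget."""
    return math.ceil(effective_mbps / (per_partition_mbps * utilization))

def cluster_storage_gb(effective_mbps, compression, overhead,
                       retention_days, replication):
    """Cluster storage from on-disk rate, retention, and replication."""
    on_disk_mbps = effective_mbps * compression * overhead
    daily_gb = on_disk_mbps * 86_400 / 1_024
    return daily_gb * retention_days * replication
```

With the table's example inputs, `partitions_for_throughput(71.5, 5, 0.7)` yields 21 partitions before broker rounding.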
Partition sizing starts with a steady ingress estimate, then applies a peak factor to cover burst traffic. For example, 50,000 messages per second at 1 KB is about 47.7 MB/s. With a 1.5× peak factor, the calculator models 71.5 MB/s so your topic stays stable during deployments, retries, or sudden demand spikes.
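One way to reproduce the 47.7 MB/s figure, assuming the calculator treats 1 KB as 1,000 bytes and reports binary megabytes (MiB):

```python
msgs_per_sec = 50_000
avg_bytes = 1_000                               # 1 KB message, decimal
base_mbps = msgs_per_sec * avg_bytes / 2**20    # bytes/s -> MiB/s
peak_mbps = base_mbps * 1.5                     # apply the peak factor
print(round(base_mbps, 1), round(peak_mbps, 1)) # 47.7 71.5
```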
Each partition has a sustainable write and read envelope that depends on broker CPU, disk, and network. The calculator converts your per-partition limits into effective capacities using a utilization target, such as 70%. If you assume 5 MB/s write capacity, the effective budget is 3.5 MB/s, and partitions required become the ceiling of effective throughput divided by that budget.
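Applying the derated budget from this paragraph to the peak throughput above:

```python
import math

effective_mbps = 71.5
write_budget = 5 * 0.7   # 5 MB/s capacity derated to a 70% utilization target
partitions = math.ceil(effective_mbps / write_budget)
print(partitions)        # 21, before rounding to a broker multiple
```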
Throughput alone is not enough if you need parallel processing. A consumer group cannot process faster than its active partitions, so partitions should be at least the number of consumers to avoid idle instances. If you provide a per-consumer processing capacity, the tool also estimates the consumer count needed to keep up with peak throughput, reducing the risk of lag growth during backfills.
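The consumer-count estimate can be sketched the same way; the 4 MB/s per-consumer processing capacity here is a hypothetical input, not a value from the example table:

```python
import math

effective_mbps = 71.5
per_consumer_mbps = 4.0   # hypothetical per-consumer processing capacity
consumers = math.ceil(effective_mbps / per_consumer_mbps)
# Partitions must be at least this count to keep every consumer busy.
print(consumers)          # 18
```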
Rounding to a multiple of brokers helps spread leaders evenly and reduces hotspots. With six brokers and 24 partitions, the topic averages four partitions per broker before considering replicas. The replication factor multiplies leader and follower traffic, so balanced placement improves recovery behavior after broker loss and minimizes ISR churn when disks approach saturation.
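Rounding up to the nearest broker multiple is a one-liner; with the 21 partitions computed from throughput and 6 brokers, it lands on the table's suggested 24:

```python
import math

def round_to_broker_multiple(partitions, brokers):
    """Round partition count up to the next multiple of the broker count."""
    return math.ceil(partitions / brokers) * brokers

print(round_to_broker_multiple(21, 6))  # 24
```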
Storage is modeled from effective throughput, then adjusted by compression and log overhead. Suppose effective throughput is 71.5 MB/s, compression factor is 0.5, and overhead is 1.10. The stored rate becomes 39.3 MB/s, which is about 3.32 TB per day. With seven days retention and replication factor three, cluster storage reaches roughly 69.7 TB, then the disk utilization target inflates required raw capacity.
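The storage walk-through above, step by step. The 80% disk utilization target in the last line is an assumed value, not one given in the example:

```python
effective_mbps = 71.5
stored_mbps = effective_mbps * 0.5 * 1.10   # compression, then log overhead
daily_gb = stored_mbps * 86_400 / 1_024     # MB/s -> GB per day
cluster_gb = daily_gb * 7 * 3               # retention days x replication factor
raw_per_broker_gb = cluster_gb / 6 / 0.8    # assumed 80% disk utilization target
print(round(stored_mbps, 1), round(cluster_gb))  # 39.3 MB/s, ~69,679 GB
```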
Use warnings as guardrails, not absolutes. If per-partition MB/s exceeds your configured envelopes, raise partitions or validate hardware capacity. If partitions per broker exceed your soft limit, add brokers or consolidate topics. After launch, verify producer batch size, compression type, segment bytes, and consumer max.poll settings, then iterate using observed per-partition MB/s and latency.
Start with the suggested count from throughput and parallelism, then round for broker distribution. Prefer modest growth steps, because increasing partitions later can change key ordering and can require careful consumer coordination.
More partitions are not always better. Partitions should be at least the number of active consumers for full parallelism, but extra partitions add overhead. Use more partitions when you expect growth, multiple consumer groups, or uneven processing times.
The utilization target derates your per-partition capacity to keep headroom. A 70% target assumes you only use 70% of estimated throughput limits, which reduces latency spikes during bursts, rebalances, log compaction, and background broker work.
Replication multiplies storage and network writes because followers copy leader data. It does not increase consumer parallelism, but it does increase disk and inter-broker traffic, so balanced partition placement across brokers becomes more important.
Compression estimates how much data shrinks on disk, while overhead covers indexes, segment metadata, and internal bookkeeping. Together they make retention sizing more realistic than raw MB/s, especially for small messages and many partitions.
Confirm per-partition MB/s, end-to-end latency, consumer lag, and broker disk and network utilization. If hotspots appear, adjust partition count, producer batching, consumer concurrency, and retention settings using real telemetry trends.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.