| Scenario | Size | Clock | Bus width | Efficiency | Burst size | Arb + IRQ | Result |
|---|---|---|---|---|---|---|---|
| Audio buffer | 32 KB | 100 MHz | 32-bit | 90% | 128 B | 2.0 µs | High throughput, low latency |
| Sensor burst | 4 KB | 80 MHz | 16-bit | 75% | 64 B | 4.5 µs | Overheads dominate small payloads |
| Frame chunk | 256 KB | 200 MHz | 64-bit | 85% | 512 B | 3.0 µs | Good balance for shared buses |
Use these scenarios to compare burst sizing and latency budgets quickly.
- BusThroughput = (ClockHz × BusWidthBits) ÷ 8
- TheoreticalTime = Bytes ÷ BusThroughput
- DataTime = TheoreticalTime ÷ (Efficiency/100)
- BurstCount = ceil(Bytes ÷ BurstSizeBytes)
- BurstOverheadTime = BurstCount × BurstOverheadUs
- FixedLatencyTime = ArbitrationUs + InterruptUs
- TimePerTransfer = DataTime + BurstOverheadTime + FixedLatencyTime
- EffectiveThroughput = Bytes ÷ TimePerTransfer
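The formula chain above can be sketched directly in Python. This is a minimal model, not the calculator's actual code; the per-burst overhead of 0.01 µs in the example is an illustrative assumption, since the scenario table does not list one.

```python
import math

def dma_model(payload_bytes, clock_hz, bus_width_bits, efficiency_pct,
              burst_size_bytes, burst_overhead_us,
              arbitration_us, interrupt_us):
    """Return (time_per_transfer_us, effective_throughput_bytes_per_s)."""
    bus_throughput = clock_hz * bus_width_bits / 8            # bytes/s ceiling
    theoretical_us = payload_bytes / bus_throughput * 1e6     # ideal data time
    data_us = theoretical_us / (efficiency_pct / 100)         # sustained data time
    burst_count = math.ceil(payload_bytes / burst_size_bytes)
    burst_overhead = burst_count * burst_overhead_us          # per-burst setup cost
    fixed_latency = arbitration_us + interrupt_us             # software-path cost
    total_us = data_us + burst_overhead + fixed_latency
    return total_us, payload_bytes / (total_us * 1e-6)

# Audio-buffer scenario from the table (1 µs arbitration + 1 µs interrupt assumed):
t_us, bps = dma_model(32 * 1024, 100e6, 32, 90, 128, 0.01, 1.0, 1.0)
```

Sweeping the inputs of this function reproduces the behavior the sections below describe: data time scales with payload, while the burst and latency terms are overhead that small transfers cannot amortize.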
- Enter payload size and select its unit.
- Set clock and bus width to match your interface.
- Choose an efficiency estimate from measurements or datasheets.
- Provide burst size and overhead to reflect protocol behavior.
- Add arbitration and interrupt latency for your software path.
- Press Calculate and review throughput and timing.
- Export CSV for reports, or PDF for sharing.
Why transfer time is rarely just bytes divided by bandwidth
DMA performance is shaped by more than payload size. Real buses include address phases, handshakes, and wait states when targets cannot accept data. This calculator separates theoretical data time from practical losses using an efficiency factor. When efficiency drops, the same buffer consumes more time and increases deadline risk in audio, motor control, and data acquisition loops.
How bus width and clock translate into a ceiling
Bus throughput is computed from clock frequency and bus width, producing a clear upper limit in bytes per second. A 32-bit path at 200 MHz yields a higher ceiling than a 16-bit path at 80 MHz, but only if the fabric can sustain one beat per clock. Use the theoretical figure to sanity-check datasheet claims and confirm configuration registers.
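As a quick sanity check, the two paths mentioned above work out as follows (a direct sketch of the BusThroughput formula):

```python
def bus_ceiling_bytes_per_s(clock_hz, bus_width_bits):
    # One beat per clock across the full bus width, in bytes/s.
    return clock_hz * bus_width_bits / 8

fast_wide = bus_ceiling_bytes_per_s(200e6, 32)    # 32-bit @ 200 MHz -> 800 MB/s
slow_narrow = bus_ceiling_bytes_per_s(80e6, 16)   # 16-bit @ 80 MHz -> 160 MB/s
```

Both numbers are ceilings: any wait state or idle cycle pushes the sustained rate below them, which is what the efficiency input captures.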
Burst sizing: balancing protocol overhead and fairness
Bursts reduce repeated setup, but very large bursts can increase latency for other masters and raise arbitration delays on shared fabrics. The calculator estimates burst count with a per-burst overhead term, showing why small transfers can be overhead-dominated. Try sweeping burst size to find a stable point where overhead shrinks without creating long bus occupancy.
Latency terms that matter in real firmware
Arbitration latency captures the time before the DMA engine gains access to the bus, which grows under contention from CPUs, GPUs, or other DMAs. Interrupt latency models completion handling, including ISR entry and bookkeeping. For tight loops, these fixed costs can outweigh data movement. Enter measured values from trace tools to match field behavior.
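To see how fixed costs can dwarf data movement, consider a single 64-byte transfer on the audio-buffer bus from the table (400 MB/s ceiling, 90% efficiency); the 1.5 µs arbitration and interrupt figures are illustrative assumptions:

```python
# Data time for 64 bytes at a 400 MB/s ceiling and 90% efficiency.
data_us = 64 / (400e6 * 0.90) * 1e6   # well under a microsecond
fixed_us = 1.5 + 1.5                  # assumed arbitration + interrupt latency
total_us = data_us + fixed_us
print(f"fixed latency is {fixed_us / total_us:.0%} of the transfer")
```

With numbers like these, the fixed terms account for over 90% of the transfer, so measuring them accurately matters more than refining the data-time estimate.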
Using outputs for design decisions and verification
Time per transfer and effective throughput help size ring buffers, choose interrupt coalescing, and decide whether to chain descriptors. Estimated utilization indicates whether the bus is a limiting resource or whether software overhead dominates. Export CSV for requirements documents and PDF for reviews, then compare scenarios to justify burst, clock, or contention mitigation choices. For validation, compare computed time against logic-analyzer captures and DMA completion timestamps, then iterate efficiency and latency inputs until the model tracks reality across multiple buffer sizes and transfer directions. This loop builds confidence before release.
What does the efficiency percentage represent?
It approximates protocol gaps, wait states, and contention that reduce sustained throughput relative to the ideal one-beat-per-clock rate. Use measured bandwidth or bus-utilization counters to estimate it.
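One way to derive the efficiency input from a measurement is to divide a measured sustained bandwidth by the theoretical ceiling; the 300 MB/s figure below is an illustrative assumption:

```python
def efficiency_pct(measured_bytes_per_s, clock_hz, bus_width_bits):
    # Sustained bandwidth as a percentage of the one-beat-per-clock ceiling.
    ceiling = clock_hz * bus_width_bits / 8
    return 100 * measured_bytes_per_s / ceiling

# e.g. 300 MB/s measured on a 32-bit, 100 MHz bus (400 MB/s ceiling):
print(efficiency_pct(300e6, 100e6, 32))   # 75.0
```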
How should I choose burst size?
Start with a value aligned to cache lines or FIFO depth, then sweep larger and smaller bursts. Pick the smallest size that delivers stable throughput without creating long bus occupancy or starving other masters.
Why do small transfers look slow?
Fixed costs dominate: burst setup, arbitration, and interrupt handling can be larger than the data movement time. Consider batching, descriptor chaining, or interrupt coalescing for many small buffers.
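A rough sketch of why batching helps: chaining N small buffers into one transfer pays arbitration and interrupt costs once instead of N times. The per-buffer data time and fixed cost below are assumptions for illustration.

```python
def total_time_us(n_buffers, data_us_each, fixed_us_per_transfer, batched):
    # batched: one arbitration + interrupt for the whole set; else one per buffer.
    transfers = 1 if batched else n_buffers
    return n_buffers * data_us_each + transfers * fixed_us_per_transfer

per_buffer = 0.2   # µs of data movement per small buffer (assumed)
fixed = 3.0        # µs arbitration + interrupt per transfer (assumed)
print(total_time_us(100, per_buffer, fixed, batched=False))  # 320.0 µs
print(total_time_us(100, per_buffer, fixed, batched=True))   # 23.0 µs
```

The batched case spends the same time moving data but pays the fixed cost once, which is the effect descriptor chaining and interrupt coalescing exploit.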
Is the bus throughput formula always accurate?
It is a ceiling based on width and clock. Some interconnects transfer multiple beats per cycle, or are throttled by target readiness. Capture these effects by adjusting the efficiency and per-burst overhead terms.
What should I enter for arbitration and interrupt latency?
Prefer measurements: time from DMA request to first beat for arbitration, and completion signal to ISR completion for interrupt latency. If unknown, start with microsecond-scale estimates and refine with traces.
Can I model memory-to-memory copies?
Yes. Select memory-to-memory direction to document intent, then tune efficiency and overhead for your fabric. Compare outputs against memcpy or DMA benchmarks to decide which approach meets your timing budget.