Inputs
Example data table
| Scenario | Strategy | Trials | Units | Unit-hours | Wall hours | Total cost |
|---|---|---|---|---|---|---|
| Small sweep | Grid | 18 | 1 | 25.93 | 8.64 | $241.22 |
| Medium random | Random | 60 | 1 | 169.77 | 28.29 | $834.68 |
| Large Bayesian | Bayesian | 140 | 2 | 971.76 | 40.49 | $2,939.27 |
Formulas used
RawHours = EffectiveRuntime + WarmupMinutes/60.
BilledHoursPerTrial = RawHours × (1 + Overhead%).
ComputeGross = UnitHours × HourlyCostPerUnit.
ComputeNet = ComputeGross × (1 − Discount%).
PreTax = Subtotal + Subtotal×Contingency%.
Total = PreTax + PreTax×Tax%.
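The chain of formulas above can be sketched in a few lines of Python. This is only an illustration, not the calculator's actual implementation. The formulas do not define UnitHours or Subtotal, so the sketch assumes UnitHours = trials × units per trial × billed hours per trial, and Subtotal = net compute plus all other costs. Retries are left out for brevity, and the function and parameter names are hypothetical.

```python
def estimate_total(trials, effective_runtime_h, warmup_min, overhead_pct,
                   units_per_trial, hourly_cost_per_unit, discount_pct,
                   other_costs, contingency_pct, tax_pct):
    """Apply the listed formulas in order; percentages are given as 0-100."""
    raw_hours = effective_runtime_h + warmup_min / 60
    billed_hours_per_trial = raw_hours * (1 + overhead_pct / 100)

    # Assumption: unit-hours = trials x units per trial x billed hours per trial.
    unit_hours = trials * units_per_trial * billed_hours_per_trial

    compute_gross = unit_hours * hourly_cost_per_unit
    compute_net = compute_gross * (1 - discount_pct / 100)

    # Assumption: subtotal = net compute + every non-compute cost
    # (storage, egress, fees, labor, per-trial extras).
    subtotal = compute_net + other_costs

    pre_tax = subtotal + subtotal * contingency_pct / 100
    total = pre_tax + pre_tax * tax_pct / 100
    return {
        "unit_hours": unit_hours,
        "compute_net": compute_net,
        "subtotal": subtotal,
        "pre_tax": pre_tax,
        "total": total,
    }
```

Keeping each intermediate value in the returned dictionary makes it easier to compare the estimate line by line against an invoice.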
How to use this calculator
- Enter planned trials, average runtime, and units per trial.
- Add warmup and overhead to reflect real billed time.
- Set retry rate and discounts based on your environment.
- Fill in per-trial costs for evaluation, logging, and data prep.
- Add storage, egress, fees, and engineering time.
- Click Calculate to view the breakdown, then export it as CSV or PDF.
Key cost drivers in tuning experiments
Compute usually dominates tuning budgets. Total compute spend scales with trials, average runtime, units per trial, and the hourly price per unit. For example, 120 trials × 0.75 hours × 1 unit at $3.20/hour is $288 before discounts. With a 15% discount, the compute cost drops to $244.80. Add storage, egress, platform fees, and labor on top of compute to avoid overruns.
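The compute arithmetic in this example can be checked with a short Python sketch. The function name and signature are hypothetical; the numbers are the ones from the paragraph.

```python
def compute_cost(trials, runtime_hours, units_per_trial, hourly_price, discount_pct=0.0):
    """Return (gross, net) compute cost, rounded to cents."""
    gross = trials * runtime_hours * units_per_trial * hourly_price
    net = gross * (1 - discount_pct / 100)
    return round(gross, 2), round(net, 2)

gross, net = compute_cost(120, 0.75, 1, 3.20, discount_pct=15)
print(gross, net)  # 288.0 244.8
```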
Sizing trial runtime and parallelism
Start with a realistic mean runtime, not the fastest run. If your median is 40 minutes but the 90th percentile is 70 minutes, plan near 0.9 hours (54 minutes), which sits between the two. Concurrency changes wall-clock time, not unit-hours. Running 12 trials at once finishes sooner, but it still bills the same unit-hours unless pricing differs by capacity tier. If you mix unit types, estimate a weighted hourly rate, e.g., 60% of unit-hours at the standard rate and 40% at a discounted rate.
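A minimal sketch of both ideas follows. It assumes trials run in equal-length waves, which is a simplification of real schedulers. The $3.20 and $2.00 hourly rates are made-up illustration values, and all function names are hypothetical.

```python
import math

def blended_rate(shares_and_rates):
    """Weighted hourly rate from (share, rate) pairs; shares need not sum to 1."""
    total_share = sum(share for share, _ in shares_and_rates)
    return sum(share * rate for share, rate in shares_and_rates) / total_share

def unit_hours(trials, hours_per_trial, units_per_trial):
    """Billed unit-hours; independent of how many trials run at once."""
    return trials * hours_per_trial * units_per_trial

def wall_clock_hours(trials, hours_per_trial, concurrency):
    """Elapsed time, assuming trials run in waves of `concurrency` equal-length trials."""
    return math.ceil(trials / concurrency) * hours_per_trial

# Hypothetical rates: 60% at $3.20/hour, 40% at $2.00/hour.
rate = blended_rate([(0.6, 3.20), (0.4, 2.00)])

# 120 trials at 0.9 hours each: concurrency changes elapsed time only.
serial = wall_clock_hours(120, 0.9, 1)
parallel = wall_clock_hours(120, 0.9, 12)
billed = unit_hours(120, 0.9, 1)
```

Under these assumptions the blended rate is about $2.72/hour, and moving from 1 to 12 concurrent trials shortens the elapsed time from roughly 108 to 9 hours while the billed unit-hours stay at about 108.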
Accounting for orchestration and retries
Warmup, queueing, and orchestration overhead can add 5–20% to billed time. Retries matter in noisy environments; a 12% retry rate turns 200 planned trials into 224 expected trials (200 × 1.12). Early stopping can offset this. If pruning saves 18% of runtime, apply it to the average runtime before overhead so the savings are not double-counted.
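The order of operations matters here, so a small sketch may help. It applies pruning savings to the average runtime first, then adds warmup, then applies overhead, matching the formulas used by the calculator. The function names are hypothetical, and the 0.75-hour runtime, 3-minute warmup, and 10% overhead in the usage lines are illustrative assumptions.

```python
def expected_trials(planned_trials, retry_pct):
    """Planned trials inflated by the expected retry rate (percent, 0-100)."""
    return planned_trials * (1 + retry_pct / 100)

def billed_hours_per_trial(avg_runtime_h, pruning_savings_pct, warmup_min, overhead_pct):
    """Billed hours for one trial, with pruning applied before overhead."""
    # Early stopping shortens the run itself, so it reduces runtime first.
    effective_runtime = avg_runtime_h * (1 - pruning_savings_pct / 100)
    raw_hours = effective_runtime + warmup_min / 60
    # Overhead is applied once, to the already-reduced runtime plus warmup.
    return raw_hours * (1 + overhead_pct / 100)

trials = expected_trials(200, 12)  # about 224 expected trials
hours = billed_hours_per_trial(0.75, 18, warmup_min=3, overhead_pct=10)
```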
Evaluation, logging, and data handling costs
Per-trial evaluation often includes inference, metrics aggregation, and artifact uploads. Even $0.35 per trial becomes $78.40 at 224 trials. Logging and experiment tracking can be priced per run, per GB, or per API call. Treat data preparation as a per-trial cost when it repeats, and as labor when it is a one-time pipeline build. Storage is commonly billed per GB-month; retaining 120 GB for 30 days at $0.10/GB-month adds $12, plus any transfer charges.
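These two line items are simple enough to sketch directly. The helper names are hypothetical, and the storage helper assumes a 30-day billing month with simple proration, which may differ from how a given provider meters storage.

```python
def per_trial_total(cost_per_trial, expected_trials):
    """Total for any cost that repeats on every trial (evaluation, logging, data prep)."""
    return cost_per_trial * expected_trials

def storage_cost(gb, days, price_per_gb_month, days_per_month=30):
    """Prorated storage cost; assumes a fixed-length billing month."""
    return gb * (days / days_per_month) * price_per_gb_month

evaluation = per_trial_total(0.35, 224)   # about $78.40
storage = storage_cost(120, 30, 0.10)     # about $12.00
```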
Budgeting with contingencies and governance
Use contingency for variance in runtime, retries, and scope. A 10% contingency on a $4,500 subtotal is $450, which is often cheaper than a mid-cycle stop. Apply tax only after contingency if tax is charged on the full invoice. Keep a record of assumptions so stakeholders can compare planned versus actual spend and adjust the search space intelligently. Operational guardrails help: set max trials, spend alerts, and a “stop if no improvement after N trials” rule to protect the budget each cycle.
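The guardrails mentioned above could be combined into a single check that runs after each trial. The sketch below is one possible shape, not a feature of the calculator. It assumes a higher score is better, and the function name, parameters, and return values are all hypothetical.

```python
def should_stop(scores, spend, *, max_trials, budget_cap, patience):
    """Return a reason string if tuning should stop, otherwise None.

    scores: objective value of each finished trial, in order (higher is better).
    spend: money spent so far, in the same currency as budget_cap.
    patience: the N in "stop if no improvement after N trials".
    """
    if len(scores) >= max_trials:
        return "max_trials"
    if spend >= budget_cap:
        return "budget"
    if len(scores) > patience:
        best_before = max(scores[:-patience])
        best_recent = max(scores[-patience:])
        # No recent trial beat the best score seen before the patience window.
        if best_recent <= best_before:
            return "no_improvement"
    return None

reason = should_stop([0.50, 0.70, 0.69, 0.68, 0.70], spend=120.0,
                     max_trials=50, budget_cap=500.0, patience=3)
```

In this usage example the last three trials fail to beat the earlier best of 0.70, so the check reports `"no_improvement"`.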
FAQs
Use the billable resource your provider prices hourly, such as a GPU, accelerator slice, or CPU node. If one trial uses two GPUs, set Units per Trial to 2.
Concurrency mainly changes wall-clock time. Total compute cost depends on unit-hours, not how many trials run in parallel. Spend changes only if parallel runs force a pricier tier or increase retries.
Use a realistic average that reflects long tails. A mean or trimmed mean often works better than the fastest run. If you have p90 data, set the average closer to p75–p90 for safety.
Early stopping reduces the effective runtime per trial. Enter the expected percentage saved from pruning or stopping rules; the calculator applies it before overhead so billed time is not reduced twice.
Initialization, data loading, checkpointing, and orchestration time are frequently billed. Adding warmup minutes and overhead percent makes the estimate match invoices more closely, especially for short trials where setup is a larger share.
Reduce the search space and trial count with better priors, smaller pilot runs, and adaptive search. Improve data and evaluation so fewer retries occur. Use budget caps and stopping rules to end unproductive runs early.