Bandit Regret Calculator

Measure regret across bandit runs with clear metrics. Switch between expected and empirical modes easily. Make better exploration decisions with confidence.

Calculator

Choose the input style that matches your data. Field hints:

  - Confidence: used only for the indicative UCB helper.
  - Rounding: controls displayed precision and exports.
  - Means/rewards: comma or space separated; units match your reward scale.
  - Counts: must match the means length; their sum equals T.
  - Expected mode computes pseudo-regret using expected rewards.

Example data table

| Scenario | Inputs | What you compute |
| --- | --- | --- |
| Expected regret | mu = [0.60, 0.55, 0.48], n = [50, 35, 15] | Pseudo-regret using means and pull counts. |
| Empirical regret | mu* = 0.65, r = [0.40, 0.55, 0.70, 0.60, 0.50] | Regret from observed rewards over rounds. |
| Diagnostics | confidence = 0.95 | Indicative UCB per arm to compare uncertainty. |

Formula used

Expected (pseudo) regret
R_T = T · μ* − Σᵢ (nᵢ · μᵢ)
Here, μ* is the best arm mean, μᵢ is arm i mean, nᵢ is pulls of arm i, and T = Σᵢ nᵢ.
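As a minimal sketch of this formula (the function name `pseudo_regret` is mine, not part of the calculator), applied to the expected-regret row of the example table:

```python
def pseudo_regret(means, counts):
    """R_T = T * mu_star - sum_i(n_i * mu_i), with mu_star = max mean."""
    if len(means) != len(counts):
        raise ValueError("means and counts must have the same length")
    T = sum(counts)                # total rounds
    mu_star = max(means)           # best arm mean
    return T * mu_star - sum(n * mu for mu, n in zip(means, counts))

# Example table values: mu = [0.60, 0.55, 0.48], n = [50, 35, 15]
print(round(pseudo_regret([0.60, 0.55, 0.48], [50, 35, 15]), 4))  # 3.55
```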
Empirical regret
R_T = Σₜ (μ* − rₜ)
rₜ is the observed reward at round t. If μ* is unknown, you can approximate it using the best estimated mean.
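A sketch of the empirical variant (again, `empirical_regret` is an illustrative name), using the empirical-regret row of the example table:

```python
def empirical_regret(mu_star, rewards):
    """R_T = sum_t(mu_star - r_t) over observed per-round rewards."""
    return sum(mu_star - r for r in rewards)

# Example table values: mu* = 0.65, r = [0.40, 0.55, 0.70, 0.60, 0.50]
print(round(empirical_regret(0.65, [0.40, 0.55, 0.70, 0.60, 0.50]), 4))  # 0.5
```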
Indicative UCB helper
UCBᵢ ≈ μᵢ + √( ln(2/δ) / (2 nᵢ) )
δ = 1 − confidence. This is a display aid for uncertainty; it is not a policy.
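The helper can be sketched as follows (assuming, as noted in the FAQ, rewards bounded in [0, 1]; `ucb_indicative` is an illustrative name):

```python
import math

def ucb_indicative(mean, pulls, confidence=0.95):
    """Indicative upper bound: mean + sqrt(ln(2/delta) / (2 * pulls))."""
    delta = 1.0 - confidence       # delta = 1 - confidence
    bonus = math.sqrt(math.log(2.0 / delta) / (2.0 * pulls))
    return mean + bonus

# Arm with mean 0.60 and 50 pulls at 95% confidence:
print(round(ucb_indicative(0.60, 50), 3))  # 0.792
```

Arms with fewer pulls get a larger bonus, which is why the helper is useful for comparing uncertainty across arms even though it is not itself a policy.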

How to use this calculator

  1. Select a mode: expected or empirical.
  2. Enter means and counts, or rewards and μ*.
  3. Set rounding and confidence to your preference.
  4. Press Submit to view results above the form.
  5. Download CSV or PDF using the result buttons.

FAQs

1) What is regret in a bandit problem?
Regret measures how much reward you missed versus always choosing the best arm. Lower regret means better exploration–exploitation performance over time.
2) What is pseudo-regret versus empirical regret?
Pseudo-regret uses expected arm means and pull counts. Empirical regret uses observed rewards per round with a chosen μ*. Both are common in analysis.
3) Do rewards need to be between 0 and 1?
Not for the regret math itself. However, the UCB helper assumes rewards bounded in [0, 1]. If your scale differs, treat UCB as qualitative guidance only.
4) How do I pick μ* if I do not know it?
Use the best known estimate, such as the maximum empirical mean from your logs. This gives an approximate regret that is useful for comparisons.
5) Can regret be negative?
Yes, with empirical regret if observed rewards exceed μ* due to randomness or a poor μ* estimate. Over long horizons, regret is typically nonnegative in expectation.
6) What does average regret mean?
Average regret is cumulative regret divided by rounds T. It tells you the typical loss per round versus the best arm, and often trends down for good strategies.
7) What should I export for reporting?
Export cumulative regret, average regret, T, and your inputs summary. If comparing runs, keep consistent μ* assumptions and report the mode used.
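The points in FAQs 5 and 6 can be checked directly: average regret is cumulative regret divided by T, and a lucky run with rewards above μ* yields negative empirical regret (function name `average_regret` is mine):

```python
def average_regret(mu_star, rewards):
    """Cumulative empirical regret divided by the number of rounds T."""
    total = sum(mu_star - r for r in rewards)
    return total / len(rewards)

# FAQ 6: typical loss per round for the example run.
print(round(average_regret(0.65, [0.40, 0.55, 0.70, 0.60, 0.50]), 3))  # 0.1

# FAQ 5: observed rewards above mu* drive regret negative.
print(average_regret(0.50, [0.60, 0.70]) < 0)  # True
```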

Related Calculators

cosine similarity · contextual bandit · pairwise ranking · ndcg score · novelty score · als factorization · churn reduction · serendipity score · exploration rate · user similarity

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.