Contextual Bandit Calculator

Model contextual choices with interpretable reward and uncertainty inputs. Compare LinUCB, epsilon-greedy, and softmax policies, and pick smarter actions by balancing data, confidence, and exploration.

Calculator Inputs

Configure context, policy, and arm parameters


Policy settings

Context features

These values represent the current request, user, environment, or session features.

Quick guidance

  • Use LinUCB when uncertainty should drive exploration.
  • Use epsilon-greedy for simple controlled exploration.
  • Use softmax for smoother probability-based allocation.
  • Actual rewards are optional but useful for regret analysis.
  • Weight signs can be positive or negative.

Arm A parameters

Arm B parameters

Arm C parameters

Example Data

Sample contextual bandit dataset

This example mirrors the default values in the calculator so you can test the workflow quickly.

| Arm | Bias | Uncertainty | Actual Reward | Weight 1 | Weight 2 | Weight 3 | Weight 4 |
| Recommendation Model A | 0.12 | 0.18 | 0.74 | 0.42 | 0.28 | 0.15 | 0.10 |
| Recommendation Model B | 0.09 | 0.12 | 0.68 | 0.34 | 0.36 | 0.18 | 0.07 |
| Recommendation Model C | 0.15 | 0.16 | 0.79 | 0.30 | 0.22 | 0.31 | 0.16 |

Default context: Feature 1 = 0.80, Feature 2 = 0.50, Feature 3 = 0.30, Feature 4 = 0.20.
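As a quick check, the estimated-reward formula can be applied to the example rows above. This is an illustrative sketch, not the calculator's implementation; the biases, weights, and context values are copied from the example table and the default context.

```python
# Sketch: reproduce the "Estimated reward" step for the example arms.
ARMS = {
    "Recommendation Model A": (0.12, [0.42, 0.28, 0.15, 0.10]),
    "Recommendation Model B": (0.09, [0.34, 0.36, 0.18, 0.07]),
    "Recommendation Model C": (0.15, [0.30, 0.22, 0.31, 0.16]),
}
CONTEXT = [0.80, 0.50, 0.30, 0.20]  # default context from above

def estimated_reward(bias, weights, context):
    """r̂(a) = bias(a) + Σ weight(a, i) × context(i)."""
    return bias + sum(w * x for w, x in zip(weights, context))

for name, (bias, weights) in ARMS.items():
    print(f"{name}: {estimated_reward(bias, weights, CONTEXT):.3f}")
# Model A scores 0.661, Model B 0.610, Model C 0.625
```

On the default context, Model A has the highest estimated reward, even though Model C carries the largest bias.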

Formula Used

Core equations behind the calculator

Estimated reward:
r̂(a) = bias(a) + Σ [weight(a,i) × context(i)]
LinUCB score:
score(a) = r̂(a) + α × uncertainty(a)
Epsilon-greedy share:
best arm share = 1 − ε + ε / K, each other arm's share = ε / K
Softmax probability:
P(a) = exp(r̂(a) / τ) ÷ Σ exp(r̂(j) / τ)
Expected cumulative reward:
total reward = selected estimated reward × planned decisions
Actual regret:
regret = oracle actual reward − selected actual reward
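The equations above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's code: alpha, epsilon, tau, and the planned-decisions horizon are assumed values, while the reward and uncertainty numbers reuse the example data.

```python
import math

def linucb_score(r_hat, uncertainty, alpha):
    # score(a) = r̂(a) + α × uncertainty(a)
    return r_hat + alpha * uncertainty

def epsilon_greedy_shares(r_hats, epsilon):
    # Best arm keeps 1 − ε + ε/K; every other arm gets ε/K.
    k = len(r_hats)
    best = max(range(k), key=lambda i: r_hats[i])
    return [1 - epsilon + epsilon / k if i == best else epsilon / k
            for i in range(k)]

def softmax_probs(r_hats, tau):
    # P(a) = exp(r̂(a)/τ) ÷ Σ exp(r̂(j)/τ)
    exps = [math.exp(r / tau) for r in r_hats]
    total = sum(exps)
    return [e / total for e in exps]

r_hats = [0.661, 0.610, 0.625]       # estimated rewards per arm (example data)
uncertainties = [0.18, 0.12, 0.16]
scores = [linucb_score(r, u, alpha=1.0) for r, u in zip(r_hats, uncertainties)]
shares = epsilon_greedy_shares(r_hats, epsilon=0.2)
probs = softmax_probs(r_hats, tau=0.1)

# Expected cumulative reward and actual regret for the selected arm:
planned = 1000                       # assumed decision horizon
total_reward = max(r_hats) * planned
actual = [0.74, 0.68, 0.79]          # actual rewards from the example table
regret = max(actual) - actual[0]     # oracle (arm C, 0.79) minus selected (arm A, 0.74)
```

Note that the arm with the best estimate (arm A) is not the oracle arm here, which is exactly what the regret term measures.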

How To Use

Steps for running the calculator correctly

  1. Choose the policy you want to evaluate: LinUCB, epsilon-greedy, or softmax.
  2. Enter exploration settings such as alpha, epsilon, temperature, and the number of planned decisions.
  3. Add the live context values that describe the current user, session, or environment.
  4. For each arm, supply a label, bias, uncertainty estimate, and four feature weights.
  5. Optionally enter actual rewards for all arms if you want the calculator to compute regret.
  6. Press submit to view the result summary above the form, then export the output as CSV or PDF.
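The steps above can be strung together into a small end-to-end run. This is a hypothetical sketch: the arm parameters are the example defaults, alpha and the planned-decision count are assumed, and the CSV column layout is an illustration rather than the calculator's actual export format.

```python
import csv
import io

# Step 4: per-arm label, bias, uncertainty, and feature weights (example data).
arms = {
    "Model A": {"bias": 0.12, "unc": 0.18, "w": [0.42, 0.28, 0.15, 0.10]},
    "Model B": {"bias": 0.09, "unc": 0.12, "w": [0.34, 0.36, 0.18, 0.07]},
    "Model C": {"bias": 0.15, "unc": 0.16, "w": [0.30, 0.22, 0.31, 0.16]},
}
context = [0.80, 0.50, 0.30, 0.20]   # step 3: live context values
alpha, planned = 1.0, 1000           # step 2: assumed exploration settings

# Steps 1 and 6: score each arm under LinUCB and collect result rows.
rows = []
for name, a in arms.items():
    r_hat = a["bias"] + sum(w * x for w, x in zip(a["w"], context))
    rows.append({"arm": name, "r_hat": round(r_hat, 3),
                 "linucb": round(r_hat + alpha * a["unc"], 3)})

best = max(rows, key=lambda r: r["linucb"])
best["expected_total"] = round(best["r_hat"] * planned, 1)

# Step 6: export the summary as CSV (in-memory here for illustration).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["arm", "r_hat", "linucb", "expected_total"])
writer.writeheader()
writer.writerows(rows)
```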

FAQs

Common questions about contextual bandit modeling

1. What does this calculator estimate?

It estimates contextual reward scores for several actions, compares exploration policies, and highlights the arm most likely to perform best under the chosen decision rule.

2. When should I use LinUCB?

Use LinUCB when you want uncertainty-aware exploration. It adds a confidence bonus, so arms with less data can still receive traffic when their upside remains plausible.

3. What does epsilon control?

Epsilon controls random exploration. Higher epsilon sends more traffic away from the current best estimated arm and spreads opportunities across competing actions.
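A quick numeric example of this rule, assuming K = 3 arms and ε = 0.2:

```python
# Epsilon-greedy allocation: best arm keeps 1 − ε + ε/K, others get ε/K each.
def shares(epsilon, k):
    best = 1 - epsilon + epsilon / k
    other = epsilon / k
    return best, other

best, other = shares(0.2, 3)
print(round(best, 4), round(other, 4))  # 0.8667 0.0667
```

Raising ε shifts more of the best arm's share toward the other arms while the shares always sum to 1.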

4. What does temperature mean in softmax?

Temperature controls how sharply probabilities react to reward differences. Lower values concentrate traffic on the best arm, while higher values distribute traffic more evenly.
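A small demonstration of this effect, using two assumed reward estimates and two assumed temperatures:

```python
import math

def softmax(r_hats, tau):
    # P(a) = exp(r̂(a)/τ) ÷ Σ exp(r̂(j)/τ)
    exps = [math.exp(r / tau) for r in r_hats]
    total = sum(exps)
    return [e / total for e in exps]

r_hats = [0.66, 0.61]             # assumed estimates, 0.05 apart
low = softmax(r_hats, tau=0.05)   # sharp: the better arm dominates
high = softmax(r_hats, tau=1.0)   # smooth: traffic split almost evenly
```

With τ = 0.05 the better arm receives roughly 73% of traffic; with τ = 1.0 it receives barely over 51%, even though the reward gap is identical.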

5. Why are actual rewards optional?

Actual rewards are useful for regret analysis, but not required for score estimation. You can still compare predicted performance using model weights, context, and uncertainty.

6. Can I use negative feature weights?

Yes. Negative weights simply mean a feature lowers the estimated reward for that arm. This is common when a context signal predicts weaker response.

7. What does expected regret show?

Expected regret measures the estimated reward lost by not picking the best estimated arm every time across the planned decision horizon.
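A one-line worked example with assumed numbers (a 0.051 estimate gap over 1,000 planned decisions):

```python
# Expected regret of always playing an arm whose estimate trails the best
# estimated arm; all three inputs are assumed, not calculator defaults.
best_r_hat = 0.661
chosen_r_hat = 0.610
planned_decisions = 1000

expected_regret = (best_r_hat - chosen_r_hat) * planned_decisions
print(round(expected_regret, 1))  # 51.0
```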

8. How should I choose the planned decisions value?

Set it to the number of impressions, sessions, or allocation rounds you expect. This lets the calculator scale single-step estimates into campaign-level expectations.

Related Calculators

  • Cosine similarity
  • Pairwise ranking
  • NDCG score
  • Novelty score
  • ALS factorization
  • Churn reduction
  • Bandit regret
  • Serendipity score
  • Exploration rate
  • User similarity

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of the results. Please consult other sources as well.