Epsilon Greedy Policy Probability Calculator

Calculator Form

Total Actions

Greedy Action Count

Epsilon Value

Epsilon Format

Exploration Mode

Target Probability

Number of Trials

Average Greedy Reward

Average Non-Greedy Reward

Scenario Notes

Formula Used

Common all-action exploration:

Single greedy action probability = ((1 - ε) / k) + (ε / n)

Single non-greedy action probability = ε / n

Non-greedy-only exploration:

Single greedy action probability = (1 - ε) / k

Single non-greedy action probability = ε / (n - k)

Expected selections:

Expected count = probability × trials

Expected reward:

Expected reward = P(any greedy) × greedy reward + P(any non-greedy) × non-greedy reward

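Here, ε is epsilon (the exploration rate, as a decimal between 0 and 1), n is the total number of actions, and k is the number of tied greedy actions. The non-greedy-only formulas require k < n, because ε / (n - k) is undefined when every action is greedy.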
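The formulas above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual code: the function name `action_probabilities` and the `explore_all` flag are hypothetical, and two edge cases are handled by assumption. When k equals n, the non-greedy probability is reported as 0 (as in the last row of the example table), and non-greedy-only mode raises an error.

```python
def action_probabilities(n, k, epsilon, explore_all=True):
    """Return (single_greedy, single_non_greedy) selection probabilities.

    n: total actions, k: tied greedy actions, epsilon: decimal in [0, 1].
    explore_all=True  -> exploration samples every action uniformly.
    explore_all=False -> exploration samples only non-greedy actions.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be between 0 and 1")

    if explore_all:
        greedy = (1 - epsilon) / k + epsilon / n
        # Assumption: with no non-greedy action (k == n), report 0.
        non_greedy = epsilon / n if k < n else 0.0
    else:
        if k == n:
            # Assumption: this mode is undefined when every action is greedy.
            raise ValueError("non-greedy-only exploration needs k < n")
        greedy = (1 - epsilon) / k
        non_greedy = epsilon / (n - k)
    return greedy, non_greedy


if __name__ == "__main__":
    g, ng = action_probabilities(n=5, k=2, epsilon=0.2)
    print(f"single greedy: {g:.4%}, single non-greedy: {ng:.4%}")
```

In both modes the probabilities sum to 1 across all actions: k × greedy + (n - k) × non-greedy = 1.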

How to Use This Calculator

Enter the total number of actions available to the policy.
Enter how many actions currently tie as greedy choices.
Enter epsilon as a decimal or percent.
Select whether exploration samples all actions or only non-greedy actions.
Choose the target probability you want to inspect.
Add a trial count and average reward values to get expected counts and expected reward.
Press the calculate button to view results above the form.
Use the export buttons to save the result table.

Example Data Table

Actions	Greedy	Epsilon	Mode	Single Greedy	Single Non-Greedy
4	1	0.1	All actions	92.5%	2.5%
5	2	0.2	All actions	44%	4%
8	1	0.3	Non-greedy only	70%	4.2857%
3	3	0.15	All actions	33.3333%	0%

Understanding Epsilon Greedy Probability

An epsilon greedy policy is a simple exploration rule. It is used in reinforcement learning, bandit testing, and online experiments. The policy chooses between exploitation and exploration. Exploitation means choosing an action currently believed to be best. Exploration means trying another action so the system can learn more.

The main input is epsilon. A low epsilon makes the policy conservative. It trusts the current best action most of the time. A high epsilon makes the policy more curious. It spreads more probability across available actions. This calculator converts those ideas into clear probabilities.

How The Policy Shares Probability

In the common version, exploration is uniform across every action. If there are n actions, each action receives an exploration share of epsilon divided by n. Greedy actions also receive the exploitation share. When several actions tie for best value, the exploitation share is split evenly across those greedy actions.

For one greedy action, the probability is (1 - ε) divided by the number of greedy actions, plus ε divided by the total number of actions. For one non-greedy action, the probability is only ε divided by the total number of actions. The calculator also supports exploration across non-greedy actions only. That option is useful when your code never explores the current best action.
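A selection routine consistent with this description might look like the sketch below. The function `select_action` and its parameters are hypothetical names used for illustration. Ties are detected by exact equality of the estimated values, which is an assumption that real code may want to relax with a tolerance. In non-greedy-only mode, the sketch falls back to a greedy pick when no non-greedy action exists.

```python
import random


def select_action(values, epsilon, explore_all=True, rng=random):
    """Pick an action index with an epsilon-greedy rule.

    values: current estimated value of each action.
    rng: any object with random(), randrange() and choice(),
         such as the random module or a random.Random instance.
    """
    best = max(values)
    # Tied greedy actions share the exploitation probability evenly.
    greedy = [i for i, v in enumerate(values) if v == best]
    others = [i for i, v in enumerate(values) if v != best]

    if rng.random() < epsilon:
        if explore_all:
            # Common version: explore uniformly over every action.
            return rng.randrange(len(values))
        if others:
            # Variant: explore only among non-greedy actions.
            return rng.choice(others)
    # Exploit (or fall back when there is nothing else to explore).
    return rng.choice(greedy)


if __name__ == "__main__":
    rng = random.Random(0)
    values = [1.0, 1.0, 0.2, 0.1, 0.0]  # two tied greedy actions
    picks = [select_action(values, 0.2, rng=rng) for _ in range(10)]
    print(picks)
```

Over many calls, the observed selection frequencies should approach the formula values, for example about 44% for each greedy action and 4% for each non-greedy action with n = 5, k = 2, and ε = 0.2.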

Why Expected Counts Matter

Probability is helpful, but expected counts are often easier to plan with. If you run 10,000 trials, a probability of 0.12 means about 1,200 selections. Actual results may vary because the policy is random. Still, expected counts help compare settings before testing them in code.
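To get a feel for how much actual counts may vary, the expected count can be paired with a standard deviation. The sketch below assumes each trial is an independent draw with a fixed selection probability, so the count follows a binomial distribution. That assumption holds only while the set of greedy actions does not change between trials. The helper name `expected_count` is hypothetical.

```python
import math


def expected_count(probability, trials):
    """Return (mean, standard deviation) of the selection count.

    Assumes independent trials with a fixed probability (binomial model).
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if trials < 0:
        raise ValueError("trials must be non-negative")
    mean = probability * trials
    std = math.sqrt(trials * probability * (1 - probability))
    return mean, std


if __name__ == "__main__":
    mean, std = expected_count(0.12, 10_000)
    print(f"expected selections: {mean:.0f}, standard deviation: {std:.1f}")
```

For the example in the text, the mean is 1,200 selections and the standard deviation is the square root of 1,056, so a run that lands a few dozen selections away from 1,200 is not surprising.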

Use Cases In Statistics

Epsilon greedy rules appear in A/B/n testing, recommendation systems, adaptive surveys, and simulation studies. They help balance learning and performance. A very small epsilon may miss better options. A very large epsilon may waste many trials on weak options.

Good settings depend on risk, sample size, and reward noise. Start with a simple value, review the expected counts, then adjust epsilon until every important action receives enough samples. This approach makes policy design easier to explain, audit, and reproduce. The same table can support classroom examples, model documentation, and quick checks during debugging. It also helps teams discuss fairness, regret, and sampling pressure before deployment.
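One way to turn "enough samples" into a starting value is to solve the expected-count formula for epsilon. In all-action mode, a non-greedy action is expected to be chosen ε / n × trials times, so ε must be at least min_samples × n / trials. In non-greedy-only mode, n is replaced by n - k. The sketch below is a hypothetical helper (`min_epsilon_for_samples` is not part of the calculator). It works with expected counts only, so it does not guarantee the minimum in any single run.

```python
def min_epsilon_for_samples(n, k, trials, min_samples, explore_all=True):
    """Smallest epsilon whose expected count per non-greedy action
    reaches min_samples over the given number of trials."""
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n so a non-greedy action exists")
    if trials <= 0 or min_samples < 0:
        raise ValueError("trials must be positive, min_samples non-negative")

    # Exploration is spread over n actions, or over the n - k non-greedy ones.
    pool = n if explore_all else n - k
    epsilon = min_samples * pool / trials
    if epsilon > 1.0:
        raise ValueError("target cannot be reached with this many trials")
    return epsilon


if __name__ == "__main__":
    eps = min_epsilon_for_samples(n=4, k=1, trials=10_000, min_samples=100)
    print(f"minimum epsilon: {eps}")
```

With 4 actions, 1 greedy action, 10,000 trials, and a target of 100 expected selections per non-greedy action, the all-action bound is 100 × 4 / 10,000 = 0.04.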

FAQs

What is epsilon in an epsilon greedy policy?

Epsilon is the exploration rate. It controls how often the policy picks an action at random instead of choosing the current best action.

What does a greedy action mean?

A greedy action is an action with the highest current estimated value. There can be one greedy action or several tied greedy actions.

Why does a greedy action also get exploration probability?

In the common version, random exploration samples all actions. That includes greedy actions, so each greedy action receives both an exploration share and an exploitation share.

When should I use non-greedy-only exploration?

Use it when your implementation excludes current best actions during random exploration. This is less common, but it appears in some custom learning systems.

Can epsilon be entered as a percent?

Yes. Select the percent option and enter values like 10 for ten percent. Select decimal for values like 0.10.
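The two input formats can be normalized to a decimal before applying any formula. The sketch below shows one plausible way to do that. The function name `normalize_epsilon` and the format strings "percent" and "decimal" are assumptions for illustration, not the calculator's actual internals.

```python
def normalize_epsilon(value, fmt="decimal"):
    """Convert an epsilon input to a decimal in [0, 1].

    fmt="percent" treats 10 as ten percent; fmt="decimal" treats 0.10 as is.
    """
    if fmt == "percent":
        epsilon = value / 100.0
    elif fmt == "decimal":
        epsilon = float(value)
    else:
        raise ValueError("fmt must be 'percent' or 'decimal'")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must fall between 0 and 1 after conversion")
    return epsilon


if __name__ == "__main__":
    print(normalize_epsilon(10, "percent"))
    print(normalize_epsilon(0.10, "decimal"))
```

Validating the range after conversion catches a common slip: entering 10 while the decimal format is selected.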

What happens when several actions tie for best value?

The exploitation share is divided evenly among tied greedy actions. The calculator uses the greedy action count for that split.

Are expected counts exact results?

No. Expected counts are long-run averages. Actual selections can differ because the policy makes random exploration choices.

How is expected reward estimated?

The calculator multiplies the total probability of choosing any greedy action by the average greedy reward. It also multiplies the total probability of choosing any non-greedy action by the average non-greedy reward, then adds the two products.
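The expected reward formula can be sketched as follows. P(any greedy) is k times the single greedy probability, which simplifies to (1 - ε) + ε × k / n in all-action mode and to 1 - ε in non-greedy-only mode. The function name `expected_reward` is hypothetical, and treating P(any greedy) as 1 when every action is greedy is an assumption of this sketch.

```python
def expected_reward(n, k, epsilon, greedy_reward, non_greedy_reward,
                    explore_all=True):
    """Expected reward per trial under an epsilon-greedy policy."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be between 0 and 1")

    if k == n:
        # Assumption: with no non-greedy action, every pick is greedy.
        p_greedy = 1.0
    elif explore_all:
        # k * ((1 - eps) / k + eps / n)
        p_greedy = (1 - epsilon) + epsilon * k / n
    else:
        # k * ((1 - eps) / k)
        p_greedy = 1 - epsilon
    p_non_greedy = 1 - p_greedy
    return p_greedy * greedy_reward + p_non_greedy * non_greedy_reward


if __name__ == "__main__":
    reward = expected_reward(n=4, k=1, epsilon=0.1,
                             greedy_reward=1.0, non_greedy_reward=0.5)
    print(f"expected reward per trial: {reward:.4f}")
```

Multiplying the result by the number of trials gives the expected total reward over a run.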