Calculator Inputs
Example Data Table
| Index | Pᵢ | Qᵢ | Pᵢ·ln(Pᵢ/Qᵢ) |
|---|---|---|---|
| 1 | 0.40 | 0.30 | 0.1151 |
| 2 | 0.35 | 0.40 | -0.0467 |
| 3 | 0.25 | 0.30 | -0.0456 |
| Total | 1.00 | 1.00 | D(P‖Q) ≈ 0.0228 |
Formula Used
Relative entropy between distributions P and Q is:

D(P‖Q) = Σᵢ Pᵢ · log(Pᵢ / Qᵢ)
- log base e gives nats; base 2 gives bits.
- If Pᵢ = 0, that term contributes 0 (by convention, since x·log x → 0 as x → 0).
- If Qᵢ = 0 while Pᵢ > 0, the divergence is infinite.
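The formula and the two zero-handling rules above can be sketched in a few lines of Python (a minimal illustration with a hypothetical function name, not the calculator's actual implementation):

```python
import math

def kl_divergence(p, q, base=math.e):
    """D(P||Q) = sum of p_i * log(p_i / q_i) over indices where p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a term with p_i = 0 contributes 0 by convention
        if qi == 0:
            return math.inf  # Q rules out an event that P can produce
        total += pi * math.log(pi / qi, base)
    return total

# The example table above: 0.1151 - 0.0467 - 0.0456 ≈ 0.0228 nats
print(kl_divergence([0.40, 0.35, 0.25], [0.30, 0.40, 0.30]))
```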
How to Use This Calculator
- Paste your P and Q lists; both must have the same length.
- Choose your delimiter and preferred log base.
- Enable normalization if inputs are counts or weights.
- Optional: add epsilon smoothing to reduce zero issues.
- Click Calculate to view the full per-term breakdown.
- Use Download CSV or Download PDF for sharing.
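Behind the first two steps, parsing delimited text into numbers is straightforward; a rough sketch (the helper name is hypothetical, not the calculator's code):

```python
def parse_list(text, delimiter=","):
    """Split delimited text and convert each non-empty token to a float."""
    return [float(tok) for tok in text.split(delimiter) if tok.strip()]

print(parse_list("40, 35, 25"))  # [40.0, 35.0, 25.0]
```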
Why Relative Entropy Matters
Relative entropy, also called KL divergence, quantifies how one probability model differs from another. In practice, it measures extra coding cost when Q is used to represent events generated by P. A value of 0 means identical distributions; larger values indicate stronger mismatch. It is directional, so D(P‖Q) generally differs from D(Q‖P).
Input Quality and Normalization
This calculator accepts probabilities or raw weights. When weights are supplied, enabling normalization converts each list into a valid distribution that sums to 1. For example, counts 40, 35, 25 become 0.40, 0.35, 0.25. If totals are very small or inconsistent, normalization prevents scale from distorting the comparison.
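The normalization step described above amounts to dividing each weight by the total; a short sketch (assumed helper, not the calculator's exact code):

```python
def normalize(weights):
    """Scale nonnegative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

print(normalize([40, 35, 25]))  # [0.4, 0.35, 0.25]
```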
Choosing the Log Base
The log base controls units. Natural log reports nats, base 2 reports bits, and base 10 reports hartleys. If you are evaluating compression or coding efficiency, bits are common. For statistical modeling and likelihood work, nats are often preferred. Changing base rescales results by a constant factor, preserving rankings.
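Because changing the base only rescales the result by a constant, converting between units is a single division; for example, nats to bits (illustrative helper):

```python
import math

def nats_to_bits(d_nats):
    """Divide by ln 2 to convert a divergence from nats to bits."""
    return d_nats / math.log(2)

# One nat is about 1.4427 bits; ln 2 nats is exactly 1 bit.
print(nats_to_bits(math.log(2)))  # 1.0
```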
Handling Zeros with Smoothing
If any Qᵢ equals 0 while Pᵢ > 0, the divergence becomes infinite because Q assigns zero probability to an event that P can produce. To avoid this, you can apply epsilon smoothing: add a small ε to every entry before normalization. Typical values range from 1e-6 to 1e-3, depending on sample size.
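Epsilon smoothing as described can be sketched like this (hypothetical helper; assumes renormalization after the bump so the result is still a distribution):

```python
def smooth(dist, eps=1e-6):
    """Add eps to every entry, then rescale so the result sums to 1."""
    bumped = [x + eps for x in dist]
    total = sum(bumped)
    return [x / total for x in bumped]

q = smooth([0.7, 0.3, 0.0])
assert all(x > 0 for x in q)       # no zeros remain
assert abs(sum(q) - 1.0) < 1e-9    # still a valid distribution
```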
Interpreting the Per‑Term Breakdown
The table shows Pᵢ/Qᵢ, the log ratio, and the contribution Pᵢ·log(Pᵢ/Qᵢ) for each index. Positive terms occur where P assigns higher probability than Q; negative terms occur where Q is higher. The Plotly graph visualizes P and Q side by side and overlays term contributions to highlight the dominant mismatches. In A/B tests, report KL in bits per symbol; a drop from 0.050 to 0.020 bits can indicate a meaningful calibration improvement. For monitoring, track the weekly median and 95th percentile of the overall divergence, and promptly investigate categories with the largest per-term spikes.
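The per-term breakdown in the example table can be reproduced with a short loop (a sketch, not the calculator's code; values in nats, and assuming no zero entries):

```python
import math

def kl_terms(p, q):
    """Return (ratio, log_ratio, contribution) for each index, in nats."""
    return [(pi / qi, math.log(pi / qi), pi * math.log(pi / qi))
            for pi, qi in zip(p, q)]

for i, (ratio, log_ratio, term) in enumerate(
        kl_terms([0.40, 0.35, 0.25], [0.30, 0.40, 0.30]), start=1):
    print(f"{i}: ratio={ratio:.4f}  log={log_ratio:.4f}  term={term:.4f}")
```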
Related Metrics for Reporting
Alongside KL divergence, the calculator reports entropy H(P) and cross‑entropy H(P,Q). Cross‑entropy equals H(P) plus KL divergence, linking model mismatch to expected code length. Optionally, Jensen–Shannon divergence is provided; it is symmetric and finite when distributions are well‑defined, making it useful for dashboards and comparisons across many segments.
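The identity H(P,Q) = H(P) + D(P‖Q) and the symmetry of Jensen–Shannon divergence can both be checked numerically (a sketch with assumed helper names, natural-log units):

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    # KL of each distribution against the average M = (P + Q) / 2
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p, q = [0.40, 0.35, 0.25], [0.30, 0.40, 0.30]
assert abs(cross_entropy(p, q) - (entropy(p) + kl(p, q))) < 1e-12
assert abs(js_divergence(p, q) - js_divergence(q, p)) < 1e-12  # symmetric
```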
FAQs
What is relative entropy in simple terms?
It measures how inefficient it is to use distribution Q when the data actually follows P. Zero means the distributions match; larger values mean Q deviates more from P.
Can KL divergence be negative?
No. The total divergence is always zero or positive, although individual per-index terms can be negative when Q assigns higher probability than P at that index.
Why does the result become infinite?
If Q assigns zero probability to an event that P assigns a positive probability, the ratio pᵢ/qᵢ diverges. Smoothing or clipping can prevent division by zero for practical comparisons.
Should I normalize my inputs?
Yes when you enter counts, scores, or weights. Normalization converts them into probabilities that sum to one, making the divergence comparable across datasets and time windows.
Which log base should I choose?
Use base 2 for bits in coding and compression contexts, natural base for statistical modeling, and base 10 when reporting in decimal units. Changing base rescales values but does not change comparisons.
What is Jensen–Shannon divergence used for?
It is a symmetric, bounded alternative built from KL divergence against the average distribution. It is often preferred for clustering, similarity searches, and dashboards because it behaves well with noisy data.