Model Explanation Rate Calculator

Measure explanation coverage, speed, and clarity across deployments. Tune quality weights and complexity penalties easily. Turn explainability metrics into a single actionable score today.

Calculator inputs
Enter operational and quality signals for explainability output.
  • Total predictions (N): all predictions in the evaluated window.
  • Explained predictions (E): predictions that received an explanation artifact.
  • Avg time (s): average compute time per explanation.
  • Fidelity: how closely explanations match model behavior.
  • Stability: consistency under small input perturbations.
  • Human rating: reviewer clarity and usefulness score (out of 5).
  • Avg features: typical number of features surfaced per explanation.
  • Total features: feature count after preprocessing.
  • SLA (s): used to estimate daily explanation capacity.
Advanced settings (optional)
Adjust weights (wF, wS, wH), SLA, and complexity penalty behavior.
  • Efficiency factor = min(1, SLA / avg time).
  • Penalty = weight × (features used / total features).
  • Weights are normalized automatically before scoring.
Example data table
A sample configuration and its resulting metrics.
| Scenario | Total predictions | Explained | Avg time (s) | Fidelity | Stability | Human | Avg features | Total features | SLA (s) |
|---|---|---|---|---|---|---|---|---|---|
| Credit risk batch | 50,000 | 42,000 | 12 | 0.87 | 0.80 | 4.2 | 9 | 30 | 15 |
| Fraud realtime | 120,000 | 84,000 | 6 | 0.82 | 0.76 | 3.9 | 7 | 22 | 8 |
| Churn analysis | 10,000 | 10,000 | 20 | 0.90 | 0.86 | 4.6 | 6 | 18 | 25 |
Use the “Load example” button to auto-fill the first scenario.
Formula used
Definitions are designed for operational explainability reporting.
Core metrics
  • Coverage = E / N
  • Throughput = 3600 / avg_time_sec
  • HumanScaled = human_rating / 5
  • Quality = wF·fidelity + wS·stability + wH·HumanScaled
Adjustments
  • ComplexRatio = avg_features_used / total_features
  • ComplexPenalty = min(0.95, penalty_weight · ComplexRatio)
  • Efficiency = min(1, SLA_time_sec / avg_time_sec)
  • MER/hour = Throughput · Coverage · Quality · (1−ComplexPenalty) · Efficiency
Weights (wF, wS, wH) are normalized so they sum to 1.
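The definitions above can be combined into a single function. This is a sketch under the stated formulas; the function and parameter names are illustrative, not part of any published library.

```python
# Sketch of the MER/hour pipeline defined in the formula section.
def mer_per_hour(total_preds, explained, avg_time_sec,
                 fidelity, stability, human_rating,
                 avg_features, total_features, sla_sec,
                 w_f=0.45, w_s=0.35, w_h=0.20, penalty_weight=0.30):
    # Normalize quality weights so they sum to 1.
    w_sum = w_f + w_s + w_h
    w_f, w_s, w_h = w_f / w_sum, w_s / w_sum, w_h / w_sum

    coverage = explained / total_preds                 # E / N
    throughput = 3600 / avg_time_sec                   # explanations per hour
    quality = w_f * fidelity + w_s * stability + w_h * (human_rating / 5)
    complex_ratio = avg_features / total_features
    complex_penalty = min(0.95, penalty_weight * complex_ratio)
    efficiency = min(1.0, sla_sec / avg_time_sec)

    return throughput * coverage * quality * (1 - complex_penalty) * efficiency

# Credit risk batch scenario from the example table:
score = mer_per_hour(50_000, 42_000, 12, 0.87, 0.80, 4.2, 9, 30, 15)
```

With the default weights this yields roughly 192.5 adjusted explanations per hour for the first example scenario.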
How to use this calculator
A practical workflow for teams shipping explainable systems.
  1. Choose an evaluation window and count total predictions (N).
  2. Count explained predictions (E) where an explanation was produced.
  3. Measure average explanation runtime per instance in seconds.
  4. Estimate quality signals: fidelity, stability, and human review rating.
  5. Enter explanation complexity using average features shown versus total features.
  6. Optionally adjust SLA and weights to match governance priorities.

Coverage and governance reporting

Model explanation coverage links explainability efforts to measurable audit readiness. With E = 42,000 explained predictions out of N = 50,000 total, coverage is 84.0%. Many teams set a minimum coverage floor, such as 70% for regulated decisions and 95% for high-impact segments. Track coverage by region, product, and risk tier to avoid blind spots.
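The coverage arithmetic above is a single ratio; a minimal sketch:

```python
# Coverage for the example: 42,000 explained out of 50,000 predictions.
coverage = 42_000 / 50_000
print(f"{coverage:.1%}")  # prints "84.0%"
```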

Quality signals that align with validation

Fidelity, stability, and human review each represent different failure modes. A fidelity of 0.87 indicates explanations mirror model behavior under a proxy test. A stability of 0.80 means small perturbations preserve ranked features most of the time. A human score of 4.2/5 captures readability for analysts and customers. Normalize weights to prioritize what regulators, customers, or internal policy value. Use weights 0.45, 0.35, 0.20 for balance.
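As a check on the arithmetic, the weighted quality score for the figures quoted above can be computed directly (variable names are illustrative):

```python
# Quality = wF*fidelity + wS*stability + wH*(human_rating / 5)
w_f, w_s, w_h = 0.45, 0.35, 0.20            # suggested balanced weights
quality = w_f * 0.87 + w_s * 0.80 + w_h * (4.2 / 5)
# quality is approximately 0.8395
```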

Speed, SLA, and deployment constraints

Throughput is 3600 divided by average explanation time. At 12 seconds, baseline throughput is 300 explanations per hour. The SLA efficiency factor caps performance when runtime exceeds your target. If SLA is 15 seconds, efficiency is 1.0; if SLA is 8 seconds, efficiency becomes 0.67. This encourages caching, batching, and tiered explainers in production.
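A quick sketch of the throughput and SLA-efficiency arithmetic described above:

```python
avg_time = 12                          # seconds per explanation
throughput = 3600 / avg_time           # 300 explanations per hour
eff_loose = min(1, 15 / avg_time)      # SLA 15 s: runtime within budget -> 1.0
eff_tight = min(1, 8 / avg_time)       # SLA 8 s: runtime over budget -> ~0.67
```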

Complexity penalties and feature budgeting

Explanations that list too many features reduce comprehension and slow reviews. Complexity ratio is average features used divided by total features. With 9 of 30 features, ratio is 30%. A penalty weight of 0.30 produces a 9% penalty, leaving a 0.91 multiplier. Use this lever to push concise explanations, for example limiting outputs to top 5 to 10 features. If ratio exceeds 50%, reviewers often report cognitive overload and inconsistent decisions.
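The penalty arithmetic above, as a short sketch:

```python
ratio = 9 / 30                         # 30% of available features surfaced
penalty = min(0.95, 0.30 * ratio)      # penalty weight 0.30 -> 9% penalty
multiplier = 1 - penalty               # 0.91 multiplier on the score
```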

Using MER Index for release decisions

MER Index combines coverage, quality, complexity, and SLA into a comparable 0 to 100 score. Treat 70+ as strong, 45–69 as good, and 25–44 as developing. Compare the index across model versions, then validate the drivers: raising coverage might increase cost, while raising fidelity may require better surrogate models. Pair MER Index with MER per hour to plan compute and staffing. Publish the index alongside drift metrics and incident tickets to explain sudden drops.
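The page does not spell out how the 0 to 100 index is derived. One plausible sketch, assuming the index is simply the coverage, quality, complexity, and efficiency product from the formula section scaled to 100 (an assumption, not the site's published definition):

```python
def mer_index(coverage, quality, complex_penalty, efficiency):
    # Assumed mapping: throughput-independent product of the four
    # factors, scaled to a 0-100 range.
    return 100 * coverage * quality * (1 - complex_penalty) * efficiency

# Credit risk batch example: falls in the 45-69 "good" band.
band = mer_index(coverage=0.84, quality=0.8395,
                 complex_penalty=0.09, efficiency=1.0)
```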


FAQs

1) What does MER per hour represent?

MER per hour is your explanation throughput adjusted for coverage, quality, complexity, and SLA efficiency. It helps size compute and reviewer capacity for a specific model and time window.

2) Why separate coverage from throughput?

Throughput reflects speed, while coverage reflects whether explanations exist for the decisions that matter. A fast system with low coverage still fails audits, and full coverage with slow explanations may breach operational SLAs.

3) How should I choose the SLA time?

Use the maximum explanation latency your product can tolerate, such as the p95 budget for realtime decisions or a batch window for offline scoring. Set different SLAs for tiers, then evaluate each tier separately.

4) What is a reasonable complexity penalty weight?

Start between 0.20 and 0.40. Increase it if reviewers complain about long feature lists, or decrease it when explanations are already concise. Keep the resulting penalty below about 20% so the score remains interpretable.

5) How do I estimate fidelity and stability?

Fidelity can come from surrogate model accuracy, perturbation tests, or agreement with counterfactual checks. Stability can be measured by repeating explanations under small noise and scoring feature-rank overlap, such as top‑k Jaccard similarity.
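A minimal sketch of the top-k Jaccard stability check mentioned above; the feature lists are made-up examples, not real explainer output:

```python
# Stability via top-k feature overlap: re-run the explainer under small
# input noise and compare the top-k feature sets with Jaccard similarity.
def topk_jaccard(run_a, run_b, k=5):
    a, b = set(run_a[:k]), set(run_b[:k])
    return len(a & b) / len(a | b)

base = ["income", "utilization", "age", "tenure", "inquiries"]
perturbed = ["income", "utilization", "tenure", "age", "late_payments"]
stability = topk_jaccard(base, perturbed)  # 4 shared of 6 total -> ~0.67
```

Averaging this similarity over many perturbation runs gives a stability estimate on the same 0 to 1 scale the calculator expects.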

6) Can I compare MER Index across models?

Yes, when you use consistent windows, scoring scales, and weights. Compare index deltas release to release, then inspect the breakdown to see if changes came from coverage, runtime, quality, or complexity.

Related Calculators

Model Fit Score · Regression R Squared · Adjusted Model Fit · Explained Variance Score · Regression Fit Index · Model Accuracy Score · Regression Performance Score · R Squared Online · Adjusted R2 Calculator · Model Fit Calculator

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.