Cross Validation AUC Calculator

Advanced Calculator Inputs

Enter fold AUC values, optional weights, and target assumptions to assess summary performance and validation stability.

Model Name

Confidence Level

Round Digits

Baseline AUC

Target AUC

Stability Threshold

Fold AUC Values

Use values between 0 and 1. Separate them with line breaks, spaces, commas, or semicolons.

Fold Weights

Weights can represent validation sample sizes or relative fold importance. Enter one weight per fold, or leave blank for unweighted calculations.
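The input rules above (values between 0 and 1, flexible separators, optional weights) can be sketched in Python. This is a minimal illustration, not the calculator's actual parsing code; the function names `parse_numbers` and `parse_fold_inputs` are hypothetical, and the check that weights are non-negative is an assumption.

```python
import re

def parse_numbers(text):
    """Split on line breaks, spaces, commas, or semicolons and convert to floats."""
    tokens = [t for t in re.split(r"[\s,;]+", text.strip()) if t]
    return [float(t) for t in tokens]

def parse_fold_inputs(auc_text, weight_text=""):
    """Return (aucs, weights); weights is None when the weight box is blank."""
    aucs = parse_numbers(auc_text)
    if not aucs:
        raise ValueError("enter at least one fold AUC value")
    if any(not 0.0 <= a <= 1.0 for a in aucs):
        raise ValueError("fold AUC values must lie between 0 and 1")
    weights = parse_numbers(weight_text)
    if not weights:
        return aucs, None  # blank weights mean an unweighted calculation
    if len(weights) != len(aucs):
        raise ValueError("provide exactly one weight per fold")
    # Assumption: negative weights are rejected and the total must be positive.
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    return aucs, weights

aucs, weights = parse_fold_inputs("0.91, 0.88; 0.93\n0.90 0.89", "120 110 125 118 115")
print(aucs, weights)
```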

Example Data Table

This sample shows how fold-level AUC values and fold weights can be organized before running the calculator.

Fold	AUC	Weight	Validation Notes
1	0.91	120	Balanced stratified split
2	0.88	110	Slightly harder holdout
3	0.93	125	Highest discrimination
4	0.90	118	Stable threshold behavior
5	0.89	115	Moderate score overlap

Formula Used

Mean AUC: Mean AUC = (Σ AUC_i) / k, where k is the number of folds.

Weighted Mean AUC: Weighted Mean = (Σ w_i × AUC_i) / (Σ w_i), useful when folds have different validation sizes.

Sample Variance: s² = Σ(AUC_i − Mean)² / (k − 1), where Mean is the unweighted Mean AUC and s is the sample standard deviation.

Standard Error: SE = s / √k. This estimates uncertainty around the fold summary.
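The mean, weighted mean, sample variance, and standard error can be checked against the example data table. The following is a minimal Python sketch of those four formulas, not the calculator's source code; `summarize_folds` is a hypothetical helper name.

```python
import math

def summarize_folds(aucs, weights=None):
    """Return mean, weighted mean, sample variance, and standard error."""
    k = len(aucs)
    if k < 2:
        raise ValueError("need at least two folds for a sample variance")
    if weights is None:
        weights = [1.0] * k  # unweighted case: weighted mean equals the mean
    mean = sum(aucs) / k
    weighted_mean = sum(w * a for w, a in zip(weights, aucs)) / sum(weights)
    # Sample variance uses the unweighted mean and divides by k - 1.
    variance = sum((a - mean) ** 2 for a in aucs) / (k - 1)
    std_error = math.sqrt(variance) / math.sqrt(k)
    return {
        "mean": mean,
        "weighted_mean": weighted_mean,
        "variance": variance,
        "std_error": std_error,
    }

# Values from the example data table.
aucs = [0.91, 0.88, 0.93, 0.90, 0.89]
weights = [120, 110, 125, 118, 115]
summary = summarize_folds(aucs, weights)
for name, value in summary.items():
    print(f"{name}: {value:.5f}")
```

For the example table this gives a mean of 0.902, a weighted mean of about 0.9027, a sample variance of 0.00037, and a standard error of about 0.0086.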

Confidence Interval: Weighted Mean ± z × SE, clipped to the valid AUC range of 0 to 1.
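The interval formula can be sketched with Python's standard library, which supplies the z value for a chosen confidence level. This is an illustration under stated assumptions: `auc_confidence_interval` is a hypothetical name, and the inputs 0.9027 and 0.0086 are the rounded weighted mean and standard error of the example data table. With only a handful of folds, a z-based interval is approximate, and a t critical value would give a wider range.

```python
from statistics import NormalDist

def auc_confidence_interval(center, std_error, confidence=0.95):
    """Return (lower, upper) for center ± z × SE, clipped to [0, 1]."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Two-sided critical value, for example about 1.96 at 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lower = max(0.0, center - z * std_error)
    upper = min(1.0, center + z * std_error)
    return lower, upper

low, high = auc_confidence_interval(0.9027, 0.0086, confidence=0.95)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

With these inputs the interval runs from roughly 0.886 to 0.920. The clipping only matters when the center sits close to 0 or 1.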

Coefficient of Variation: CV% = (Standard Deviation / Weighted Mean) × 100, which expresses relative dispersion.

Consistency Score: This tool maps low dispersion to a higher 0–100 score using your chosen stability threshold.
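The page does not state the exact consistency score formula, so the sketch below is only one plausible mapping and should not be read as the calculator's method. It assumes the stability threshold is expressed in the same units as CV% and that the score falls linearly from 100 at zero dispersion to 0 once CV% reaches the threshold. The function names are hypothetical.

```python
import statistics

def coefficient_of_variation(aucs, weighted_mean):
    """CV% = (sample standard deviation / weighted mean) × 100."""
    return statistics.stdev(aucs) / weighted_mean * 100.0

def consistency_score(cv_percent, stability_threshold):
    """ASSUMED mapping, not the calculator's formula.

    Returns 100 at zero dispersion and decreases linearly to 0 once
    CV% reaches the stability threshold.
    """
    if stability_threshold <= 0:
        raise ValueError("stability threshold must be positive")
    ratio = min(cv_percent / stability_threshold, 1.0)
    return 100.0 * (1.0 - ratio)

# Values from the example data table; the threshold of 5.0 is illustrative.
aucs = [0.91, 0.88, 0.93, 0.90, 0.89]
weights = [120, 110, 125, 118, 115]
weighted_mean = sum(w * a for w, a in zip(weights, aucs)) / sum(weights)
cv = coefficient_of_variation(aucs, weighted_mean)
score = consistency_score(cv, stability_threshold=5.0)
print(f"CV% = {cv:.2f}, consistency score = {score:.1f}")
```

For the example table the CV is about 2.13%. Any other monotone mapping from dispersion to a 0–100 scale would serve the same purpose.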

How to Use This Calculator

Add a model name to identify the evaluation run.
Paste each fold AUC into the fold values box.
Optionally add one weight for each fold.
Select a confidence level for the interval estimate.
Enter a baseline AUC and a target AUC.
Set a stability threshold that matches your tolerance.
Choose how many digits to display.
Press Calculate AUC Summary to show the result above the form.
Use the CSV and PDF buttons to export the computed metrics.

Why Cross Validation AUC Matters

Cross validation AUC summarizes ranking quality across repeated validation splits. A single holdout can look excellent by chance, while repeated folds reveal whether discrimination remains stable. Adding interval and variability statistics helps compare models with more discipline.

This calculator supports weighted aggregation when folds represent different sample sizes. That is useful in grouped, time-based, or stratified workflows where validation partitions differ in size. Comparing weighted mean AUC with baseline and target values helps decide whether performance is both strong and dependable.
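The comparison against baseline and target values can be sketched as a small helper. This is an illustration only: `compare_to_references` and its output keys are hypothetical names, the baseline of 0.85 and target of 0.90 are made-up inputs, and 0.9027 is the rounded weighted mean of the example data table.

```python
def compare_to_references(weighted_mean, baseline_auc, target_auc):
    """Report how the weighted mean AUC sits relative to baseline and target."""
    return {
        "lift_over_baseline": weighted_mean - baseline_auc,
        "gap_to_target": target_auc - weighted_mean,  # negative means target exceeded
        "beats_baseline": weighted_mean > baseline_auc,
        "meets_target": weighted_mean >= target_auc,
    }

result = compare_to_references(0.9027, baseline_auc=0.85, target_auc=0.90)
for name, value in result.items():
    print(name, value)
```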

Dispersion measures also matter. Two models can share the same mean AUC but differ sharply in fold volatility. Lower variance and narrower intervals usually indicate a model that will generalize more predictably when deployed.
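A short Python example makes the point about volatility concrete. The fold values for both models are invented for illustration; they are chosen so that the means match while the spread differs.

```python
import statistics

# Hypothetical fold AUCs: both models average 0.90 across five folds.
model_a = [0.90, 0.90, 0.91, 0.89, 0.90]
model_b = [0.95, 0.85, 0.92, 0.84, 0.94]

for name, folds in (("A", model_a), ("B", model_b)):
    mean = statistics.mean(folds)
    spread = statistics.stdev(folds)  # sample standard deviation (k - 1)
    print(f"Model {name}: mean AUC = {mean:.3f}, std dev = {spread:.4f}")
```

Model B matches Model A on average but swings far more from fold to fold, which a mean-only comparison would hide.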

FAQs

1. What does AUC measure in classification?

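AUC measures how well a model ranks positive cases above negative cases across all thresholds. It equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so 0.5 corresponds to random ranking and 1.0 to perfect separation.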
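The ranking interpretation can be demonstrated by counting positive-negative pairs directly. The sketch below is a brute-force illustration for small inputs, with ties counted as half a win; `pairwise_auc` is a hypothetical name and the labels and scores are invented.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = pairwise_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9, or about 0.889.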

2. Why use cross validation instead of one split?

Cross validation reduces dependence on one lucky or unlucky split. It reveals average discrimination and stability across repeated validation partitions.

3. When should I use fold weights?

Use weights when folds differ in validation size or in importance. A weighted mean gives larger or more relevant folds proportionally more influence, instead of treating every fold equally.
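A two-fold example with very unequal sizes shows how much the choice can matter. The AUC values and fold sizes below are invented for illustration.

```python
# Hypothetical folds: a small easy fold and a large harder fold.
aucs = [0.95, 0.80]
sizes = [50, 950]  # validation sample sizes used as weights

mean_auc = sum(aucs) / len(aucs)
weighted_mean_auc = sum(w * a for w, a in zip(sizes, aucs)) / sum(sizes)

print(f"Unweighted mean: {mean_auc:.4f}")
print(f"Weighted mean:   {weighted_mean_auc:.4f}")
```

The unweighted mean is 0.875, while the weighted mean is 0.8075 because the large fold dominates.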

4. What is a good standard deviation for fold AUC?

There is no universal cutoff. Smaller deviation usually means more stable validation behavior. Many teams compare it against internal tolerance thresholds or peer models.

5. Why can the confidence interval be wide?

Intervals widen when fold performance is inconsistent or when the number of folds is small. Wider intervals suggest more uncertainty around the summary AUC.
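Both effects follow from the half-width z × s / √k. The sketch below holds the standard deviation fixed and varies the number of folds; `interval_half_width` is a hypothetical name and the standard deviation of 0.02 is an illustrative value.

```python
import math
from statistics import NormalDist

def interval_half_width(std_dev, k, confidence=0.95):
    """Half-width of the z-based interval: z × s / sqrt(k)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * std_dev / math.sqrt(k)

for k in (5, 10, 20):
    print(f"k = {k:2d}: half-width = {interval_half_width(0.02, k):.4f}")
```

Quadrupling the number of folds halves the half-width, and doubling the standard deviation doubles it.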

6. Can this calculator handle multiclass evaluation?

Yes, if your fold AUC values already represent a chosen multiclass averaging method. Keep the same averaging method across all folds for consistency.

7. What does the consistency score mean?

The consistency score converts fold dispersion into a simple 0–100 scale. Higher values mean your AUC values stay closer together across folds.

8. Should I prefer mean AUC or weighted mean AUC?

Use mean AUC when folds are comparable. Use weighted mean AUC when fold sizes differ materially or when some validation partitions deserve stronger influence.