Measure dataset health using weighted quality dimensions. Expose missing, stale, duplicate, invalid, and inconsistent patterns. Build dependable inputs for smarter models and safer outcomes.
Enter dataset defect counts and assign weights for each quality dimension.
| Dataset Batch | Total Records | Missing | Duplicates | Invalid | Outliers | Stale | Inconsistent | Verified Accuracy % |
|---|---|---|---|---|---|---|---|---|
| Customer Churn Set A | 10000 | 320 | 140 | 210 | 175 | 260 | 190 | 96.4 |
| Fraud Events Set B | 18500 | 510 | 220 | 340 | 420 | 380 | 275 | 94.8 |
| IoT Sensor Set C | 25000 | 890 | 160 | 430 | 690 | 740 | 310 | 92.6 |
Completeness Score = (1 - Missing Values / Total Records) × 100
Uniqueness Score = (1 - Duplicate Records / Total Records) × 100
Validity Score = (1 - Invalid Values / Total Records) × 100
Consistency Score = (1 - Inconsistent Records / Total Records) × 100
Timeliness Score = (1 - Stale Records / Total Records) × 100
Outlier Penalty Score = (1 - Outlier Records / Total Records) × 100
Accuracy Composite = (Verified Accuracy × 0.7) + (Outlier Penalty Score × 0.3)
Overall Score = Σ(Dimension Score × Weight) / Σ(Weights)
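The formulas above can be sketched in Python. This is a minimal illustration, using the IoT Sensor Set C figures from the table, treating the table's accuracy column as the verified accuracy input, and assuming equal weights as a neutral default:

```python
def dimension_score(defects: int, total: int) -> float:
    # Generic dimension score: (1 - defects / total) * 100
    return (1 - defects / total) * 100

# Figures for IoT Sensor Set C from the table above
total = 25000
scores = {
    "completeness": dimension_score(890, total),  # missing values
    "uniqueness":   dimension_score(160, total),  # duplicates
    "validity":     dimension_score(430, total),  # invalid values
    "consistency":  dimension_score(310, total),  # inconsistent records
    "timeliness":   dimension_score(740, total),  # stale records
}
outlier_penalty = dimension_score(690, total)     # outlier records

# Accuracy Composite = (Verified Accuracy * 0.7) + (Outlier Penalty Score * 0.3)
scores["accuracy"] = 92.6 * 0.7 + outlier_penalty * 0.3

# Equal weights here; adjust per use case (assumption, not a recommendation)
weights = {d: 1.0 for d in scores}
overall = sum(scores[d] * weights[d] for d in scores) / sum(weights.values())
print(round(overall, 2))
```

With these inputs the outlier penalty is 97.24, the accuracy composite works out to roughly 94.0, and the equally weighted overall score lands near 97.3.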
The calculator combines six core data quality dimensions into a weighted index. Higher weights emphasize the dimensions that matter most for your machine learning pipeline, monitoring policy, or deployment risk profile.
The overall score summarizes how trustworthy a dataset is across key dimensions like completeness, validity, uniqueness, consistency, timeliness, and accuracy. Higher scores usually mean lower risk for training, inference, and monitoring outcomes.
Weights let you prioritize the dimensions most important to your use case. For example, fraud detection may emphasize timeliness and accuracy, while reporting systems may care more about completeness and consistency.
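As an illustration of use-case-specific weighting, the profiles below are hypothetical, not recommendations; choose weights that match your own risk profile:

```python
# Hypothetical weight profiles reflecting the examples in the text:
# fraud detection leans on timeliness and accuracy,
# reporting leans on completeness and consistency.
WEIGHT_PROFILES = {
    "fraud_detection": {"completeness": 1.0, "uniqueness": 1.0, "validity": 1.0,
                        "consistency": 1.0, "timeliness": 2.0, "accuracy": 2.0},
    "reporting":       {"completeness": 2.0, "uniqueness": 1.0, "validity": 1.0,
                        "consistency": 2.0, "timeliness": 1.0, "accuracy": 1.0},
}

def weighted_index(scores: dict, profile: str) -> float:
    # Overall Score = sum(score * weight) / sum(weights)
    w = WEIGHT_PROFILES[profile]
    return sum(scores[d] * w[d] for d in w) / sum(w.values())

# Illustrative dimension scores (made-up values)
scores = {"completeness": 96.4, "uniqueness": 98.9, "validity": 98.3,
          "consistency": 98.5, "timeliness": 97.9, "accuracy": 94.0}
print(round(weighted_index(scores, "fraud_detection"), 1))
print(round(weighted_index(scores, "reporting"), 1))
```

Note how the same dimension scores produce different overall values under different profiles, which is exactly the lever the weights give you.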
Validity checks whether values fit accepted formats or rules. Accuracy measures whether records match the real world or trusted references. A value can be valid in format but still inaccurate.
Extreme outliers often indicate noisy capture, labeling mistakes, broken sensors, or integration issues. Including an outlier penalty helps the score reflect unusual values that can distort training quality and model stability.
A score above 85 is typically strong for many workflows. Scores between 70 and 85 usually need targeted cleanup. Anything lower may introduce training bias, instability, or unreliable evaluation metrics.
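The rules of thumb above can be expressed as a small banding helper. The thresholds come from the guidance in this section; the labels are illustrative:

```python
def risk_band(overall_score: float) -> str:
    # Thresholds from the guidance above:
    # > 85 is typically strong, 70-85 needs targeted cleanup, below 70 is risky
    if overall_score > 85:
        return "strong"
    if overall_score >= 70:
        return "targeted cleanup recommended"
    return "high risk: possible training bias or instability"

print(risk_band(96.4))
print(risk_band(78.0))
print(risk_band(62.5))
```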
Scoring a representative sample instead of the full dataset is often practical for audits. Just make sure the sample covers important classes, time periods, and sources so the score reflects actual production data conditions.
Repeating this calculation on fresh batches helps you detect drift, pipeline failures, schema changes, stale records, or rising duplicates before those problems damage model performance.
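One lightweight way to operationalize recurring checks is to compare each batch's overall score against the previous one and flag sharp drops. This is a sketch with made-up weekly scores and an arbitrary alert threshold:

```python
def detect_degradation(batch_scores: list, drop_threshold: float = 2.0) -> list:
    """Flag batches whose overall score fell by more than drop_threshold
    points versus the previous batch -- a cheap drift/failure signal."""
    alerts = []
    for prev, (batch, score) in zip(batch_scores, batch_scores[1:]):
        if prev[1] - score > drop_threshold:
            alerts.append(batch)
    return alerts

# Hypothetical weekly overall scores for one dataset
history = [("week_01", 96.4), ("week_02", 96.1),
           ("week_03", 92.8), ("week_04", 93.0)]
print(detect_degradation(history))  # → ['week_03']
```

A real pipeline would also alert on absolute floors (e.g. any batch below your acceptable band), not just week-over-week drops.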
This calculator is a compact decision aid, not a full governance framework. Pair it with profiling, lineage checks, class balance reviews, bias analysis, and feature-level validation for stronger assurance.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.