F1 Score Calculator for Python Metrics

Calculator Inputs

Input mode

Model name

Positive label

True positives

False positives

False negatives

True negatives

Average method

Beta

Decision threshold

Decimal places

True labels

Predicted labels

Report notes

Formula Used

Precision is true positives divided by all predicted positives (TP + FP). Recall is true positives divided by all actual positives (TP + FN). F1 is the harmonic mean of precision and recall.

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1 = 2 × Precision × Recall / (Precision + Recall)

F-beta = (1 + beta²) × Precision × Recall / (beta² × Precision + Recall)
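The formulas above can be sketched as a single Python function. This is a minimal illustration, not the calculator's own code: the name `f_beta` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from confusion-matrix counts.

    Returns 0.0 when a denominator is zero (an assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Same counts, three beta values: 1 (plain F1), 2 (favors recall), 0.5 (favors precision)
f1 = f_beta(82, 9, 14)
f2 = f_beta(82, 9, 14, beta=2.0)
f_half = f_beta(82, 9, 14, beta=0.5)
print(round(f1, 4), round(f2, 4), round(f_half, 4))
```

With these counts recall is lower than precision, so the recall-weighted score (beta = 2) comes out below F1 and the precision-weighted score (beta = 0.5) comes out above it.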

How to Use This Calculator

Choose confusion matrix counts when you already know TP, FP, FN, and TN.
Choose labels when you want the tool to compare true and predicted values.
Enter the positive label for binary scoring.
Select micro, macro, or weighted average for multiclass review.
Press Calculate to show the result above the form.
Use CSV or PDF to save the report.

Python Check Snippet

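# Confusion-matrix counts for the positive class
tp, fp, fn = 82, 9, 14
# Fall back to 0.0 when a denominator is zero, so the score stays defined
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
print(round(f1, 4))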

Example Data Table

Case	True Label	Predicted Label	Outcome for Positive Class 1
1	1	1	True positive
2	1	0	False negative
3	0	1	False positive
4	0	0	True negative
5	1	1	True positive
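As a quick check, the five rows of the table can be tallied in Python. This sketch assumes class 1 is the positive label, as in the table header; the variable names are illustrative only.

```python
# (true label, predicted label) pairs copied from the example table
rows = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1)]
positive = 1

tp = sum(1 for t, p in rows if t == positive and p == positive)
fn = sum(1 for t, p in rows if t == positive and p != positive)
fp = sum(1 for t, p in rows if t != positive and p == positive)
tn = sum(1 for t, p in rows if t != positive and p != positive)

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(tp, fp, fn, tn)   # counts for the positive class
print(round(f1, 4))
```

The table yields two true positives and one each of the other outcomes, so precision, recall, and F1 all equal 2/3.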

F1 Score Calculator for Python Model Review

An F1 score summarizes a classifier in one balanced number. It joins precision and recall into a harmonic mean. This is useful when accuracy hides important mistakes. A model can look accurate when the dataset has many easy negative cases, because accuracy counts true negatives and F1 does not. F1 focuses on the positive class, or on every class when averages are used.
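A small numeric sketch shows how accuracy can hide mistakes that F1 exposes. The counts below are made up for illustration and do not come from a real model.

```python
# Hypothetical imbalanced validation set: 950 negatives, 50 positives.
tp, fn = 5, 45    # the model finds only 5 of the 50 actual positives
fp, tn = 5, 945   # and raises 5 false alarms among the negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Accuracy is 0.95 because the many true negatives dominate the total, while F1 is about 0.17 because the model misses most of the positive cases.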

Why F1 matters

Precision answers a direct question. Of the items predicted positive, how many were correct? Recall asks another question. Of the real positives, how many did the model find? F1 rewards models that keep both values strong. A high precision with weak recall can still produce a modest F1 score. A high recall with weak precision can do the same.
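The effect of the harmonic mean is easy to see by comparing it with a plain average. In this sketch the helper name `f1_from_rates` and the sample rates are illustrative assumptions.

```python
def f1_from_rates(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Two lopsided pairs and one balanced pair with the same arithmetic mean
pairs = [(0.95, 0.30), (0.30, 0.95), (0.625, 0.625)]
for p, r in pairs:
    mean = (p + r) / 2
    print(f"precision={p} recall={r} mean={mean:.3f} f1={f1_from_rates(p, r):.3f}")
```

All three pairs share an arithmetic mean of 0.625, but only the balanced pair keeps an F1 of 0.625. The lopsided pairs both drop to about 0.456, whichever of the two values is the weak one.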

Working with counts

The binary count method uses true positives, false positives, false negatives, and true negatives. These values form a confusion matrix. The calculator checks each value, prevents invalid totals, and shows related measures: accuracy, specificity, error rate, and balanced accuracy. You can also enter beta to build a general F-beta score. Beta above one gives more weight to recall. Beta below one gives more weight to precision.
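The related measures can be sketched from the four counts as follows. The function name `related_measures`, the validation rules, and the true-negative count of 895 are assumptions for illustration. The calculator's own checks may differ.

```python
def related_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, specificity, error rate, and balanced accuracy from counts."""
    counts = (tp, fp, fn, tn)
    # Assumed validation: non-negative integers with a non-zero total
    if any(not isinstance(c, int) or c < 0 for c in counts):
        raise ValueError("counts must be non-negative integers")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one count must be greater than zero")

    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "error_rate": 1 - accuracy,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# tn=895 is a made-up value chosen so the total is 1000
measures = related_measures(tp=82, fp=9, fn=14, tn=895)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Balanced accuracy averages recall and specificity, so it falls below plain accuracy here because recall on the positive class is the weaker of the two rates.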

Working with labels

You can paste true labels and predicted labels from Python output. Use comma-separated values. The tool creates class-level counts for each label. It then reports precision, recall, F1, and support, where support is the number of true cases in a class. Micro average pools the counts from all classes before scoring. Macro average treats every class equally. Weighted average weights each class score by its support. These options match common model review needs.
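The label mode can be approximated in plain Python as below. This is a sketch under assumptions: the helper names, the parsing rules, and the sample labels are hypothetical and may not match how the tool handles edge cases.

```python
def parse_labels(text: str) -> list:
    """Split a comma-separated string into trimmed, non-empty labels."""
    return [item.strip() for item in text.split(",") if item.strip()]


def per_class_report(y_true: list, y_pred: list) -> dict:
    """Per-class tp/fp/fn, precision, recall, F1, and support."""
    if len(y_true) != len(y_pred):
        raise ValueError("true and predicted labels must have the same length")
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        report[label] = {
            "tp": tp, "fp": fp, "fn": fn,
            "precision": tp / (tp + fp) if (tp + fp) else 0.0,
            "recall": tp / (tp + fn) if (tp + fn) else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
            "support": tp + fn,
        }
    return report


def averaged_f1(report: dict, method: str) -> float:
    """Micro, macro, or weighted F1 over a per-class report."""
    rows = list(report.values())
    if method == "micro":  # pool counts across classes, then score once
        tp = sum(r["tp"] for r in rows)
        fp = sum(r["fp"] for r in rows)
        fn = sum(r["fn"] for r in rows)
        return 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    if method == "macro":  # unweighted mean of class scores
        return sum(r["f1"] for r in rows) / len(rows) if rows else 0.0
    if method == "weighted":  # mean of class scores weighted by support
        total = sum(r["support"] for r in rows)
        return sum(r["f1"] * r["support"] for r in rows) / total if total else 0.0
    raise ValueError(f"unknown average method: {method}")


y_true = parse_labels("cat, cat, cat, cat, dog, dog, bird")
y_pred = parse_labels("cat, cat, cat, dog, dog, cat, cat")
report = per_class_report(y_true, y_pred)
for method in ("micro", "macro", "weighted"):
    print(method, round(averaged_f1(report, method), 4))
```

In this sample the single bird case is missed, so its class F1 is zero. Macro average is pulled down the most because that one-case class counts as much as the four-case cat class.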

Using the result

Do not judge a model by F1 alone. Compare it with business risk, class balance, and data quality. For fraud detection, recall may matter more. For spam filtering, precision may matter more. Check the class table before choosing a model. Look for weak classes and uneven support.

Downloads

The CSV file is useful for spreadsheets. The PDF file is useful for records. Both exports include the inputs and calculated values. Keep them with training notes. They make model comparisons easier.

Best practice

Use the same validation split for each comparison. Record the threshold used for predicted classes. Small threshold changes can shift precision and recall. Save each run, then compare scores with care.
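To see how the threshold moves precision and recall, a short sweep over a few cutoffs is enough. The scores and labels below are invented for illustration, and treating a score at or above the threshold as positive is an assumed convention.

```python
# Hypothetical validation labels and model scores (probability of class 1)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.92, 0.81, 0.55, 0.38, 0.61, 0.45, 0.30, 0.22, 0.15, 0.05]


def prf_at(threshold: float) -> tuple:
    """Precision, recall, and F1 when score >= threshold means positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return precision, recall, f1


for threshold in (0.35, 0.50, 0.65):
    p, r, f = prf_at(threshold)
    print(f"threshold={threshold:.2f} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

On this toy data the lowest threshold reaches full recall at lower precision, and the highest threshold reaches full precision at lower recall. Recording the threshold with each run keeps such comparisons fair.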

FAQs

What is an F1 score?

F1 score is the harmonic mean of precision and recall. It helps judge classification quality when false positives and false negatives both matter.

When should I use F1 instead of accuracy?

Use F1 when classes are imbalanced or positive cases are important. Accuracy can look high when a model mostly predicts the larger class.

What is precision?

Precision shows how many predicted positives were actually positive. It is useful when false alarms are expensive or harmful.

What is recall?

Recall shows how many real positives were found. It is useful when missing a positive case creates risk or cost.

What does beta mean?

Beta changes the balance between precision and recall. A beta of 1 gives the standard F1 score. A higher beta favors recall. A lower beta favors precision.

What is macro average?

Macro average calculates each class score, then averages them equally. It gives small classes the same weight as large classes.

What is weighted average?

Weighted average calculates each class score, then weights it by class support. Larger classes influence the final value more.

Can I export the result?

Yes. Use the CSV button for spreadsheet work. Use the PDF button for a simple saved report with inputs and metrics.