Micro Average F1 Calculator

Measure model performance with micro-averaged precision and recall. Switch input modes to suit any dataset size. Download reports, compare totals, and iterate on experiments quickly.

Calculator

Responsive inputs: 3 columns on large screens, 2 on small screens, 1 on mobile.
Choose the format that matches your evaluation output.
Use β>1 to emphasize recall; β<1 to emphasize precision.
Adds a small value to avoid division by zero.

Totals

Only used for accuracy here.

Per-class counts

Micro-averaging sums TP, FP, and FN across all rows.
Class TP FP FN Action
Multi-label tip: If each label is evaluated independently, per-label (binary) counts are a natural fit for micro-averaging.

Confusion matrix

Rows are actual classes; columns are predicted classes.
Note: For single-label multiclass, micro F1 equals accuracy because FP and FN match globally.
Clear

Example Data Table

Sample per-class counts show how micro-averaging aggregates decisions across classes.

Class          TP    FP    FN
Positive       120   18    22
Neutral        75    20    15
Negative       60    12    25
Total (micro)  255   50    62
Micro F1 (from totals): 2·TP / (2·TP + FP + FN) = 2·255 / (510 + 50 + 62) = 510 / 622 ≈ 0.8199
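The aggregation above can be sketched in a few lines of Python, using the per-class counts from the example table:

```python
# Per-class counts from the example table above.
counts = {
    "Positive": {"tp": 120, "fp": 18, "fn": 22},
    "Neutral":  {"tp": 75,  "fp": 20, "fn": 15},
    "Negative": {"tp": 60,  "fp": 12, "fn": 25},
}

# Micro-averaging sums TP, FP, and FN across classes before computing F1.
tp = sum(c["tp"] for c in counts.values())  # 255
fp = sum(c["fp"] for c in counts.values())  # 50
fn = sum(c["fn"] for c in counts.values())  # 62

micro_f1 = 2 * tp / (2 * tp + fp + fn)
print(round(micro_f1, 4))  # 0.8199
```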

Formula Used

Micro Precision aggregates over all classes:

Pmicro = TP / (TP + FP + ε)

Micro Recall aggregates over all classes:

Rmicro = TP / (TP + FN + ε)

Micro F1 can be computed directly from totals:

F1micro = 2·TP / (2·TP + FP + FN + ε)

Micro Fβ (optional) changes the precision/recall tradeoff:

Fβmicro = (1+β²)·TP / ((1+β²)·TP + β²·FN + FP + ε)

Micro-averaging emphasizes overall decision quality, making it robust under class imbalance when you care about global performance.
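The four formulas above translate directly into code. The sketch below is a minimal implementation; the function name and defaults are illustrative, not part of the calculator itself:

```python
def micro_metrics(tp, fp, fn, beta=1.0, eps=1e-12):
    """Micro-averaged precision, recall, and F-beta from summed counts.

    Mirrors the formulas above: eps guards the denominators against
    division by zero when counts are missing or very small.
    """
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    b2 = beta ** 2
    f_beta = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps)
    return precision, recall, f_beta

# Totals from the example table: beta=1 reproduces the micro F1 above.
p, r, f1 = micro_metrics(255, 50, 62)
```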

How to Use This Calculator

  1. Select an input mode that matches your evaluation output.
  2. Enter counts for TP, FP, FN (and TN when you have it).
  3. Optionally set β and ε for advanced scoring.
  4. Press Calculate to view metrics above the form.
  5. Use Download CSV or Download PDF to export results.

FAQs

1) What does micro-averaged F1 measure?

It measures global balance between precision and recall by summing TP, FP, and FN across classes or labels before computing F1.

2) When should I use micro F1 instead of macro F1?

Use micro F1 when overall performance matters more than equal weighting per class, especially with strong class imbalance or many rare labels.

3) Why can micro F1 equal accuracy for multiclass?

In single-label multiclass, every mistake creates one FP and one FN globally. That makes micro F1 simplify to TP divided by total samples.
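This identity can be checked numerically. The sketch below uses a made-up 3×3 confusion matrix (rows are actual classes, columns are predicted, as in the calculator):

```python
# Made-up single-label multiclass confusion matrix for illustration.
cm = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 39],
]

n = len(cm)
tp = sum(cm[i][i] for i in range(n))          # correct predictions
total = sum(sum(row) for row in cm)           # all samples

# Each off-diagonal cell is one FP (for its column) and one FN (for
# its row), so the global FP and FN totals are equal.
fp = total - tp
fn = total - tp

micro_f1 = 2 * tp / (2 * tp + fp + fn)
accuracy = tp / total
assert abs(micro_f1 - accuracy) < 1e-12
```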

4) How do I get TP, FP, and FN for multilabel tasks?

Compute binary counts per label, then sum TP, FP, and FN across labels. The per-class mode is designed for this workflow.

5) What does beta change in Fβ?

β controls how much recall matters relative to precision. β>1 favors recall, β<1 favors precision, and β=1 gives standard F1.

6) What is epsilon smoothing used for?

Epsilon adds a tiny value to denominators to prevent division by zero when counts are missing or very small, stabilizing calculations.

7) Can I use fractional counts?

Yes. If you have weighted samples or probabilistic counting, fractional values still work because formulas operate on totals.
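For example, with made-up weighted counts the same totals-based formula applies unchanged:

```python
# Fractional counts from weighted or probabilistic evaluation (made-up).
tp, fp, fn = 12.5, 3.25, 4.75

# The totals-based formula does not require integer counts.
micro_f1 = 2 * tp / (2 * tp + fp + fn)
print(round(micro_f1, 4))  # 0.7576
```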

8) What should I do if all counts are zero?

With all counts zero the metrics are mathematically undefined; this calculator returns zeros instead. Provide real evaluation counts or apply smoothing carefully.

Related Calculators

precision recall table, fraud detection metrics, precision recall metrics, roc precision recall, model validation metrics, classifier performance metrics, macro average f1

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.