Macro Average F1 Calculator

Analyze every class before trusting summary metrics. Quickly spot weak classes, label skew, and missed positives. Use the exports and charts for cleaner model review workflows.

Enter Class Statistics

Use one card per class. Large screens show three cards, medium screens show two, and small screens show one.


Performance Chart

The chart compares per-class precision, recall, and F1. Submit the form to refresh values.

Example Data Table

Class   TP   FP   FN   Support   Precision   Recall   F1 Score
Cat     42    6    8        50      0.8750   0.8400     0.8571
Dog     37    9   11        48      0.8043   0.7708     0.7872
Bird    29    5    7        36      0.8529   0.8056     0.8286
Fish    24    4    6        30      0.8571   0.8000     0.8276

Macro Average F1: 0.8251

Formula Used

Precision for each class
Precision = TP / (TP + FP)
Recall for each class
Recall = TP / (TP + FN)
F1 score for each class
F1 = 2 × Precision × Recall / (Precision + Recall)
Macro average F1
Macro F1 = (F1₁ + F1₂ + ... + F1ₙ) / n

Macro averaging gives every class equal weight, even when support differs. This helps expose weak performance on rare classes that accuracy may hide.
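Putting the formulas together, a minimal Python sketch reproduces the example table above (class names and counts are taken from that table):

```python
# Per-class precision, recall, and F1, plus the macro average,
# computed from (TP, FP, FN) counts in the example table.
classes = {
    "Cat":  (42, 6, 8),
    "Dog":  (37, 9, 11),
    "Bird": (29, 5, 7),
    "Fish": (24, 4, 6),
}

f1_scores = []
for name, (tp, fp, fn) in classes.items():
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    f1_scores.append(f1)
    print(f"{name}: P={precision:.4f} R={recall:.4f} F1={f1:.4f}")

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_scores) / len(f1_scores)
print(f"Macro F1 = {macro_f1:.4f}")  # 0.8251
```

Because the macro average is an unweighted mean, each class contributes exactly 1/n to the result regardless of its support.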

How to Use This Calculator

  1. Add one card for each class in your model.
  2. Enter the class name and its TP, FP, and FN values.
  3. Choose how zero-division cases should behave.
  4. Select the number of displayed decimals.
  5. Press Calculate Macro Average F1.
  6. Read the summary above the form and review the table.
  7. Use the chart to compare class-level precision, recall, and F1.
  8. Export the results as CSV or PDF for documentation.
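The CSV export in step 8 is built into the calculator. As a rough sketch of producing an equivalent file yourself, using the rows from the example table (the tool's actual export format may differ):

```python
import csv
import io

# Rows copied from the example table above; illustrative only.
rows = [
    ("Cat", 42, 6, 8, 50, 0.8750, 0.8400, 0.8571),
    ("Dog", 37, 9, 11, 48, 0.8043, 0.7708, 0.7872),
    ("Bird", 29, 5, 7, 36, 0.8529, 0.8056, 0.8286),
    ("Fish", 24, 4, 6, 30, 0.8571, 0.8000, 0.8276),
]

# Write to an in-memory buffer; swap in open("results.csv", "w", newline="")
# to write a real file.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Class", "TP", "FP", "FN", "Support",
                 "Precision", "Recall", "F1 Score"])
writer.writerows(rows)
print(buf.getvalue())
```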

Frequently Asked Questions

1. What does macro average F1 measure?

It averages the F1 score of every class equally. Large classes do not dominate the result, so weak minority-class performance remains visible.

2. When should I prefer macro F1 over accuracy?

Use macro F1 when class imbalance matters or when every class deserves equal attention. Accuracy can look strong even if rare classes perform badly.

3. How is macro F1 different from weighted F1?

Macro F1 treats every class equally. Weighted F1 multiplies each class F1 by support, so common classes influence the overall score more.
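A small hypothetical example makes the difference concrete: two classes with invented F1 scores and supports, one common and strong, one rare and weak.

```python
# Hypothetical per-class F1 scores and supports (not from the table above).
f1 = {"common": 0.95, "rare": 0.50}
support = {"common": 90, "rare": 10}

# Macro: unweighted mean over classes.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted: each class F1 scaled by its share of total support.
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())

print(macro_f1)     # 0.725
print(weighted_f1)  # 0.905
```

The weak rare class drags the macro score down to 0.725, while the weighted score of 0.905 mostly reflects the common class.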

4. Why can macro F1 be lower than micro F1?

Micro F1 aggregates totals across classes first. Strong performance on common classes can raise micro F1, while macro F1 still punishes weak classes equally.
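The aggregation order is the whole difference, as this sketch with invented counts shows: micro F1 pools TP, FP, and FN before computing anything, while macro F1 computes per-class F1 first and then averages.

```python
# Hypothetical (TP, FP, FN) counts: one strong common class, one weak rare class.
counts = {"common": (90, 5, 5), "rare": (2, 8, 8)}

def f1_from_counts(tp, fp, fn):
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)

# Macro: F1 per class, then unweighted mean.
macro_f1 = sum(f1_from_counts(*c) for c in counts.values()) / len(counts)

# Micro: pool the counts across classes, then one F1.
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_f1 = f1_from_counts(tp, fp, fn)

print(f"macro={macro_f1:.4f} micro={micro_f1:.4f}")  # macro=0.5737 micro=0.8762
```

The pooled counts are dominated by the common class, so micro F1 stays high while macro F1 exposes the weak rare class.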

5. What inputs do I need for each class?

You need true positives, false positives, and false negatives. The calculator derives support, precision, recall, and F1 automatically from those values.

6. What happens when a denominator becomes zero?

The calculator applies your chosen zero-division policy. You can return 0 or 1 when TP+FP, TP+FN, or precision+recall equals zero.
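A policy like this is straightforward to sketch in Python; the `safe_div` helper and its default fallback below are illustrative, not the calculator's actual code:

```python
# Return a configured fallback (commonly 0 or 1) whenever a
# denominator is zero, instead of raising ZeroDivisionError.
def safe_div(num, den, zero_value=0.0):
    return num / den if den else zero_value

def class_f1(tp, fp, fn, zero_value=0.0):
    precision = safe_div(tp, tp + fp, zero_value)  # TP + FP may be 0
    recall = safe_div(tp, tp + fn, zero_value)     # TP + FN may be 0
    # precision + recall may also be 0, so guard the F1 step too.
    return safe_div(2 * precision * recall, precision + recall, zero_value)

print(class_f1(0, 0, 0, zero_value=0.0))  # 0.0
print(class_f1(0, 0, 0, zero_value=1.0))  # 1.0
```

Returning 0 treats an absent class as a failure; returning 1 treats "nothing predicted, nothing to find" as a success. Which is appropriate depends on your evaluation goals.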

7. Can I use this for multiclass and multilabel reviews?

Yes, as long as your TP, FP, and FN values are valid for each class. The macro F1 formula is identical once those class-level statistics exist.

8. Why is support shown beside each class?

Support equals TP + FN. It helps you judge how much evidence backs each class metric and how imbalance may affect interpretation.

Related Calculators

precision recall table · fraud detection metrics · micro average F1 · precision recall metrics · ROC precision recall · model validation metrics · classifier performance metrics

Important Note: All calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.