Calculator
Example Data Table
| Student | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 |
|---|---|---|---|---|---|
| Key | A | B | C | D | A |
| Student 1 | A | B | C | D | A |
| Student 2 | A | C | C | D | A |
| Student 3 | B | B | D | A | A |
Formula Used
Item difficulty: p = number correct / number of students.
Wrong response share: q = 1 - p.
Discrimination index: D = upper group p - lower group p.
Corrected point-biserial: r = correlation between the item score and the total score with that item excluded.
KR-20 reliability: KR20 = k / (k - 1) × [1 - Σpq / total score variance], where k is the number of items and Σpq sums p × q over all items.
Standard error: SEM = score standard deviation × √(1 - reliability).
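The difficulty and discrimination formulas can be sketched in Python on the scored example table (1 = matches the key, 0 = does not). This is a minimal illustration, not the calculator's own code. The function names are hypothetical, and the `fraction` parameter that sets the size of the upper and lower groups is an assumption: the formulas above do not say how the groups are formed, and 27% is simply a common convention.

```python
def difficulty(item_scores):
    """p = number correct / number of students."""
    return sum(item_scores) / len(item_scores)


def discrimination(scored_rows, item, fraction=0.27):
    """D = upper group p - lower group p for the item at index `item`.

    `fraction` (share of students in each group) is an assumption,
    not a value taken from the calculator.
    """
    # Rank students by total score, highest first.
    ranked = sorted(scored_rows, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))  # at least one student per group
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower


# Example table scored against the key A, B, C, D, A.
scored_rows = [
    [1, 1, 1, 1, 1],  # Student 1
    [1, 0, 1, 1, 1],  # Student 2
    [0, 1, 0, 0, 1],  # Student 3
]
n_items = len(scored_rows[0])
p = [difficulty([row[i] for row in scored_rows]) for i in range(n_items)]
q = [1 - value for value in p]
d = [discrimination(scored_rows, i) for i in range(n_items)]
```

With only three students, each group holds a single student, so D can only be -1, 0, or 1. Real classes give finer-grained values.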
How to Use This Calculator
Paste one student response row per line.
Use commas, spaces, or simple CSV rows.
Enter the answer key when using choice responses.
Leave the answer key blank for scored 0 and 1 data.
Adjust difficulty, discrimination, and item-total limits.
Press calculate and review the result above the form.
Use CSV or PDF export for reports and records.
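The input rules above can be mirrored in a short Python sketch. It is an illustration under stated assumptions, not the calculator's actual parser: `parse_rows` and `to_scores` are hypothetical names, each row is assumed to hold responses only (no student name column), and quoted CSV fields are not handled.

```python
import re


def parse_rows(text):
    """Split pasted text into one list of tokens per non-empty line.

    Commas and whitespace both act as separators.
    """
    rows = []
    for line in text.splitlines():
        tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
        if tokens:
            rows.append(tokens)
    return rows


def to_scores(rows, key=None):
    """Convert rows to 0/1 scores.

    With a key, a response scores 1 when it matches the key.
    Without a key, the rows are assumed to be 0/1 data already,
    and anything other than "1" counts as 0.
    """
    if key:
        return [
            [1 if c.upper() == k.upper() else 0 for c, k in zip(row, key)]
            for row in rows
        ]
    return [[1 if token == "1" else 0 for token in row] for row in rows]


pasted = "A,B,C,D,A\nA C C D A\nB, B, D, A, A"
choice_rows = parse_rows(pasted)
scored = to_scores(choice_rows, key=["A", "B", "C", "D", "A"])
```

Calling `to_scores(parse_rows("1 0 1\n0,1,0"))` with no key treats the rows as already scored.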
Understanding Test Item Analysis
Test item analysis studies each question after students answer a test. It checks whether an item is too easy, too hard, or unclear. It also checks whether strong students do better on that item than weak students. Good items should separate prepared students from unprepared students.
Key Measures
Difficulty is the share of students who answer correctly. A value near 0 means the item is hard. A value near 1 means the item is easy. Many classroom tests work well when most items sit between 0.30 and 0.85. Discrimination compares high-scoring students with low-scoring students. A positive value means high scorers answered the item correctly more often than low scorers, so the item agrees with the total test score. A negative value needs urgent review.
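A rough sketch of how these measures might turn into review flags is shown below. The 0.30 to 0.85 difficulty range and the rule that negative discrimination needs review come from the text above. Everything else is an assumption: the function name `flag_item`, the flag labels, and the default item-total limit of 0.0 are placeholders, since the calculator's own defaults are not stated here.

```python
def flag_item(p, d, r, p_min=0.30, p_max=0.85, d_min=0.0, r_min=0.0):
    """Return a list of review flags for one item.

    p: difficulty, d: discrimination index,
    r: corrected item-total correlation (None when undefined).
    All limits are adjustable; d_min and r_min defaults are assumptions.
    """
    flags = []
    if p < p_min:
        flags.append("too hard")
    elif p > p_max:
        flags.append("too easy")
    if d < d_min:
        flags.append("low discrimination")
    if r is None:
        flags.append("item-total undefined")
    elif r < r_min:
        flags.append("low item-total")
    return flags


print(flag_item(p=0.67, d=1.0, r=0.87))   # no flags
print(flag_item(p=1.0, d=0.0, r=None))    # everyone correct
print(flag_item(p=0.67, d=0.0, r=-0.5))   # negative item-total
```

A flag is a prompt to read the item, not a verdict on it.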
Item-Total Correlation
Point-biserial correlation links one item score with the remaining test score, that is, the total with that item removed. This corrected form avoids giving the item credit for itself. Higher positive values are better. Low values may indicate vague wording, a wrong key, guessing, or mixed content. Always inspect the item before deleting it. Numbers guide review, but content judgment remains important.
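The corrected correlation can be sketched as a Pearson correlation between the 0/1 item score and the rest score (total minus that item). This is a minimal illustration with hypothetical function names. Returning `None` when either variable is constant is a choice made for this sketch, because the correlation is undefined in that case.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation, or None when either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def corrected_point_biserial(scored_rows, item):
    """Correlate one item's scores with the total excluding that item."""
    item_scores = [row[item] for row in scored_rows]
    rest_scores = [sum(row) - row[item] for row in scored_rows]
    return pearson(item_scores, rest_scores)


# Example table scored against the key A, B, C, D, A.
scored_rows = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
]
r_item1 = corrected_point_biserial(scored_rows, 0)  # about 0.866
r_item2 = corrected_point_biserial(scored_rows, 1)  # -0.5
r_item5 = corrected_point_biserial(scored_rows, 4)  # None: everyone correct
```

In this tiny example Item 2 comes out negative because the lowest scorer got it right while a higher scorer missed it, which is the pattern that calls for a closer look at the item.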
Reliability and Error
KR-20 estimates internal consistency for dichotomous (right-or-wrong) scoring. It rises when items measure a common skill and total scores vary widely. A low value may come from too few items, poor items, or mixed topics. The standard error of measurement converts reliability into score uncertainty. It helps explain how close a student's observed score may be to a true score.
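A small Python sketch of KR-20 and the standard error of measurement follows. It is one possible reading of the formulas, not the calculator's code. The use of population variance (`statistics.pvariance`) is an assumption, since the page does not say whether population or sample variance is used, and the function names are hypothetical.

```python
from math import sqrt
from statistics import pvariance


def kr20(scored_rows):
    """KR20 = k / (k - 1) * (1 - sum(p * q) / total score variance).

    Uses population variance (an assumption). Returns None when there
    are fewer than two items or the total scores do not vary.
    """
    k = len(scored_rows[0])
    n = len(scored_rows)
    total_variance = pvariance([sum(row) for row in scored_rows])
    if k < 2 or total_variance == 0:
        return None
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in scored_rows) / n
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / total_variance)


def sem(scored_rows):
    """SEM = score standard deviation * sqrt(1 - reliability)."""
    reliability = kr20(scored_rows)
    if reliability is None:
        return None
    sd = sqrt(pvariance([sum(row) for row in scored_rows]))
    return sd * sqrt(max(0.0, 1 - reliability))


# Example table scored against the key A, B, C, D, A.
scored_rows = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
]
reliability = kr20(scored_rows)  # about 0.536
error = sem(scored_rows)         # about 0.85 score points
```

Three students and five items are far too few for a stable estimate. The numbers only show the arithmetic.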
Distractor Review
When answer choices are supplied, distractor analysis becomes useful. A strong distractor attracts some lower scoring students. It should not attract many high scoring students. A distractor nobody chooses may be weak. A correct answer chosen by few strong students may signal poor wording or a keying problem.
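One way to tabulate distractors is to count each option separately for higher and lower scoring students. The sketch below is illustrative only: the function name, the option set `"ABCD"`, and the split into top and bottom halves (dropping the middle student when the count is odd) are all assumptions rather than the calculator's documented method.

```python
from collections import Counter


def distractor_table(responses, key, item, options="ABCD"):
    """Count how often each option was chosen by upper and lower halves."""

    def total(row):
        # Number of responses in the row that match the key.
        return sum(choice == answer for choice, answer in zip(row, key))

    ranked = sorted(responses, key=total, reverse=True)
    half = len(ranked) // 2
    upper = Counter(row[item] for row in ranked[:half])
    lower = Counter(row[item] for row in ranked[len(ranked) - half:])
    return {
        option: {
            "upper": upper[option],
            "lower": lower[option],
            "is_key": option == key[item],
        }
        for option in options
    }


key = ["A", "B", "C", "D", "A"]
responses = [
    ["A", "B", "C", "D", "A"],  # Student 1
    ["A", "C", "C", "D", "A"],  # Student 2
    ["B", "B", "D", "A", "A"],  # Student 3
]
item4 = distractor_table(responses, key, item=3)
```

Here option A on Item 4 attracts the lowest scorer but not the highest, which is the pattern a working distractor should show, while B and C are chosen by nobody in either group.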
Practical Use
Use this calculator after scoring a quiz, exam, or practice test. Paste one student per row. Use a key for choice data. Leave the key blank for zero-one data. Sort by flags first. Review hard items with negative discrimination. Keep strong items for future forms. Revise weak distractors. Document every change before reporting final results. Compare results with syllabus aims. Never remove items only to improve statistics. Fair tests balance measurement quality, teaching goals, and student understanding across cohorts.
FAQs
What is test item analysis?
It is a review of individual test questions. It checks difficulty, discrimination, item-total relation, reliability, and response patterns. It helps teachers improve tests and identify weak questions.
What is a good difficulty value?
Many tests use items between 0.30 and 0.85. The best range depends on purpose. Placement tests may need harder items. Mastery checks may accept easier items.
What does negative discrimination mean?
Negative discrimination means lower scoring students did better than higher scoring students on that item. Check the answer key, wording, content match, and possible ambiguity.
Why use corrected point-biserial?
Corrected point-biserial removes the item from the total score before correlation. This avoids inflating the relationship and gives a cleaner item-total quality signal.
Can I use raw multiple-choice responses?
Yes. Paste the student choices and enter an answer key. The calculator scores each response, then calculates difficulty, discrimination, reliability, and distractor information.
Can I use scored zero-one data?
Yes. Leave the answer key blank. Enter one row per student. Use 1 for correct answers and 0 for wrong or missing answers.
What is KR-20 reliability?
KR-20 estimates internal consistency for tests with right-or-wrong scoring. Higher values suggest items work together better, but content coverage should still be checked.
Should I delete every flagged item?
No. Flags show items needing review. Check the learning objective, wording, key, distractors, and teaching context before removing or revising any question.