Example Data Table
| Sample Text | Total Tokens | Unique Types | TTR | Meaning |
|---|---|---|---|---|
| Data science uses data models and data tools. | 8 | 6 | 0.750 | Moderate variety |
| Red red red red blue. | 5 | 2 | 0.400 | Low variety |
| Every sentence adds fresh terms for clear study. | 8 | 8 | 1.000 | High variety |
Formula Used
Type Token Ratio: TTR = Number of unique types / Total number of tokens
Percentage: TTR% = TTR × 100
Corrected TTR: CTTR = Types / √(2 × Tokens)
Root TTR: RTTR = Types / √Tokens
Herdan's C: C = log(Types) / log(Tokens)
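The formulas above can be sketched in a few lines of Python. This is only an illustration, not the calculator's own code: the function name `lexical_diversity` and the dictionary keys are assumptions, and the input is taken to be a list of tokens that has already been cleaned.

```python
import math

def lexical_diversity(tokens):
    """Return the scores defined by the formulas above for a list of tokens."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    if n_tokens == 0:
        raise ValueError("at least one token is required")
    ttr = n_types / n_tokens
    return {
        "tokens": n_tokens,
        "types": n_types,
        "ttr": ttr,
        "ttr_percent": ttr * 100,
        "cttr": n_types / math.sqrt(2 * n_tokens),
        "rttr": n_types / math.sqrt(n_tokens),
        # log(1) is 0, so Herdan's C is undefined for a single token.
        "herdan_c": math.log(n_types) / math.log(n_tokens) if n_tokens > 1 else None,
    }

# First row of the example table, lowercased and without punctuation.
scores = lexical_diversity("data science uses data models and data tools".split())
print(scores["ttr"], scores["cttr"])  # 0.75 1.5
```

For this sample, RTTR is about 2.121 and Herdan's C is about 0.862.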
How to Use This Calculator
Paste your text in the large input box. Select the token mode. Add stop words if needed. Choose whether punctuation, numbers, case, or simple stemming should affect the result. Press the calculate button. The result appears above the form. Review the chart, table, and exported files.
Understanding Type Token Ratio
What It Measures
Type token ratio measures lexical diversity. It compares unique words with all counted words. A higher value means more variety. A lower value means more repetition. This makes the score useful for writing, linguistics, education, and content review.
Why It Matters
Writers can use this score to check word choice. Teachers can compare student texts. Researchers can study vocabulary richness. Editors can spot repetitive wording. The result is simple, but it gives a quick view of text variety.
How Tokens and Types Work
A token is each counted word occurrence. A type is a unique word form. For example, “cat cat dog” has three tokens and two types, so the type token ratio is 2 / 3, or about 0.667.
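A minimal Python sketch of the same count follows. It assumes a token is any run of letters or apostrophes and that matching ignores case. The calculator's actual token modes may differ, and `count_tokens_and_types` is a hypothetical helper name.

```python
import re

def count_tokens_and_types(text):
    # Assumption: tokens are runs of letters or apostrophes, compared in lowercase.
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    return len(tokens), len(types)

n_tokens, n_types = count_tokens_and_types("cat cat dog")
print(n_tokens, n_types, round(n_types / n_tokens, 3))  # 3 2 0.667
```

Under the same assumptions, "Red red red red blue." gives 5 tokens and 2 types, which matches the example table.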
Limits of Raw TTR
Raw TTR is sensitive to text length. Short texts often score higher because long texts usually repeat more words, which reduces the ratio. This calculator also reports corrected values (CTTR, RTTR, and Herdan's C) that adjust for token count. They help compare longer and shorter samples more fairly, although none of them removes the length effect entirely.
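The length effect can be seen in a deliberately artificial case: repeating a text four times leaves its vocabulary unchanged but quadruples the token count. The sketch below is illustrative only, and the helper name `scores_for` is an assumption.

```python
import math

def scores_for(tokens):
    n, v = len(tokens), len(set(tokens))
    return {
        "ttr": v / n,
        "cttr": v / math.sqrt(2 * n),
        "rttr": v / math.sqrt(n),
        "herdan_c": math.log(v) / math.log(n),
    }

short_text = "every sentence adds fresh terms for clear study".split()
long_text = short_text * 4  # same 8 types, 32 tokens instead of 8

short_scores = scores_for(short_text)
long_scores = scores_for(long_text)
print(short_scores["ttr"], long_scores["ttr"])    # 1.0 0.25
print(short_scores["cttr"], long_scores["cttr"])  # 2.0 1.0
```

Raw TTR falls to a quarter of its value, while CTTR and RTTR fall to half and Herdan's C drops from 1.0 to about 0.6. The corrected scores react less sharply to length, but they still move, so samples of similar length remain the safer comparison.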
Advanced Options
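Case sensitivity decides whether “Data” and “data” count as the same type. Removing punctuation gives cleaner word counts. Removing numbers can help with prose. Stop-word filtering removes common words, such as “the” and “and”, from the analysis. Simple stemming groups related word forms. Each setting changes which tokens are counted, so the same text can produce different scores under different settings.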
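One way these options could be combined is sketched below. The parameter names, the naive suffix-stripping stemmer, and the order of the steps are all assumptions made for illustration and do not describe the calculator's internals.

```python
import re

def simple_stem(word):
    # Naive suffix stripping; a hypothetical stand-in for "simple stemming".
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text, case_sensitive=False, remove_punctuation=True,
               remove_numbers=False, stop_words=(), stem=False):
    if not case_sensitive:
        text = text.lower()
    if remove_punctuation:
        # Drops everything except word characters and whitespace (apostrophes too).
        text = re.sub(r"[^\w\s]", "", text)
    tokens = text.split()
    if remove_numbers:
        tokens = [t for t in tokens if not t.isdigit()]
    stop = {w if case_sensitive else w.lower() for w in stop_words}
    tokens = [t for t in tokens if t not in stop]
    if stem:
        tokens = [simple_stem(t) for t in tokens]
    return tokens

sample = "Data science uses data models and data tools."
default_tokens = preprocess(sample)
cased_tokens = preprocess(sample, case_sensitive=True)
print(len(default_tokens), len(set(default_tokens)))  # 8 6
print(len(cased_tokens), len(set(cased_tokens)))      # 8 7
```

With case sensitivity on, “Data” and “data” become separate types, so the same sentence moves from 6 types to 7.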
Best Practice
Use similar settings when comparing texts. Compare samples with close lengths. Keep punctuation rules consistent. Save the CSV for records. Use the chart to see repeated words. Review the frequency table before making final conclusions.
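For readers who want to keep their own records, a frequency table and a CSV export can be approximated as follows. The column names `word`, `count`, and `share` are assumptions and may not match the calculator's exported file.

```python
import csv
import io
from collections import Counter

def frequency_table(tokens):
    """Return (word, count, share) rows, most frequent first."""
    counts = Counter(tokens)
    total = len(tokens)
    return [(word, count, round(count / total, 3))
            for word, count in counts.most_common()]

def to_csv(rows):
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "count", "share"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()

rows = frequency_table("red red red red blue".split())
print(to_csv(rows))
```

Here “red” accounts for 4 of the 5 tokens, which is the kind of repetition the table makes easy to spot.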
FAQs
1. What is a type token ratio?
It is a lexical diversity score. It divides unique words by total word occurrences. Higher scores show more varied vocabulary.
2. What is a token?
A token is each counted word occurrence. If one word appears five times, it adds five tokens to the total.
3. What is a type?
A type is a unique word form. Repeated words count once as a type but many times as tokens.
4. Is a higher TTR always better?
Not always. A high score shows variety, but clear writing may still need repeated key terms for meaning.
5. Why does text length affect TTR?
Longer texts naturally repeat more words. That often lowers raw TTR, even when the writing is strong.
6. What does corrected TTR do?
Corrected TTR divides the number of types by the square root of twice the token count. This adjusts the score for text length and helps compare texts of different lengths more fairly.
7. Should I remove stop words?
Remove them when studying content words. Keep them when analyzing full grammar, style, or natural repetition.
8. Can I export the result?
Yes. Use the CSV button for spreadsheet data. Use the PDF button for a simple report.