Calculate TF-IDF values from custom document sets easily. Adjust term-frequency modes, smoothing, and normalization controls. Visualize keyword importance with exports, formulas, examples, and guidance.
Enter one document per line. Add comma-separated terms, or leave terms blank to auto-pick the most frequent ones.
| Document ID | Example Document | Highlighted Terms |
|---|---|---|
| D1 | machine learning models rank text documents | machine, models, rank, text |
| D2 | text mining uses term weighting for search | text, weighting, search |
| D3 | ranking models use tf idf for relevance | models, tf, idf, relevance |
This sample works well for testing frequency, rarity, and normalized weighting behavior.
**Term Frequency (TF) modes**
- Raw relative frequency: TF(t,d) = count(t,d) / total_terms(d)
- Binary frequency: TF(t,d) = 1 if count(t,d) > 0, else 0
- Log-scaled frequency: TF(t,d) = 1 + ln(count(t,d))
- Augmented frequency: TF(t,d) = 0.5 + 0.5 * count(t,d) / max_count(d)
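The four TF modes can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's internal code; the function name `tf` and whitespace tokenization are assumptions.

```python
import math
from collections import Counter

def tf(term, doc_tokens, mode="raw"):
    """Term frequency of `term` in a tokenized document (hypothetical helper)."""
    counts = Counter(doc_tokens)
    c = counts[term]
    if mode == "raw":        # count / total terms in the document
        return c / len(doc_tokens)
    if mode == "binary":     # presence indicator
        return 1 if c > 0 else 0
    if mode == "log":        # 1 + ln(count); 0 when the term is absent
        return 1 + math.log(c) if c > 0 else 0
    if mode == "augmented":  # damped by the document's most frequent term
        return 0.5 + 0.5 * c / max(counts.values())
    raise ValueError(f"unknown mode: {mode}")

doc = "ranking models use tf idf for relevance".split()
```

For example, with the D3 sample document above, every term occurs once, so the log-scaled and augmented values both come out to 1.0 while the raw relative frequency is 1/7.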
**Inverse Document Frequency (IDF) variants**
- Standard IDF: IDF(t) = ln(N / DF(t))
- Smoothed IDF: IDF(t) = ln((1 + N) / (1 + DF(t))) + 1
- Probabilistic IDF: IDF(t) = max(ln((N - DF(t)) / DF(t)), 0)
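The three IDF variants can be sketched against the sample corpus above. The helper name `idf`, membership-based DF counting, and the zero guards for empty or full document frequency are assumptions of this sketch.

```python
import math

docs = [
    "machine learning models rank text documents".split(),
    "text mining uses term weighting for search".split(),
    "ranking models use tf idf for relevance".split(),
]

def idf(term, docs, mode="standard"):
    n = len(docs)
    df = sum(1 for d in docs if term in d)  # documents containing the term
    if mode == "standard":
        return math.log(n / df) if df else 0.0
    if mode == "smoothed":                  # never divides by zero
        return math.log((1 + n) / (1 + df)) + 1
    if mode == "probabilistic":             # clamps common terms to zero
        return max(math.log((n - df) / df), 0.0) if 0 < df < n else 0.0
    raise ValueError(f"unknown mode: {mode}")
```

With this corpus, "text" appears in two of three documents, so its standard IDF is ln(3/2) ≈ 0.405 while its probabilistic IDF is clamped to 0; the rarer "machine" scores higher under every variant.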
**TF-IDF score**
TF-IDF(t,d) = TF(t,d) * IDF(t)

Optional normalization:
- L1: score / sum(|score|)
- L2: score / sqrt(sum(score²))
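Putting the pieces together, an end-to-end sketch using raw TF, standard IDF, and optional L1/L2 normalization might look like this. The function name `tfidf_vector` and the choice of default modes are assumptions, not the calculator's actual implementation.

```python
import math
from collections import Counter

docs = [
    "machine learning models rank text documents".split(),
    "text mining uses term weighting for search".split(),
    "ranking models use tf idf for relevance".split(),
]

def tfidf_vector(doc, docs, terms, norm=None):
    """TF-IDF scores for `terms` in one document (hypothetical helper)."""
    counts = Counter(doc)
    n = len(docs)
    scores = []
    for t in terms:
        tf = counts[t] / len(doc)              # raw relative frequency
        df = sum(1 for d in docs if t in d)
        idf = math.log(n / df) if df else 0.0  # standard IDF
        scores.append(tf * idf)
    if norm == "l1":
        s = sum(abs(x) for x in scores)
    elif norm == "l2":
        s = math.sqrt(sum(x * x for x in scores))
    else:
        return scores
    return [x / s for x in scores] if s else scores

vec = tfidf_vector(docs[0], docs, ["machine", "text", "models"], norm="l2")
```

With `norm="l2"` the returned vector always has unit length, which is the property the normalization section below relies on.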
TF highlights local importance inside a document. IDF reduces the weight of terms that appear in many documents. Their product surfaces words that are both frequent in one document and uncommon across the full corpus.
TF-IDF measures how important a term is within one document compared with its presence across the full document set. High scores usually signal terms that are descriptive and less common elsewhere.
A term appearing in many documents receives a lower IDF value. Even if it appears often inside one document, widespread usage reduces its ability to distinguish that document from others.
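The effect is easy to see numerically. With a corpus the size of the sample set (N = 3), standard IDF drops steadily as document frequency rises and reaches zero once a term appears in every document:

```python
import math

N = 3  # corpus size, matching the sample documents above
idf_values = {df: math.log(N / df) for df in (1, 2, 3)}
# df=1 is the rarest case, df=3 means the term is in every document
```

Here `idf_values[1]` ≈ 1.099, `idf_values[2]` ≈ 0.405, and `idf_values[3]` is exactly 0, so a term present everywhere contributes nothing to the TF-IDF score no matter how often it occurs in a single document.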
Smoothed IDF is useful when you want stable scores and fewer edge-case issues. It avoids zero-division problems and often works well for search, classification, and general text analysis pipelines.
L2 normalization scales each document vector to unit length. This makes documents more comparable when they differ greatly in size, which is helpful before clustering, cosine similarity, or nearest-neighbor analysis.
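One practical consequence, sketched below with illustrative numbers rather than real TF-IDF scores: once two vectors are L2-normalized, cosine similarity reduces to a plain dot product, and documents pointing in the same direction match perfectly regardless of length.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def cosine(a, b):
    # valid as a plain dot product only when both inputs are unit length
    return sum(x * y for x, y in zip(a, b))

short_doc = l2_normalize([3.0, 4.0])   # hypothetical 2-term score vector
long_doc = l2_normalize([30.0, 40.0])  # same direction, 10x the magnitude
```

Both vectors normalize to [0.6, 0.8], so their cosine similarity is exactly 1.0 even though the raw magnitudes differ by a factor of ten.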
Yes. When the terms field is empty, the calculator automatically selects frequent terms from the supplied documents. The maximum-terms setting controls how many auto-selected terms are included.
Zero usually means the term is absent from that document, or the selected IDF method reduced the term’s weight to zero because it appears in nearly every document.
Yes. TF-IDF is a classic weighting method for information retrieval. It helps rank documents by emphasizing terms that are meaningful for a query while reducing overly common words.
Paste plain text, with one document on each line. For better results, keep the documents focused, use comparable topic scope, and provide clear keywords you want the calculator to evaluate.
Important Note: All calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.