Formula Used
$$J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$$
$$\mu_j = \frac{1}{n_j} \sum_{x_i \in C_j} x_i$$
$$d(x, \mu) = \sqrt{(x - \mu_x)^2 + (y - \mu_y)^2}$$
$$s(i) = \frac{b(i) - a(i)}{\max[a(i), b(i)]}$$
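The calculator minimizes total within-cluster variation J. Each centroid is the mean of the points assigned to its cluster, and d is the Euclidean distance between a point and a centroid. Inertia is the value of J: the summed squared distance from each point to its assigned centroid. In the silhouette formula, a(i) is the mean distance from point i to the other points in its own cluster, and b(i) is the mean distance from point i to the points of the nearest other cluster. A silhouette score closer to 1 often signals cleaner separation between clusters.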
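As a rough illustration of the formulas above, the following Python sketch computes a centroid, the Euclidean distance, and the inertia for the first six points of the example data table. The function names are illustrative assumptions and are not taken from the calculator itself.

```python
import math

def centroid(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def distance(p, mu):
    """Euclidean distance between a point and a centroid."""
    return math.hypot(p[0] - mu[0], p[1] - mu[1])

def inertia(clusters):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for points in clusters:
        mu = centroid(points)
        total += sum(distance(p, mu) ** 2 for p in points)
    return total

# Points 1-3 and 4-6 from the example data table, grouped by hand.
clusters = [
    [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0)],
    [(4.8, 5.0), (5.2, 5.3), (4.9, 4.7)],
]

print(centroid(clusters[0]))   # approximately (1.0, 1.0)
print(inertia([clusters[0]]))  # approximately 0.10
print(inertia(clusters))       # inertia of both groups together
```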
How to Use This Calculator
- Enter two-variable data in the dataset box.
- Choose the number of clusters you want.
- Select normalization for fairer feature comparison.
- Pick K-Means++ or random centroid initialization.
- Set iteration and tolerance controls.
- Submit the form to calculate clusters.
- Review the summary, centroid table, assignments, and graph.
- Export the output as CSV or PDF.
Example Data Table
| Point | X | Y | Likely Group |
|---|---|---|---|
| 1 | 1.0 | 1.1 | Lower-left cluster |
| 2 | 1.2 | 0.9 | Lower-left cluster |
| 3 | 0.8 | 1.0 | Lower-left cluster |
| 4 | 4.8 | 5.0 | Center cluster |
| 5 | 5.2 | 5.3 | Center cluster |
| 6 | 4.9 | 4.7 | Center cluster |
| 7 | 8.7 | 1.2 | Right cluster |
| 8 | 9.1 | 0.8 | Right cluster |
| 9 | 8.9 | 1.0 | Right cluster |
| 10 | 3.2 | 8.4 | Upper cluster |
| 11 | 3.5 | 8.0 | Upper cluster |
| 12 | 3.0 | 8.7 | Upper cluster |
FAQs
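1. What does K-Means clustering do?
It groups points into K clusters by minimizing within-cluster squared distance. Each point joins its nearest centroid, each centroid is then recomputed as the mean of its assigned points, and the two steps repeat until the centroids move less than the tolerance or the iteration limit is reached.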
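The assign-and-update loop can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the calculator's actual code: it uses plain random initialization, and the names `kmeans`, `max_iter`, `tol`, and `seed` are hypothetical stand-ins for the iteration, tolerance, and seed controls.

```python
import math
import random

def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Basic K-Means: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # plain random initialization (assumption)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        # Stop when the largest centroid movement is within the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

Because `math.dist` and `zip(*members)` work for tuples of any length, the same sketch also accepts points with more than two coordinates.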
2. Why should I normalize the data?
Normalization helps when one variable has a much larger scale. Without scaling, that variable can dominate distance calculations and distort cluster boundaries.
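The source does not say which normalization the calculator applies. Purely as an illustration, the sketch below assumes min-max scaling, which maps each variable to the range 0 to 1 so that no variable dominates the distance only because of its units. The data values are made up.

```python
def min_max_scale(points):
    """Rescale every coordinate to [0, 1] column by column."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for p in points:
        scaled.append(tuple(
            # A constant column has no spread, so map it to 0.0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(p, lows, highs)
        ))
    return scaled

# Hypothetical data: income in dollars (large scale) and age in years (small scale).
raw = [(30000.0, 25.0), (32000.0, 60.0), (90000.0, 27.0)]
scaled = min_max_scale(raw)
for before, after in zip(raw, scaled):
    print(before, "->", after)
```

In the raw data, the distance between the first two points is driven almost entirely by the 2000-dollar income gap, even though their ages differ by 35 years. After scaling, both variables contribute on the same 0 to 1 range.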
3. What is inertia in this calculator?
Inertia is the total sum of squared distances from each point to its assigned centroid. Lower values usually indicate tighter clusters, but inertia generally keeps falling as K increases, so the smallest value is not always the best model.
4. What does the silhouette score mean?
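The silhouette score compares cohesion (how close a point is to its own cluster) with separation (how far it is from the nearest other cluster). It ranges from -1 to 1. Values near 1 suggest cleaner clusters. Values near 0 indicate overlap. Negative values often mean a point sits closer to a neighboring cluster than to its own.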
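To make the score concrete, here is a small Python sketch of the per-point silhouette value s(i) = [b(i) - a(i)] / max[a(i), b(i)]. It assumes at least two clusters and at least two distinct points, and it assigns 0 to points in single-member clusters, which is a common convention rather than something the source specifies.

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette values for a given labeling."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters (assumption)
            continue
        # a(i): mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return scores

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_scores(pts, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_scores(pts, [0, 1, 0, 1])   # deliberately mixed grouping
print(sum(good) / len(good))
print(sum(bad) / len(bad))
```

The sensible grouping yields values close to 1, while the deliberately mixed grouping yields negative values.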
5. Why use K-Means++ initialization?
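K-Means++ chooses smarter starting centroids by spreading them apart: each new centroid is sampled with probability proportional to its squared distance from the nearest centroid already chosen. This usually avoids poor starting positions, improves consistency, and often reaches better solutions than plain random selection.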
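A minimal Python sketch of K-Means++ seeding follows. It picks the first centroid uniformly at random and each later one with probability proportional to its squared distance from the nearest centroid chosen so far. The function name and the `seed` parameter are illustrative, and the sketch assumes the data contains at least k distinct points.

```python
import math
import random

def kmeans_pp_init(points, k, seed=0):
    """Choose k starting centroids with K-Means++ weighting."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform random pick
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        weights = [
            min(math.dist(p, c) ** 2 for c in centroids)
            for p in points
        ]
        # Far-away points get a higher chance; already chosen points have weight 0.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
starts = kmeans_pp_init(data, k=2, seed=3)
print(starts)
```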
6. Can this calculator handle more than two variables?
This version focuses on two variables so the graph stays clear. The same clustering logic can be extended to higher dimensions with additional input handling.
7. How do I choose the best K value?
Test several K values and compare inertia, silhouette score, and visual separation. Choose the setting that balances compact clusters with meaningful structure.
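One way to run such a comparison is sketched below for the twelve points of the example data table. It is an illustration under assumptions, not the calculator's procedure: it uses random initialization, keeps the best of ten restarts per K, and reports inertia only. The helper names are hypothetical.

```python
import math
import random

# The twelve points from the example data table.
POINTS = [
    (1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
    (4.8, 5.0), (5.2, 5.3), (4.9, 4.7),
    (8.7, 1.2), (9.1, 0.8), (8.9, 1.0),
    (3.2, 8.4), (3.5, 8.0), (3.0, 8.7),
]

def run_kmeans(points, k, seed, max_iter=100):
    """One K-Means run with random initialization; returns its inertia."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break  # centroids stopped moving
        centroids = updated
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def best_inertia(points, k, restarts=10):
    """Lowest inertia over several seeds, to reduce the effect of bad starts."""
    return min(run_kmeans(points, k, seed) for seed in range(restarts))

inertia_by_k = {k: best_inertia(POINTS, k) for k in range(1, 7)}
for k, value in inertia_by_k.items():
    print(f"K={k}: inertia={value:.2f}")
```

When reading the printed values, look for the K after which further increases bring only small reductions in inertia, then check that choice against the silhouette score and the graph.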
8. Why can different runs produce different results?
K-Means depends on its starting centroids, so different random seeds can lead to different assignments. This page includes a seed field; reusing the same seed makes a run easier to reproduce.
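The effect of a seed can be sketched in Python with a dedicated random generator. This is an assumption about how such a seed field might work, not a description of the page's implementation, and `initial_centroids` is a hypothetical helper.

```python
import random

def initial_centroids(points, k, seed):
    """Pick k starting centroids; the same seed always gives the same picks."""
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.sample(points, k)

grid = [(float(x), float(y)) for x in range(5) for y in range(5)]

first = initial_centroids(grid, 3, seed=42)
second = initial_centroids(grid, 3, seed=42)
other = initial_centroids(grid, 3, seed=7)

print(first == second)  # True: same seed, same starting centroids
print(first, other)     # a different seed may select different points
```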