Example data table
| Point | X | Y | Visible region |
|---|---|---|---|
| 1 | 1.0 | 1.2 | Low cluster |
| 2 | 1.5 | 1.8 | Low cluster |
| 3 | 2.0 | 1.0 | Low cluster |
| 4 | 2.2 | 1.6 | Low cluster |
| 5 | 4.6 | 5.0 | Middle cluster |
| 6 | 5.0 | 4.5 | Middle cluster |
| 7 | 5.3 | 5.4 | Middle cluster |
| 8 | 5.8 | 4.9 | Middle cluster |
| 9 | 7.0 | 8.0 | High cluster |
| 10 | 7.5 | 8.3 | High cluster |
| 11 | 8.0 | 7.8 | High cluster |
| 12 | 8.4 | 8.6 | High cluster |
Formula used
Objective function: K Means minimizes the within-cluster sum of squares.
J = Σⱼ Σ_{xᵢ ∈ Cⱼ} ||xᵢ - μⱼ||², where Cⱼ is the set of points assigned to cluster j and μⱼ is its centroid.
Each point xᵢ is assigned to the nearest centroid μⱼ.
Cluster(xᵢ) = arg minⱼ ||xᵢ - μⱼ||²
After assignments, each centroid becomes the arithmetic mean of all points in that cluster.
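μⱼ = (1 / nⱼ) Σ_{xᵢ ∈ Cⱼ} xᵢ, where Cⱼ is the set of points assigned to cluster j and nⱼ is the number of points in it.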
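The two steps above can be sketched in plain Python using the example data table. This is a minimal illustration, not the calculator's own code: the function names `assign`, `update`, and `lloyd`, the starting centroids (points 1, 5, and 9), and the rule of leaving an empty cluster's centroid in place are all assumptions made for the example.

```python
from math import dist

# The twelve points from the example data table.
points = [(1.0, 1.2), (1.5, 1.8), (2.0, 1.0), (2.2, 1.6),
          (4.6, 5.0), (5.0, 4.5), (5.3, 5.4), (5.8, 4.9),
          (7.0, 8.0), (7.5, 8.3), (8.0, 7.8), (8.4, 8.6)]

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]) ** 2)
            for p in points]

def update(points, labels, centroids):
    """Update step: each centroid becomes the mean of its assigned points."""
    new = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new.append(old)  # assumption: an empty cluster keeps its centroid
            continue
        n = len(members)
        new.append((sum(x for x, _ in members) / n,
                    sum(y for _, y in members) / n))
    return new

def lloyd(points, centroids, max_iter=100, tol=1e-9):
    """Alternate both steps until no centroid moves farther than tol."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return labels, centroids

# Hypothetical starting centroids: points 1, 5, and 9 from the table.
labels, centroids = lloyd(points, [points[0], points[4], points[8]])
print(labels)
print(centroids)
```

With these starting centroids the loop settles after two passes: the labels match the three visible groups in the table, and the centroids land near (1.675, 1.4), (5.175, 4.95), and (7.725, 8.175).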
Inertia: The calculator reports the total squared distance from each point to its assigned centroid, which is the value of J for the final assignment.
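Silhouette score: When the dataset is not too large, the calculator estimates separation quality from two distances per point: the average distance to the other points in its own cluster and the average distance to the points in the nearest other cluster.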
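Both metrics can be checked by hand on the example data. The sketch below assumes the standard silhouette definition, (b - a) / max(a, b) averaged over all points, with Euclidean distance; the calculator's exact estimate may differ. The labels are taken from the table's Low, Middle, and High groups, and the function names are illustrative.

```python
from math import dist

points = [(1.0, 1.2), (1.5, 1.8), (2.0, 1.0), (2.2, 1.6),
          (4.6, 5.0), (5.0, 4.5), (5.3, 5.4), (5.8, 4.9),
          (7.0, 8.0), (7.5, 8.3), (8.0, 7.8), (8.4, 8.6)]
labels = [0] * 4 + [1] * 4 + [2] * 4  # Low, Middle, High clusters from the table

def group(points, labels):
    """Map each cluster label to the list of its points."""
    return {j: [p for p, lab in zip(points, labels) if lab == j]
            for j in set(labels)}

def inertia(points, labels):
    """Total squared distance from each point to its cluster mean."""
    total = 0.0
    for members in group(points, labels).values():
        n = len(members)
        mu = (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)
        total += sum(dist(p, mu) ** 2 for p in members)
    return total

def silhouette(points, labels):
    """Mean of (b - a) / max(a, b); needs at least two clusters."""
    clusters = group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a single-point cluster
            continue
        # a: mean distance to the other points in the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for j, other in clusters.items() if j != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

print(round(inertia(points, labels), 4))  # 3.92
print(round(silhouette(points, labels), 3))
```

The silhouette loop compares every pair of points, so its cost grows with the square of the dataset size, which is one plausible reason to compute it only for smaller inputs.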
How to use this calculator
- Enter one two-dimensional point per line in the data area.
- Choose the number of clusters, iteration limit, tolerance, and run count.
- Select an initialization method and decide whether to standardize the variables.
- Set a random seed when you want repeatable runs.
- Click Run Clustering to generate centroids, assignments, and the interactive chart.
- Review inertia, silhouette, centroid positions, and iteration history before exporting CSV or PDF.
Frequently asked questions
1. What does K Means clustering do?
It groups similar numeric points into K clusters by repeatedly assigning points to the nearest centroid and recalculating the centroid as the cluster mean.
2. How do I choose the right K value?
Try several values and compare inertia, silhouette, and visual separation. Inertia tends to keep falling as K grows, so the best K is usually the point where adding clusters yields only small inertia gains while the groups remain meaningful and stable.
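One way to see this trade-off is to run the algorithm for several K values and print the best inertia found for each. The sketch below is illustrative only: `kmeans` and `best_inertia` are hypothetical helpers, starting centroids are drawn at random from the data, and the run count and seed are arbitrary choices rather than the calculator's defaults.

```python
import random
from math import dist

points = [(1.0, 1.2), (1.5, 1.8), (2.0, 1.0), (2.2, 1.6),
          (4.6, 5.0), (5.0, 4.5), (5.3, 5.4), (5.8, 4.9),
          (7.0, 8.0), (7.5, 8.3), (8.0, 7.8), (8.4, 8.6)]

def centroid_of(members):
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)

def kmeans(points, k, rng, max_iter=100):
    """One run from random starting points; returns the final inertia."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # assumption: an empty cluster keeps its previous centroid
            new.append(centroid_of(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def best_inertia(points, k, runs=20, seed=0):
    """Lowest inertia over several seeded runs."""
    rng = random.Random(seed)
    return min(kmeans(points, k, rng) for _ in range(runs))

results = {k: best_inertia(points, k) for k in range(1, 7)}
for k, value in results.items():
    print(k, round(value, 3))
```

On data with clear groups, the printed values typically drop sharply up to the natural number of clusters and then flatten; that bend is the usual "elbow" signal, and it is worth confirming against the silhouette score and the chart.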
3. Why does initialization matter?
Different starting centroids can lead to different local optima. K Means++ usually improves stability by choosing initial centroids that are spread far apart before iterations begin.
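A minimal sketch of K Means++ seeding, assuming the standard scheme: the first centroid is picked uniformly at random, and each later one is picked with probability proportional to its squared distance from the nearest centroid chosen so far. The function name `kmeans_pp_init` and its `seed` parameter are illustrative and not taken from the calculator.

```python
import random
from math import dist

def kmeans_pp_init(points, k, seed=None):
    """Pick k starting centroids from points using K Means++ weighting."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from every point to its nearest chosen centroid.
        d2 = [min(dist(p, c) ** 2 for c in centroids) for p in points]
        # Far-away points get larger weights; already chosen points get zero.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

points = [(1.0, 1.2), (1.5, 1.8), (2.0, 1.0), (2.2, 1.6),
          (4.6, 5.0), (5.0, 4.5), (5.3, 5.4), (5.8, 4.9),
          (7.0, 8.0), (7.5, 8.3), (8.0, 7.8), (8.4, 8.6)]
print(kmeans_pp_init(points, 3, seed=42))
```

This sketch assumes k does not exceed the number of distinct points; otherwise every weight eventually becomes zero and `random.choices` raises an error.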
4. What is inertia in this calculator?
Inertia is the total within-cluster squared distance. Lower values indicate tighter clusters, but inertia alone should not determine the final K choice.
5. Should I standardize my variables?
Standardization is helpful when X and Y use different scales. It prevents larger-scale variables from dominating distance calculations and cluster placement.
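As a rough illustration, standardization can be done with z-scores: subtract each variable's mean and divide by its standard deviation. The sketch assumes the population standard deviation and a fallback of 1.0 for a constant column; the calculator may use a different convention, and `standardize` is a hypothetical name.

```python
from statistics import fmean, pstdev

def standardize(points):
    """Rescale X and Y to mean 0 and standard deviation 1."""
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    # Assumption: a constant column (std 0) is left unscaled.
    sx = pstdev(xs) or 1.0
    sy = pstdev(ys) or 1.0
    return [((x - mx) / sx, (y - my) / sy) for x, y in points]

points = [(1.0, 1.2), (1.5, 1.8), (2.0, 1.0), (2.2, 1.6),
          (4.6, 5.0), (5.0, 4.5), (5.3, 5.4), (5.8, 4.9),
          (7.0, 8.0), (7.5, 8.3), (8.0, 7.8), (8.4, 8.6)]
scaled = standardize(points)
print(scaled[0])
```

Centroids found on standardized data live in the rescaled units, so they need to be mapped back (multiply by the standard deviation, then add the mean) before being compared with the original X and Y values.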
6. Why can results change between runs?
Each run can start from different initial centroids unless the random seed is fixed. A different start may alter assignments, especially when clusters overlap or the dataset contains borderline points.
7. Can I use this tool for non-spherical clusters?
It works best for compact, roughly spherical groups centered on their means. Strongly curved, unevenly dense, or highly elongated structures may require other clustering approaches.
8. What does the silhouette score indicate?
The silhouette score summarizes how close points are to their own cluster compared with the nearest neighboring cluster. It ranges from -1 to 1, and higher values usually indicate better separation.