Example Data Table
| Point | X | Y | Suggested Group |
|---|---|---|---|
| P1 | 2 | 3 | Low left |
| P2 | 3 | 4 | Low left |
| P3 | 3 | 5 | Low left |
| P4 | 8 | 7 | Center |
| P5 | 8 | 8 | Center |
| P6 | 9 | 8 | Center |
| P7 | 15 | 3 | Right |
| P8 | 16 | 4 | Right |
| P9 | 15 | 5 | Right |
Formula Used
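K-medoids minimizes the total distance from each point to its assigned medoid. Unlike a centroid, a medoid is always a real observation from the dataset.
$$\text{Cost} = \sum_{j=1}^{k} \sum_{x_i \in C_j} d(x_i, m_j)$$
Where xᵢ is a data point, Cⱼ is the j-th cluster, mⱼ is its medoid, k is the number of clusters, and d is the chosen distance metric.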
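For each cluster, the calculator tests candidate members and keeps the one with the smallest total within-cluster distance as the new medoid. Points are then reassigned to their nearest medoid, and the process repeats until the medoids stop changing or the maximum number of iterations is reached.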
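The cost and the medoid-update step described above can be sketched in Python. This is a minimal illustration, not the calculator's own code. It assumes Euclidean distance and takes the starting medoids as an argument, because the calculator's initialization is not described here. The helper names (`k_medoids`, `assign`, `total_cost`) are hypothetical. The sketch implements the alternating "assign, then update each medoid" scheme, not the PAM swap search.

```python
import math

# Points P1..P9 from the example data table.
POINTS = [(2, 3), (3, 4), (3, 5), (8, 7), (8, 8), (9, 8), (15, 3), (16, 4), (15, 5)]


def assign(points, medoids, dist):
    """Label each point with the position (in `medoids`) of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: dist(p, points[medoids[j]]))
        for p in points
    ]


def total_cost(points, medoids, labels, dist):
    """Sum of each point's distance to its assigned medoid."""
    return sum(dist(p, points[medoids[lab]]) for p, lab in zip(points, labels))


def k_medoids(points, initial_medoids, dist=math.dist, max_iter=100):
    """Hypothetical alternating k-medoids; medoids are stored as indices into `points`."""
    medoids = list(initial_medoids)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for j, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(old)
                continue
            # The member with the smallest total distance to its cluster mates wins.
            new_medoids.append(
                min(members, key=lambda c: sum(dist(points[c], points[i]) for i in members))
            )
        if new_medoids == medoids:  # converged: no medoid changed
            break
        medoids = new_medoids
    labels = assign(points, medoids, dist)
    return medoids, labels, total_cost(points, medoids, labels, dist), iterations


medoids, labels, cost, iterations = k_medoids(POINTS, initial_medoids=[0, 3, 6])
print([f"P{m + 1}" for m in medoids], labels, round(cost, 3), iterations)
# ['P2', 'P5', 'P8'] [0, 0, 0, 1, 1, 1, 2, 2, 2] 7.243 2
```

Starting from P1, P4, and P7, the sketch moves the medoids to P2, P5, and P8 on the first pass and confirms convergence on the second. The result matches the suggested groups in the table.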
How to Use This Calculator
- Paste one data point per line in the dataset box.
- Separate dimensions with commas, such as 10,15,22.
- Enter the desired number of clusters, k.
- Select Euclidean or Manhattan distance.
- Choose whether to standardize the variables first.
- Set the maximum number of iterations and the number of decimal places to display.
- Press the calculate button to generate results.
- Review medoids, cluster costs, assignments, and silhouette score.
- Use the export buttons to save CSV or PDF output.
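The input rules in the steps above can be checked with a small parser. Those rules are one point per line, comma-separated dimensions, and the same number of values on every row. The sketch below uses a hypothetical function name, `parse_points`. Skipping blank lines and rejecting ragged rows with a `ValueError` are assumptions about how such a tool might behave, not documented behavior of this calculator.

```python
def parse_points(text):
    """Hypothetical parser: one point per line, dimensions separated by commas."""
    points = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        # float() tolerates surrounding spaces, so "15, 5" parses cleanly.
        values = tuple(float(v) for v in line.split(","))
        if points and len(values) != len(points[0]):
            raise ValueError(
                f"line {line_no}: expected {len(points[0])} values, got {len(values)}"
            )
        points.append(values)
    return points


print(parse_points("2,3\n3,4\n\n15, 5"))  # [(2.0, 3.0), (3.0, 4.0), (15.0, 5.0)]
```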
Why Analysts Use K-Medoids
K-medoids is useful when you want cluster centers that are actual observations. Each center is a real record that can be shown and explained directly, which helps in business, healthcare, operations, and survey analysis.
It is also more resistant to extreme values than methods built on mean centers, such as k-means. When the data contains outliers, medoids often produce more interpretable clusters.
Frequently Asked Questions
1. What does this calculator return?
It returns the selected medoids, the cluster assignment of each point, the total clustering cost, the cost of each cluster, the number of iterations, and the silhouette score when there are at least two clusters.
2. What is the difference between k-medoids and k-means?
K-medoids uses actual data points as cluster centers. K-means uses centroids, the arithmetic means of each cluster's points, which makes it more sensitive to outliers and extreme observations.
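The difference in sensitivity shows up on a tiny one-dimensional sample. The values below are made up for illustration, and 100 acts as the outlier. The hypothetical `medoid` helper picks the observation with the smallest total absolute distance to all the others.

```python
def medoid(values):
    """Hypothetical helper: observation with the smallest total absolute distance to the rest."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


data = [2, 3, 3, 4, 100]  # illustrative values; 100 is an outlier
mean = sum(data) / len(data)

print(f"mean = {mean}, medoid = {medoid(data)}")  # mean = 22.4, medoid = 3
```

The single outlier pulls the mean far away from every typical value. The medoid stays on an observation from the bulk of the data.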
3. When should I standardize the variables?
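Standardize when the columns are measured on very different scales, for example one column in thousands and another in single digits. Standardizing rescales each column to mean 0 and standard deviation 1, which prevents large-scale variables from dominating the distance calculation.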
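Here is a minimal z-score sketch of that rescaling. The function name `standardize` is hypothetical. Dividing by the population standard deviation (rather than the sample one) and mapping constant columns to 0 are assumptions, since the calculator's exact choice is not stated.

```python
import statistics


def standardize(points):
    """Hypothetical z-score: subtract each column's mean, divide by its population stdev."""
    columns = list(zip(*points))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        # Assumption: a constant column (stdev 0) is mapped to 0 instead of dividing by zero.
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stdevs))
        for row in points
    ]


# Illustrative rows: the second column is 1000x larger than the first.
rows = [(1, 1000), (2, 2000), (3, 3000)]
for row in standardize(rows):
    print([round(v, 3) for v in row])  # both columns become -1.225, 0.0, 1.225
```

After rescaling, both columns contribute equally to any distance, even though their raw magnitudes differed by a factor of 1000.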
4. Which distance metric should I choose?
Use Euclidean distance for straight-line distance. Use Manhattan distance, the sum of absolute differences along each dimension, when movement is grid-based or when you want absolute differences to drive the clustering.
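The two metrics can be compared on a pair of points from the example table. This is a plain sketch of the standard definitions, with hypothetical function names.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Grid distance: sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


p2, p6 = (3, 4), (9, 8)  # P2 and P6 from the example table
print(round(euclidean(p2, p6), 3), manhattan(p2, p6))  # 7.211 10
```

Euclidean distance cuts diagonally (the square root of 6² + 4²). Manhattan distance adds the horizontal and vertical steps (6 + 4), so it is never smaller.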
5. Can this calculator handle multidimensional points?
Yes. Each row can include two or more numeric dimensions, as long as every row contains the same number of comma-separated values.
6. What does total cost mean?
Total cost is the sum of each point’s distance to its assigned medoid. For the same dataset, distance metric, and number of clusters, a lower value indicates tighter, more compact clusters.
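As a worked illustration, suppose the example table is split into its three suggested groups, with P2, P5, and P8 as medoids and Euclidean distance as the metric. Both choices are assumptions made for this example. Each medoid contributes zero distance to its own cluster, so only the other two members count. Within each group, no other member gives a lower cost.
$$
\begin{aligned}
\text{Cost}_{\text{low left}} &= d(P_1,P_2) + d(P_3,P_2) = \sqrt{2} + 1 \approx 2.414 \\
\text{Cost}_{\text{center}} &= d(P_4,P_5) + d(P_6,P_5) = 1 + 1 = 2 \\
\text{Cost}_{\text{right}} &= d(P_7,P_8) + d(P_9,P_8) = \sqrt{2} + \sqrt{2} \approx 2.828 \\
\text{Total cost} &= 3 + 3\sqrt{2} \approx 7.243
\end{aligned}
$$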
7. Why is the silhouette score helpful?
The silhouette score compares how close each point is to its own cluster with how close it is to the nearest other cluster. It ranges from -1 to 1, and higher values suggest clearer separation between clusters.
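The score can be sketched with the standard per-point formula s(i) = (b - a) / max(a, b). Here a is the mean distance from point i to the other members of its cluster. b is the smallest mean distance from point i to the members of any other cluster. The function name `silhouette` is hypothetical. Scoring singleton clusters as 0 follows a common convention, not anything documented for this calculator.

```python
import math

# Points P1..P9 from the example table, labeled by their suggested group.
POINTS = [(2, 3), (3, 4), (3, 5), (8, 7), (8, 8), (9, 8), (15, 3), (16, 4), (15, 5)]
LABELS = [0, 0, 0, 1, 1, 1, 2, 2, 2]


def silhouette(points, labels, dist=math.dist):
    """Hypothetical helper: mean silhouette score over all points."""
    clusters = set(labels)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("silhouette needs between 2 and n - 1 clusters")
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: scored 0 by convention
            continue
        a = sum(dist(p, q) for q in own) / len(own)  # cohesion
        b = min(  # separation: mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


print(round(silhouette(POINTS, LABELS), 3))
```

On the suggested groups the score is close to 1. Shuffling the labels so that each cluster mixes points from all three groups drives it toward zero or below.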
8. Is this method suitable for outlier-heavy data?
Often, yes. Because medoids are actual observations, they tend to be more stable than mean-based centers when unusual values appear in the dataset.