Agglomerative Clustering Calculator

Calculator


Dataset in CSV format

Target clusters

Linkage method

Distance metric

Displayed decimals

Standardize features

First row contains headers

First column contains labels

Example data table

Label	X	Y
A	1.0	1.2
B	1.4	1.0
C	0.8	1.5
D	4.1	4.8
E	4.4	4.3
F	4.0	4.6
G	7.8	8.2
H	8.4	8.0
I	7.5	7.6

Formula used

Agglomerative clustering starts with one cluster per observation. It repeatedly merges the closest two clusters until the requested number of clusters remains.
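The merge loop described above can be sketched in a few lines of Python. This is a naive illustration rather than the calculator's actual code: the names `POINTS`, `single_linkage`, and `agglomerate` are hypothetical, and single linkage with Euclidean distance is assumed as the default.

```python
import math

# Example table from this page: label -> (X, Y).
POINTS = {
    "A": (1.0, 1.2), "B": (1.4, 1.0), "C": (0.8, 1.5),
    "D": (4.1, 4.8), "E": (4.4, 4.3), "F": (4.0, 4.6),
    "G": (7.8, 8.2), "H": (8.4, 8.0), "I": (7.5, 7.6),
}

def single_linkage(c1, c2, points):
    """Smallest Euclidean distance between a member of c1 and a member of c2."""
    return min(math.dist(points[p], points[q]) for p in c1 for q in c2)

def agglomerate(points, target_clusters, linkage=single_linkage):
    """Merge the two closest clusters until target_clusters remain."""
    target_clusters = max(1, target_clusters)
    clusters = [[label] for label in points]  # one cluster per observation
    history = []                              # (left, right, distance) per merge
    while len(clusters) > target_clusters:
        best = None
        # Scan every pair of clusters and remember the closest one.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j], points)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]  # merged cluster replaces the left one
        del clusters[j]
    return clusters, history

clusters, history = agglomerate(POINTS, target_clusters=3)
for members in clusters:
    print(sorted(members))
```

Scanning every pair on every pass is slow for large datasets, but it keeps the sketch close to the description. Passing a different `linkage` function changes the merge rule without touching the loop.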

Distance metrics

Euclidean distance: d(a,b) = √( Σᵢ (aᵢ - bᵢ)² )

Manhattan distance: d(a,b) = Σᵢ |aᵢ - bᵢ|
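As a quick check of both formulas, here is a minimal Python sketch that applies them to points A and D from the example table. The function names `euclidean` and `manhattan` are chosen for illustration only.

```python
import math

def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

point_a = (1.0, 1.2)  # row A of the example table
point_d = (4.1, 4.8)  # row D of the example table

print(round(euclidean(point_a, point_d), 3))
print(round(manhattan(point_a, point_d), 3))
```

Manhattan distance is never smaller than Euclidean distance for the same pair of points, so merge distances reported under the two metrics are not directly comparable.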

Linkage rules

Single linkage: smallest pairwise distance between two clusters.

Complete linkage: largest pairwise distance between two clusters.

Average linkage: mean pairwise distance between cluster members.

Centroid linkage: distance between cluster centroids.

Ward linkage: merge cost equal to the within-cluster variance a merge would add; the pair with the smallest cost is merged.
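To make the five rules concrete, the sketch below evaluates each one for two clusters taken from the example table, using Euclidean distance. The function names are hypothetical, and the Ward cost uses one common form, the increase in within-cluster SSE; the calculator may report a scaled or square-rooted variant.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(len(cluster[0])))

def pairwise(c1, c2):
    """All Euclidean distances between a point of c1 and a point of c2."""
    return [math.dist(p, q) for p in c1 for q in c2]

def single(c1, c2):
    return min(pairwise(c1, c2))

def complete(c1, c2):
    return max(pairwise(c1, c2))

def average(c1, c2):
    dists = pairwise(c1, c2)
    return sum(dists) / len(dists)

def centroid_link(c1, c2):
    return math.dist(centroid(c1), centroid(c2))

def ward(c1, c2):
    # Assumed form: increase in within-cluster SSE caused by the merge.
    n1, n2 = len(c1), len(c2)
    return n1 * n2 / (n1 + n2) * centroid_link(c1, c2) ** 2

abc = [(1.0, 1.2), (1.4, 1.0), (0.8, 1.5)]  # rows A, B, C
def_ = [(4.1, 4.8), (4.4, 4.3), (4.0, 4.6)]  # rows D, E, F

for rule in (single, complete, average, centroid_link, ward):
    print(rule.__name__, round(rule(abc, def_), 3))
```

For any pair of clusters, single is at most average, and average is at most complete. The Ward cost is a squared quantity weighted by cluster sizes, so it sits on a different scale from the other four.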

Standardization

When enabled, each feature is transformed using z = (x - mean) / standard deviation. This prevents larger-scale variables from dominating distance calculations.
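A minimal sketch of the z-score transform, applied column by column to the example table. It assumes the population standard deviation (`statistics.pstdev`); the calculator may use the sample version instead, which would scale the results slightly differently. The function name `standardize` is hypothetical.

```python
import statistics

def standardize(rows):
    """Return rows with every column rescaled to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: population standard deviation (divide by n, not n - 1).
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # A constant column has zero spread, so map it to 0.0 instead of dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

rows = [
    (1.0, 1.2), (1.4, 1.0), (0.8, 1.5),
    (4.1, 4.8), (4.4, 4.3), (4.0, 4.6),
    (7.8, 8.2), (8.4, 8.0), (7.5, 7.6),
]
scaled = standardize(rows)
for original, z in zip(rows, scaled):
    print(original, [round(v, 3) for v in z])
```

After the transform, a one-unit difference means one standard deviation in every feature, so no single column dominates the distance.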

Quality indicators

Within-cluster SSE: sum of squared distances from each point to its cluster centroid. Lower values mean tighter clusters.

Average silhouette: compares cohesion against separation on a scale from -1 to 1. Values near 1 indicate cleaner clusters.
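Both indicators can be computed directly from a finished assignment. The sketch below uses Euclidean distance and the three groups visible in the example table; the function names are hypothetical, and the silhouette code assumes at least two clusters.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(len(cluster[0])))

def within_cluster_sse(clusters):
    """Sum of squared distances from every point to its own cluster centroid."""
    total = 0.0
    for cluster in clusters:
        center = centroid(cluster)
        total += sum(math.dist(p, center) ** 2 for p in cluster)
    return total

def average_silhouette(clusters):
    """Mean of (b - a) / max(a, b) over all points; needs at least two clusters."""
    scores = []
    for idx, own in enumerate(clusters):
        others = [c for k, c in enumerate(clusters) if k != idx]
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # common convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            # (the distance from p to itself is 0, so it does not affect the sum).
            a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
            # b: mean distance to the members of the nearest other cluster.
            b = min(sum(math.dist(p, q) for q in c) / len(c) for c in others)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

clusters = [
    [(1.0, 1.2), (1.4, 1.0), (0.8, 1.5)],  # A, B, C
    [(4.1, 4.8), (4.4, 4.3), (4.0, 4.6)],  # D, E, F
    [(7.8, 8.2), (8.4, 8.0), (7.5, 7.6)],  # G, H, I
]
print(round(within_cluster_sse(clusters), 4))
print(round(average_silhouette(clusters), 4))
```

SSE always shrinks as the number of clusters grows, so it is most useful for comparing runs with the same target count. The silhouette does not have that bias, which makes it a better guide when comparing different cluster counts.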

How to use this calculator

Paste your dataset into the CSV textarea.
Enable the header option if your first row contains names.
Enable the label option if your first column contains row labels.
Select your target number of clusters.
Choose a linkage method and distance metric.
Turn on standardization when features have very different scales.
Submit the form to calculate assignments, merge history, and charts.
Use the CSV and PDF buttons to export the result section.

FAQs

1. What does agglomerative clustering do?

It begins with each point in its own cluster and merges the two closest clusters step by step. The process continues until the chosen number of clusters remains.

2. When should I standardize features?

Standardize when variables use different scales, such as income and percentages. Otherwise, large numeric ranges can dominate the distance calculation.

3. How do I choose the number of clusters?

Look for a jump in the merge distance chart. A sharp increase often means later merges combine groups that were previously well separated.
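The same rule of thumb can be applied numerically when the merge distances are available as a list. In this sketch the helper `suggest_k` is hypothetical and the distances are illustrative values for a nine-point dataset, not output copied from the calculator.

```python
def suggest_k(merge_distances, n_points):
    """Cluster count just before the largest jump in merge distance.

    merge_distances[i] is the distance of merge i (0-based), in merge order,
    so after merge i there are n_points - (i + 1) clusters left.
    """
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    return n_points - (i + 1)

# Illustrative merge distances for nine points, merged down to one cluster.
distances = [0.22, 0.36, 0.45, 0.50, 0.63, 0.67, 4.40, 4.44]
print(suggest_k(distances, n_points=9))
```

The largest jump is only a heuristic. When several jumps are of similar size, comparing the average silhouette for each candidate count is a reasonable tie-breaker.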

4. What is the difference between linkage methods?

Single linkage tends to produce chained, elongated clusters. Complete linkage favors compact groups. Average linkage sits between the two. Centroid linkage compares cluster means, and Ward merges the pair that adds the least within-cluster variance.

5. Can I use more than two features?

Yes. The clustering engine supports multiple numeric features. The scatter chart displays only the first two for a simple visual summary.

6. What does silhouette score tell me?

It measures how close each point is to its own cluster compared with the nearest alternative cluster. Higher values usually indicate clearer separation.

7. Why are Ward's merge values on a different scale?

Ward uses a variance-based merge cost, not a simple raw distance. Its values are most useful for comparing merge steps within the same run.

8. What format should my CSV follow?

Keep each row the same length. Put labels, if any, in the first column, and place only numeric feature values in the remaining columns.
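These rules can be checked before pasting data into the form. The sketch below is a hypothetical validator built on Python's standard `csv` module; the function name `parse_dataset` and its two flags mirror the header and label options but are not the calculator's actual API.

```python
import csv
import io

def parse_dataset(text, has_header=True, has_labels=True):
    """Split CSV text into (header, labels, features), enforcing the format rules."""
    rows = [row for row in csv.reader(io.StringIO(text.strip())) if row]
    header = rows.pop(0) if has_header else None
    width = len(rows[0])
    labels, features = [], []
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"data row {number} has {len(row)} fields, expected {width}")
        if has_labels:
            labels.append(row[0].strip())
            values = row[1:]
        else:
            labels.append(f"Row {number}")
            values = row
        # float() raises ValueError when a feature value is not numeric.
        features.append([float(v) for v in values])
    return header, labels, features

sample = """Label,X,Y
A,1.0,1.2
B,1.4,1.0
C,0.8,1.5
"""
header, labels, features = parse_dataset(sample)
print(header)
print(labels)
print(features)
```

A ragged row or a non-numeric feature value raises `ValueError`, which is easier to diagnose than a clustering result built on silently skipped data.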
