Hierarchical Clustering Calculator | Statistics Dataset Grouping

Enter Dataset and Options

Set the linkage method, distance metric, desired number of clusters, and scaling option, then enter the dataset.

Use comma, tab, semicolon, or pipe delimiters. Include a label column and one or more numeric feature columns.

Example Data Table

Label	Feature 1	Feature 2	Feature 3
A	1.0	1.4	0.8
B	1.2	1.6	1.1
D	5.2	5.5	5.0
G	8.8	2.0	7.5

This sample contains three well-separated patterns: A and B lie close together, while D and G each stand apart. That makes linkage behavior easier to compare.

Formulas Used

Euclidean distance: d(a,b) = square root of the sum of squared coordinate differences.

Manhattan distance: d(a,b) = sum of absolute coordinate differences.

Chebyshev distance: d(a,b) = largest absolute coordinate difference.
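The three distance definitions above can be checked by hand on two rows of the example table. The following is a minimal Python sketch, not the calculator's own code; the function names are chosen here for illustration.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

# Rows A and D from the example table.
a = (1.0, 1.4, 0.8)
d = (5.2, 5.5, 5.0)

print(round(euclidean(a, d), 3))  # 7.217
print(round(manhattan(a, d), 3))  # 12.5
print(round(chebyshev(a, d), 3))  # 4.2
```

The coordinate differences between A and D are 4.2, 4.1, and 4.2, so the three metrics report noticeably different magnitudes for the same pair of rows.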

Z-score standardization: z = (value minus mean) divided by standard deviation.
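As a rough illustration of z-score standardization applied column by column, the sketch below uses the population standard deviation. That choice is an assumption: the page does not say whether the calculator divides by n or n - 1. Mapping a constant column to zeros is likewise an assumption made here to avoid division by zero.

```python
import statistics

def zscore_columns(rows):
    """Standardize each feature column: (value - mean) / standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation (assumption; a tool may use the sample form).
    stdevs = [statistics.pstdev(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (value - mean) / sd if sd > 0 else 0.0  # constant column -> 0.0
            for value, mean, sd in zip(row, means, stdevs)
        ])
    return scaled

# Feature rows A, B, D, G from the example table.
rows = [(1.0, 1.4, 0.8), (1.2, 1.6, 1.1), (5.2, 5.5, 5.0), (8.8, 2.0, 7.5)]

for row in zscore_columns(rows):
    print([round(v, 3) for v in row])
```

After scaling, every column has mean 0 and standard deviation 1, so no single feature dominates the distance purely because of its units.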

Single linkage: smallest point-to-point distance between two clusters.

Complete linkage: largest point-to-point distance between two clusters.

Average linkage: mean of all inter-cluster pair distances.

Centroid linkage: distance between cluster centroids.

Ward linkage: merges the pair of clusters whose union gives the smallest increase in total within-cluster variance.
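To compare the linkage rules side by side, the sketch below evaluates each one between two fixed clusters taken from the example table, using Euclidean distance. The function names are illustrative. The Ward value is computed as the increase in the within-cluster sum of squared deviations, n1 * n2 / (n1 + n2) times the squared centroid distance, which is one common way to express the criterion; the calculator may scale its reported Ward distances differently.

```python
import math
from itertools import product

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def single_linkage(c1, c2):
    return min(euclidean(p, q) for p, q in product(c1, c2))

def complete_linkage(c1, c2):
    return max(euclidean(p, q) for p, q in product(c1, c2))

def average_linkage(c1, c2):
    total = sum(euclidean(p, q) for p, q in product(c1, c2))
    return total / (len(c1) * len(c2))

def centroid_linkage(c1, c2):
    return euclidean(centroid(c1), centroid(c2))

def ward_increase(c1, c2):
    """Increase in within-cluster sum of squares if c1 and c2 were merged."""
    n1, n2 = len(c1), len(c2)
    return n1 * n2 / (n1 + n2) * centroid_linkage(c1, c2) ** 2

left = [(1.0, 1.4, 0.8), (1.2, 1.6, 1.1)]   # rows A and B
right = [(5.2, 5.5, 5.0), (8.8, 2.0, 7.5)]  # rows D and G

for name, fn in [
    ("single", single_linkage),
    ("complete", complete_linkage),
    ("average", average_linkage),
    ("centroid", centroid_linkage),
    ("ward increase", ward_increase),
]:
    print(f"{name}: {fn(left, right):.3f}")
```

Single linkage reports the B to D distance, complete linkage reports the A to G distance, and average linkage falls between the two.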

How to Use This Calculator

Paste a labeled dataset with numeric feature columns.
Select a linkage method to control merge behavior.
Choose a distance metric for measuring similarity.
Set the desired number of clusters for the final cut.
Pick z-score scaling when features use different units.
Submit the form to generate assignments, profiles, merge history, and graphs.
Download the assignments as CSV or save the report as PDF.

Frequently Asked Questions

1. What does hierarchical clustering do?

It groups observations by repeatedly merging the most similar clusters. The full merge path helps you inspect structure at several cluster counts, not only one fixed answer.
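The repeated-merge idea can be sketched in a few lines of Python. This is a naive illustration that assumes average linkage and Euclidean distance and rescans all cluster pairs at every step; it is not the calculator's implementation, and the names agglomerate and cut are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, points):
    """Mean of all pairwise distances between members of c1 and c2."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(points, linkage=average_linkage):
    """Merge the closest pair of clusters until only one cluster remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of row indices.
    """
    clusters = [(i,) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(pair[0], pair[1], points),
        )
        history.append((a, b, linkage(a, b, points)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history

def cut(history, n_points, k):
    """Replay the first n_points - k merges to recover k clusters."""
    clusters = [(i,) for i in range(n_points)]
    for a, b, _ in history[: n_points - k]:
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return clusters

labels = ["A", "B", "D", "G"]
points = [(1.0, 1.4, 0.8), (1.2, 1.6, 1.1), (5.2, 5.5, 5.0), (8.8, 2.0, 7.5)]

history = agglomerate(points)
for k in (3, 2):
    groups = [[labels[i] for i in c] for c in cut(history, len(points), k)]
    print(k, groups)
```

On the example table, cutting at three clusters gives {A, B}, {D}, and {G}; cutting at two clusters gives {A, B} and {D, G}. Both answers come from the same merge history, which is why the full path is worth inspecting.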

2. When should I use z-score scaling?

Use scaling when features have different units or ranges. Without scaling, a large-range feature can dominate the distance calculation and distort group formation.

3. How do I choose a linkage method?

Single linkage tends to chain nearby points into long, loose clusters; complete linkage produces tighter groups; average linkage sits between the two; centroid linkage compares cluster centers; and Ward linkage often forms compact clusters.

4. What does the silhouette score mean?

It estimates how well each observation fits its assigned cluster compared with the nearest other cluster. Scores range from -1 to 1; higher values indicate that observations sit much closer to their own cluster than to neighboring ones.
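For readers who want to see the calculation, here is a minimal sketch of the standard per-observation silhouette, s = (b - a) / max(a, b), where a is the mean distance to the other members of the observation's own cluster and b is the smallest mean distance to any other cluster. It assumes Euclidean distance and scores singleton clusters as 0, a common convention; the calculator may handle these details differently.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_distance(p, others):
    return sum(euclidean(p, q) for q in others) / len(others)

def silhouette_values(points, assignments):
    """Return one silhouette value per observation."""
    cluster_ids = set(assignments)
    if len(cluster_ids) < 2:
        raise ValueError("silhouette needs at least two clusters")
    values = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points)
               if assignments[j] == assignments[i] and j != i]
        if not own:
            values.append(0.0)  # singleton cluster (convention assumed here)
            continue
        a = mean_distance(p, own)
        b = min(
            mean_distance(
                p, [q for j, q in enumerate(points) if assignments[j] == other]
            )
            for other in cluster_ids
            if other != assignments[i]
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

points = [(1.0, 1.4, 0.8), (1.2, 1.6, 1.1), (5.2, 5.5, 5.0), (8.8, 2.0, 7.5)]
assignments = [0, 0, 1, 1]  # two-cluster cut: {A, B} and {D, G}

scores = silhouette_values(points, assignments)
print([round(s, 2) for s in scores])
print(round(sum(scores) / len(scores), 2))
```

With this two-cluster assignment, A and B score roughly 0.95 because they sit very close together, while D scores roughly 0.20 because it is almost as close to the other cluster as it is to G.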

5. Why do Ward results ignore non-Euclidean distance?

Ward linkage minimizes within-cluster variance, which is defined through squared Euclidean distances to cluster centroids. To stay consistent with that definition, this tool automatically switches to Euclidean distance whenever Ward linkage is selected.

6. Can I cluster with only one feature?

Yes. The calculator still works with one numeric feature. The scatter chart then uses observation order on the horizontal axis and the feature values vertically.

7. What is the cut height?

Cut height is the merge distance at the step that produced your chosen cluster count. It helps compare how much dissimilarity was accepted before groups were combined.
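Under that definition, the cut height can be read straight off the ordered list of merge distances. The helper below is a hypothetical sketch: it assumes a complete history of n - 1 merges for n observations, and it returns 0.0 when every observation stays in its own cluster, which is a choice made here rather than documented behavior. The distances are approximate values for average linkage with Euclidean distance on the example table.

```python
def cut_height(merge_distances, k):
    """Distance of the merge that reduced the data to k clusters."""
    n_points = len(merge_distances) + 1  # a full history has n - 1 merges
    if not 1 <= k <= n_points:
        raise ValueError("k must be between 1 and the number of observations")
    if k == n_points:
        return 0.0  # no merge has happened yet (assumption)
    # Merge number n - k (1-based) leaves exactly k clusters.
    return merge_distances[n_points - k - 1]

# Approximate merge distances for the four example rows.
merge_distances = [0.412, 5.609, 8.569]

for k in (3, 2, 1):
    print(k, cut_height(merge_distances, k))
```

A large jump between consecutive cut heights, such as the one between three clusters and two here, suggests that the next merge combines groups that are much less alike.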

8. What format should my dataset follow?

Include one observation per line, a label column, and numeric feature columns. Comma, tab, semicolon, and pipe delimiters are accepted in this version.
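A parser for this format might look like the sketch below. It is an assumption-laden illustration, not the calculator's code: it guesses the delimiter from whichever accepted character appears most often in the first line, and it assumes the first line is a header row, as in the example table.

```python
import csv

DELIMITERS = ",\t;|"  # comma, tab, semicolon, pipe

def parse_dataset(text):
    """Split pasted text into a header, row labels, and numeric features."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and at least one data line")
    # Guess the delimiter from the first line (heuristic, assumed here).
    delimiter = max(DELIMITERS, key=lines[0].count)
    rows = list(csv.reader(lines, delimiter=delimiter))
    header, body = rows[0], rows[1:]
    if len(header) < 2:
        raise ValueError("expected a label column and at least one feature")
    labels = [row[0].strip() for row in body]
    features = [[float(value) for value in row[1:]] for row in body]
    return header, labels, features

sample = (
    "Label\tFeature 1\tFeature 2\tFeature 3\n"
    "A\t1.0\t1.4\t0.8\n"
    "B\t1.2\t1.6\t1.1\n"
    "D\t5.2\t5.5\t5.0\n"
    "G\t8.8\t2.0\t7.5\n"
)

header, labels, features = parse_dataset(sample)
print(labels)
print(features[0])
```

A non-numeric value in a feature column raises ValueError from float, which is a reasonable place to report the offending line to the user.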