Complete Linkage Calculator

Calculator Input

Use one row per observation in the format: Label, x1, x2, x3 (any number of variables, as long as every row has the same count).

Dataset

Distance Metric

Target Number of Clusters

Standardize variables before clustering

Input Notes

Use consistent dimensions on every row.

Labels may be letters, IDs, or names.

Standardization is useful for mixed scales.

Complete linkage uses the farthest inter-cluster pair.
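The input format described above can be checked with a few lines of Python. This is only a sketch of how such rows might be parsed, not the calculator's own code; the function name parse_observations is an assumption made for illustration.

```python
def parse_observations(text):
    """Parse 'Label, x1, x2, ...' rows into (label, [floats]) pairs."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = [p.strip() for p in line.split(",")]
        label = parts[0]
        values = [float(p) for p in parts[1:]]  # raises ValueError on non-numeric input
        if not values:
            raise ValueError(f"line {line_no}: no numeric values")
        # Every row must have the same number of variables as the first row.
        if rows and len(values) != len(rows[0][1]):
            raise ValueError(
                f"line {line_no}: expected {len(rows[0][1])} values, got {len(values)}"
            )
        rows.append((label, values))
    return rows


sample = "A, 2, 3\nB, 3, 4\nC, 8, 7"
print(parse_observations(sample))
# [('A', [2.0, 3.0]), ('B', [3.0, 4.0]), ('C', [8.0, 7.0])]
```

A row with a different number of values, such as "D, 9", raises a ValueError instead of being silently accepted.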

Example Data Table

This sample shows six observations with two variables.

Label	Variable 1	Variable 2
A	2	3
B	3	4
C	8	7
D	9	8
E	3	2
F	8	9

Formula Used

Complete linkage defines the distance between two clusters as the maximum distance between any point in cluster A and any point in cluster B.

Complete Linkage Distance:

D(A,B) = max { d(i,j) } for all points i ∈ A and j ∈ B.

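In the formulas below, xik is the value of variable k for observation i, and each sum or maximum runs over all variables k.

Euclidean distance: d(i,j) = √( Σ (xik - xjk)² )

Manhattan distance: d(i,j) = Σ |xik - xjk|

Chebyshev distance: d(i,j) = max |xik - xjk|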
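As a quick check of the three metrics, the sketch below evaluates them for observations A (2, 3) and C (8, 7) from the example table. The function names are illustrative assumptions, not part of the calculator.

```python
import math


def euclidean(p, q):
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    # Largest absolute coordinate difference.
    return max(abs(a - b) for a, b in zip(p, q))


A, C = (2, 3), (8, 7)
print(round(euclidean(A, C), 4))  # 7.2111
print(manhattan(A, C))            # 10
print(chebyshev(A, C))            # 6
```

The coordinate gaps between A and C are 6 and 4, so the three metrics give √52, 6 + 4, and max(6, 4) respectively.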

The algorithm begins with each observation as its own cluster, then repeatedly merges the pair of clusters with the smallest complete linkage distance until the requested cluster count is reached.
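The merging procedure can be sketched in plain Python as follows. This is a minimal, unoptimized illustration using Euclidean distance on the example table; the function name and the tie-breaking rule (the first smallest pair found wins) are assumptions, and the calculator may resolve ties differently.

```python
import math
from itertools import combinations

points = {"A": (2, 3), "B": (3, 4), "C": (8, 7),
          "D": (9, 8), "E": (3, 2), "F": (8, 9)}


def complete_linkage_clustering(points, k, dist=math.dist):
    """Merge clusters until k remain; return (clusters, merge history)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of observations")
    clusters = [[label] for label in points]  # each observation starts alone
    history = []
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: the farthest pair across the two clusters.
            d = max(dist(points[a], points[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:  # assumed tie rule: keep first minimum
                best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, history


clusters, history = complete_linkage_clustering(points, k=2)
print(clusters)                               # [['A', 'B', 'E'], ['C', 'D', 'F']]
print([round(d, 4) for _, _, d in history])   # [1.4142, 1.4142, 2.0, 2.0]
```

This version recomputes every inter-cluster distance at each step, which is fine for small inputs but slow for large ones.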

How to Use This Calculator

Enter one observation per line using a label and numeric coordinates.
Select a distance metric that matches your analysis goal.
Choose the final number of clusters you want.
Enable standardization when variables use different scales.
Click the calculate button to compute cluster merges.
Review the final clusters, merge history, and matrix.
Use the graph to inspect merge distance growth.
Export results as CSV or PDF for reporting.

FAQs

1. What does complete linkage measure?

It measures the distance between the two farthest members of two clusters, one taken from each. This tends to produce compact clusters, because two groups are not merged early if any pair of their members is very far apart.
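For a concrete case, the sketch below computes the complete linkage distance between the groups {A, B, E} and {C, D, F} from the example table, assuming Euclidean distance. The helper name complete_linkage is illustrative only.

```python
import math
from itertools import product

points = {"A": (2, 3), "B": (3, 4), "C": (8, 7),
          "D": (9, 8), "E": (3, 2), "F": (8, 9)}


def complete_linkage(cluster_a, cluster_b):
    # Largest distance over every cross-cluster pair of observations.
    return max(math.dist(points[i], points[j])
               for i, j in product(cluster_a, cluster_b))


print(round(complete_linkage(["A", "B", "E"], ["C", "D", "F"]), 4))  # 8.6023
print(complete_linkage(["A", "B"], ["E"]))                           # 2.0
```

The first value is √74, reached by the pairs A and D and E and F. The second is the B to E distance, which exceeds the A to E distance of about 1.4142.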

2. When should I standardize variables?

Standardize when variables use different units or scales. Without standardization, large-scale variables can dominate the distance calculation and distort clustering results.
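One common form of standardization is the z-score, which rescales each variable to mean 0 and standard deviation 1. The sketch below assumes the population standard deviation; whether the calculator uses the population or sample formula is not stated here, so treat that choice as an assumption.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Return z-scores per column; constant columns become 0.0."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]  # population standard deviation (assumed)
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# The six observations from the example table.
data = [[2, 3], [3, 4], [8, 7], [9, 8], [3, 2], [8, 9]]
for row in standardize(data):
    print([round(v, 4) for v in row])
```

After this step every variable contributes on a comparable scale, so no single variable dominates the distance only because of its units.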

3. Which distance metric should I choose?

Euclidean works well for geometric similarity. Manhattan is useful for grid-like movement. Chebyshev fits problems where the largest coordinate gap matters most.

4. Why can complete linkage create tighter clusters?

Because the merge cost is the maximum inter-cluster distance, two groups merge cheaply only when every member of one is close to every member of the other. That penalizes wide or stretched clusters and tends to keep merged groups relatively compact.

5. What does the merge history table show?

It lists each clustering step, the two clusters merged, the merge distance, the resulting cluster, and how many clusters remain afterward.
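A table with these columns could be written out as CSV along the following lines. The column names and the "+" separator for cluster members are hypothetical, chosen only for this sketch, and the history values are the Euclidean merges worked out by hand for the example table.

```python
import csv
import io

# (left cluster, right cluster, merge distance), derived from the example table.
history = [
    (["A"], ["B"], 1.4142),
    (["C"], ["D"], 1.4142),
    (["A", "B"], ["E"], 2.0),
    (["C", "D"], ["F"], 2.0),
]


def history_to_csv(history, n_observations):
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical column names, not the calculator's actual export header.
    writer.writerow(["step", "cluster_1", "cluster_2",
                     "distance", "merged", "clusters_remaining"])
    for step, (left, right, d) in enumerate(history, start=1):
        writer.writerow([step, "+".join(left), "+".join(right),
                         f"{d:.4f}", "+".join(left + right),
                         n_observations - step])  # each merge removes one cluster
    return buffer.getvalue()


print(history_to_csv(history, n_observations=6))
```

Each merge reduces the cluster count by one, so after step s there are n - s clusters left, where n is the number of observations.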

6. How do I interpret the merge plot?

Large jumps in merge distance often suggest natural separation. A sharp increase can indicate that merging beyond that point forces dissimilar clusters together.
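The same reading can be done numerically by finding the largest gap between consecutive merge distances. The sketch below is one simple heuristic, not a rule the calculator is known to apply; the distances are the Euclidean merge distances for the example table when merging continues down to a single cluster.

```python
def clusters_before_largest_jump(merge_distances):
    """Cluster count just before the merge that causes the biggest jump."""
    if len(merge_distances) < 2:
        raise ValueError("need at least two merges to compare")
    n_observations = len(merge_distances) + 1
    # Gap between each merge distance and the next one.
    jumps = [b - a for a, b in zip(merge_distances, merge_distances[1:])]
    last_cheap_merge = max(range(len(jumps)), key=jumps.__getitem__)
    # After merge index s (0-based), n - (s + 1) clusters remain.
    return n_observations - (last_cheap_merge + 1)


distances = [1.4142, 1.4142, 2.0, 2.0, 8.6023]
print(clusters_before_largest_jump(distances))  # 2
```

Here the final merge jumps from 2.0 to about 8.6, which suggests stopping at two clusters: {A, B, E} and {C, D, F}.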

7. Can I use more than two variables?

Yes. Each row can contain a label followed by any consistent number of numeric variables. Every row must have the same dimensionality.

8. What do the CSV and PDF exports include?

They include the merge history summary. This makes it easy to share clustering steps, review merge distances, and keep a record of the analysis.