Cluster Analysis Tool Calculator

Enter Dataset and Settings

Dataset

Use numeric feature columns. Optional row labels can appear in the first column. This tool works best with comma-, semicolon-, or tab-separated data.

Number of Clusters

Distance Metric

Maximum Iterations

Random Starts

Random Seed

Decimal Places

Standardize Features

Convert features to z-scores before clustering

First Row Contains Headers

Use the first row as column names

First Column Contains Labels

Use the first column as observation names

Example Data Table

This sample shows how a labeled numeric dataset should look before analysis.

Segment	Feature_A	Feature_B	Feature_C
A1	2.1	1.9	2.3
A2	2.4	2.2	2.0
A3	1.8	2.0	2.2
B1	7.4	7.1	6.9
B2	7.8	7.5	7.2
B3	7.1	7.3	7.0
C1	4.5	8.2	5.9
C2	4.9	8.5	6.2
C3	4.3	7.9	5.7
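As an illustration only, a table like the one above could be read into labels and numeric rows along these lines. The function name parse_table and its parameters are hypothetical, and the sketch assumes tab-separated text; it is not the tool's own parser.

```python
import csv
import io

# Sample data from the table above, tab-separated.
SAMPLE = """Segment\tFeature_A\tFeature_B\tFeature_C
A1\t2.1\t1.9\t2.3
A2\t2.4\t2.2\t2.0
A3\t1.8\t2.0\t2.2
B1\t7.4\t7.1\t6.9
B2\t7.8\t7.5\t7.2
B3\t7.1\t7.3\t7.0
C1\t4.5\t8.2\t5.9
C2\t4.9\t8.5\t6.2
C3\t4.3\t7.9\t5.7
"""


def parse_table(text, delimiter="\t", has_header=True, has_labels=True):
    """Split delimited text into (header, labels, numeric rows)."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    header = rows.pop(0) if has_header else None
    labels, data = [], []
    for i, row in enumerate(rows):
        if has_labels:
            labels.append(row[0])
            row = row[1:]
        else:
            labels.append(f"Row {i + 1}")  # fallback name when no labels exist
        # float() raises ValueError if a feature cell is not numeric.
        data.append([float(value) for value in row])
    return header, labels, data


header, labels, data = parse_table(SAMPLE)
```

Text inside a feature column makes float() fail, which is why feature columns need to stay numeric.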

Formula Used

1. Standardization

When standardization is enabled, each value is converted using:

z = (x - mean) / standard deviation
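A minimal sketch of column-wise standardization follows. It assumes the population standard deviation and maps constant columns to zero; the page does not say which variant the tool uses, so both choices are assumptions, and the function name standardize is hypothetical.

```python
from statistics import mean, pstdev


def standardize(data):
    """Convert each column of a list of numeric rows to z-scores."""
    columns = list(zip(*data))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # assumption: population std dev
    return [
        # A constant column has zero spread, so its z-scores are set to 0.0.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in data
    ]


z = standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

After this step both columns have the same spread, so the second column no longer dominates distances just because its raw values are ten times larger.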

2. Distance Calculation

Euclidean distance:

d = √( Σ (xi - ci)² )

Manhattan distance:

d = Σ|xi - ci|

Chebyshev distance:

d = max|xi - ci|
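The three distance rules above can be written as small functions. This is a sketch with hypothetical function names, where x is an observation and c is a centroid of the same length.

```python
import math


def euclidean(x, c):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def manhattan(x, c):
    """Sum of the absolute differences."""
    return sum(abs(xi - ci) for xi, ci in zip(x, c))


def chebyshev(x, c):
    """Largest single absolute difference."""
    return max(abs(xi - ci) for xi, ci in zip(x, c))


point, center = (0, 0), (3, 4)
print(euclidean(point, center))  # 5.0
print(manhattan(point, center))  # 7
print(chebyshev(point, center))  # 4
```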

3. Centroid Update

For Euclidean distance, each centroid component is the mean of all assigned observations:

centroid_j = Σxj / n

For Manhattan and Chebyshev modes, the tool uses a median-based update for stability.
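The update rule can be sketched as below, assuming "median-based" means the per-feature median of the assigned observations. The function name update_centroid and the metric strings are hypothetical.

```python
from statistics import mean, median


def update_centroid(points, metric="euclidean"):
    """Return the new centroid for the observations assigned to one cluster."""
    # Mean per feature in Euclidean mode, median per feature otherwise.
    center = mean if metric == "euclidean" else median
    return [center(column) for column in zip(*points)]


# Rows A1 to A3 from the example data table.
group_a = [[2.1, 1.9, 2.3], [2.4, 2.2, 2.0], [1.8, 2.0, 2.2]]
mean_centroid = update_centroid(group_a, "euclidean")
median_centroid = update_centroid(group_a, "manhattan")
```

The median is less sensitive to a single extreme observation than the mean, which is one plausible reading of "for stability".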

4. Total Cluster Error

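In Euclidean mode, the total error is the sum of squared distances from each observation to its assigned centroid, which measures within-cluster compactness. The other modes sum the unsquared distance, under the selected metric, from each observation to its assigned centroid.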
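One way to compute this total is sketched here. The function name, argument layout, and metric strings are assumptions made for illustration; assignments holds the centroid index chosen for each observation.

```python
def total_cluster_error(data, assignments, centroids, metric="euclidean"):
    """Sum each observation's error relative to its assigned centroid."""
    total = 0.0
    for x, k in zip(data, assignments):
        diffs = [abs(xi - ci) for xi, ci in zip(x, centroids[k])]
        if metric == "euclidean":
            total += sum(d * d for d in diffs)  # squared Euclidean distance
        elif metric == "manhattan":
            total += sum(diffs)
        elif metric == "chebyshev":
            total += max(diffs)
        else:
            raise ValueError(f"unknown metric: {metric}")
    return total


data = [[0, 0], [2, 2], [10, 10]]
assignments = [0, 0, 1]
centroids = [[1, 1], [10, 10]]
print(total_cluster_error(data, assignments, centroids, "euclidean"))  # 4.0
print(total_cluster_error(data, assignments, centroids, "chebyshev"))  # 2.0
```

Because the modes use different units (squared versus unsquared distance), totals are only comparable between runs that use the same metric.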

5. Silhouette Score

s(i) = (b(i) - a(i)) / max(a(i), b(i))

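Here, a(i) is the average distance from observation i to the other members of its own cluster, and b(i) is the smallest average distance from observation i to the members of any other cluster. The score ranges from -1 to 1.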
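The formula can be sketched per observation as follows, using the standard definition in which a(i) excludes the observation itself. Scoring single-member clusters as 0 is a common convention and an assumption here, and the function names are hypothetical.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def silhouette_scores(data, assignments, dist=euclidean):
    """Return s(i) for every observation, given its cluster assignment."""
    clusters = {}
    for idx, k in enumerate(assignments):
        clusters.setdefault(k, []).append(idx)
    scores = []
    for i, k in enumerate(assignments):
        own = [j for j in clusters[k] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)  # assumption: singleton or single cluster scores 0
            continue
        # a(i): mean distance to the other members of the same cluster.
        a = sum(dist(data[i], data[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the members of another cluster.
        b = min(
            sum(dist(data[i], data[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != k
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_scores(points, [0, 0, 1, 1])  # well-separated grouping
bad = silhouette_scores(points, [0, 1, 0, 1])   # deliberately poor grouping
```

Averaging the per-observation values gives a single score for the whole solution; the good grouping here scores close to 1 and the poor one is negative.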

How to Use This Calculator

Paste your dataset into the input area.
Keep row labels in the first column if you want named observations.
Set the number of clusters you want to test.
Choose a distance rule that matches your analysis goal.
Enable standardization when features have very different scales.
Increase random starts for a more stable best solution.
Click Run Cluster Analysis to generate assignments, centroids, and the graph.
Download the results as CSV or PDF after the analysis appears.

FAQs

1. What kind of data should I enter?

Use rows of numeric observations. The first column may hold labels, and the first row may hold headers. Avoid text inside feature columns because clustering calculations require numeric values.

2. When should I standardize my data?

Standardize when one feature has much larger values than the others. This prevents large-scale variables from dominating the distance calculations and often yields more balanced groups.

3. How do I choose the number of clusters?

Start with a reasonable guess based on domain knowledge. Compare total error, silhouette score, and visual separation across several values of k to find a practical balance.

4. What does the silhouette score mean?

A higher silhouette score usually indicates better separation and tighter grouping. Values near one are strong, around zero are mixed, and negative values suggest overlap or poor assignments.

5. Why does the tool use multiple random starts?

Different starting centroids can lead to different solutions. Multiple starts reduce the risk of settling on a weak local optimum and improve the chance of finding a better cluster arrangement.
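The idea can be sketched for Euclidean mode as below: run the assignment and update loop from several random starting points and keep the run with the lowest total error. Picking starting centroids from the observations themselves, the function names, and the default values are all assumptions; the page does not describe the tool's initialization.

```python
import random


def squared_distance(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def run_once(data, k, rng, max_iter=100):
    """One clustering run from a random start; max_iter must be at least 1."""
    centroids = [list(p) for p in rng.sample(data, k)]  # assumption: start at observations
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in data
        ]
        if new_assignments == assignments:
            break  # converged: no observation changed cluster
        assignments = new_assignments
        for j in range(k):
            members = [x for x, a in zip(data, assignments) if a == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    error = sum(squared_distance(x, centroids[a]) for x, a in zip(data, assignments))
    return error, assignments, centroids


def best_of_starts(data, k, starts=10, seed=42, max_iter=100):
    """Repeat run_once and keep the run with the lowest total error."""
    rng = random.Random(seed)  # a fixed seed makes the result repeatable
    runs = (run_once(data, k, rng, max_iter) for _ in range(starts))
    return min(runs, key=lambda run: run[0])
```

Calling best_of_starts(data, 3, starts=30, seed=1) on the example data table returns the lowest-error run as (error, assignments, centroids), and the same seed gives the same result each time.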

6. What does total cluster error show?

It measures how tightly observations sit around their assigned centroids. Lower values usually indicate more compact groups, though they should be interpreted together with silhouette score and domain logic.

7. Why is my graph based on only two features?

A two-dimensional plot is easier to read on a webpage. When your dataset contains more than two features, the chart displays only the first two feature columns, while the clustering still uses all of them.

8. Can I use this for market, customer, or survey segmentation?

Yes. This tool is useful for many segmentation tasks involving numeric variables, including customer profiles, product behavior, biological measurements, quality control, and statistical grouping exercises.
