Formulas Used
Euclidean distance: d(i,j) = sqrt(Σₖ (xᵢₖ - xⱼₖ)²), summed over features k
Manhattan distance: d(i,j) = Σₖ |xᵢₖ - xⱼₖ|, summed over features k
Cosine distance: d(i,j) = 1 - (xᵢ · xⱼ) / (||xᵢ|| ||xⱼ||)
Cluster center: c = Σx / n
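Within-cluster SSE: SSE = Σ d(xᵢ, c)², summed over the points xᵢ in the cluster
Silhouette score: s = (b - a) / max(a, b), where a is the mean distance from a point to the other points in its own cluster and b is the mean distance to the points in the nearest other cluster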
Dunn index: D = minimum intercluster distance / maximum cluster diameter
DCA split rule: the calculator starts with one cluster, selects the least compact cluster, finds the two points in it that are farthest apart to use as seeds, and assigns each point to the nearer seed center.
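The split rule can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual code. The function names (`centroid`, `sse`, `farthest_pair`, `split_cluster`, `divisive`) are hypothetical, Euclidean distance is assumed, "least compact" is taken to mean the highest SSE, and the two seed points themselves serve as the seed centers.

```python
import math

def centroid(points):
    """Cluster center: mean of each coordinate (c = Σx / n)."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sse(points):
    """Sum of squared Euclidean distances from each point to the center."""
    c = centroid(points)
    return sum(math.dist(p, c) ** 2 for p in points)

def farthest_pair(points):
    """Return the two points with the largest pairwise distance (O(n^2))."""
    best, best_d = None, -1.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d > best_d:
                best, best_d = (points[i], points[j]), d
    return best

def split_cluster(points):
    """Assign each point to the nearer of the two seeds (ties go left)."""
    s1, s2 = farthest_pair(points)
    left = [p for p in points if math.dist(p, s1) <= math.dist(p, s2)]
    right = [p for p in points if math.dist(p, s1) > math.dist(p, s2)]
    return left, right

def divisive(points, k):
    """Split the highest-SSE cluster repeatedly until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Only clusters with at least two distinct points can be split.
        splittable = [i for i, c in enumerate(clusters)
                      if len(set(map(tuple, c))) > 1]
        if not splittable:
            break
        idx = max(splittable, key=lambda i: sse(clusters[i]))
        left, right = split_cluster(clusters[idx])
        clusters[idx:idx + 1] = [left, right]
    return clusters
```

Calling `divisive(points, 3)` on a list of coordinate tuples returns up to three lists of points. A single far outlier will usually be chosen as a seed, which is why outliers can force an early split.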
How To Use This Calculator
- Paste your dataset into the input box.
- Use one observation per line.
- Add labels before or after numeric values if needed.
- Select the number of final clusters.
- Choose a distance metric for comparison.
- Select scaling when columns use different units.
- Press the calculate button.
- Review the chart, score cards, cluster table, and split history.
- Download the CSV or PDF report for records.
Example Data Table
| Label | Feature X | Feature Y | Expected Pattern |
| --- | --- | --- | --- |
| A | 1.0 | 1.2 | Low left group |
| D | 5.2 | 4.9 | Middle group |
| G | 9.1 | 1.1 | Right lower group |
| J | 3.4 | 7.8 | Upper group |
What This DCA Calculator Does
Divisive clustering starts with one large group. It then separates that group into smaller parts. This calculator follows that top-down idea. It reads numeric observations from your data box. It scales values when needed. It measures distances. It finds the cluster that should split next. The selected group is divided into two tighter groups. The process continues until the target cluster count is reached.
Why Divisive Clustering Helps
DCA is useful when you want a broad view first. Many hierarchical clustering tools are agglomerative and build groups from the bottom up. Divisive analysis moves in the opposite direction. It starts with every point together. Then it asks which group is least compact and splits it. This view can make patterns easier to explain. It is helpful for math practice, segmentation, quality review, and exploratory data analysis.
Reading the Results
The summary cards show total points, dimensions, clusters, SSE, silhouette, Dunn score, and Davies-Bouldin score. Lower SSE usually means tighter clusters. A higher silhouette score means points fit their assigned groups better. A higher Dunn score suggests wider separation between clusters relative to their spread. A lower Davies-Bouldin score usually means clusters are more compact and better separated.
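As a rough illustration of the Dunn score, the sketch below computes minimum intercluster distance divided by maximum cluster diameter. The function name `dunn_index` is hypothetical. It assumes Euclidean distance, takes intercluster distance as the closest pair of points in two different clusters, and takes diameter as the farthest pair inside one cluster. Other definitions exist, and the calculator may use a different one.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Dunn index for a list of clusters (each a list of points).

    Needs at least two non-empty clusters.
    """
    # Smallest distance between any two points in different clusters.
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between any two points in the same cluster.
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0:
        return math.inf  # every cluster is a single point or duplicates
    return min_between / max_diameter
```

For two clusters `[(0, 0), (0, 1)]` and `[(4, 0), (4, 1)]`, the closest cross-cluster pair is 4 apart and the widest cluster is 1 across, so the ratio is 4.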
Choosing Good Settings
Use Euclidean distance for normal geometric data. Use Manhattan distance when movement is grid based. Use cosine distance when direction matters more than size. Choose z-score scaling when columns use different units. Choose min-max scaling when you need values in a shared range. Keep the target cluster count reasonable. Too many clusters can hide the larger pattern.
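The three distance metrics can be written directly from their formulas. The sketch below uses hypothetical function names and plain Python sequences. Raising an error for zero vectors in cosine distance is an assumption, since the formula is undefined there.

```python
import math

def euclidean(x, y):
    """Straight-line distance: sqrt of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Grid distance: sum of absolute differences per feature."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_distance(x, y):
    """1 minus the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1 - dot / (norm_x * norm_y)
```

The points (1, 0) and (2, 0) are 1 apart in Euclidean and Manhattan terms but have cosine distance 0, because they point in the same direction.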
Practical Notes
DCA is exploratory. It does not prove a final truth by itself. Check the chart. Review the assignments. Compare scores across several cluster counts. Look for stable groups and clear distances. Also inspect outliers. A single far point can force an early split. Clean labels and consistent columns make the output more reliable.
Best Use Cases
Use this page for small and medium datasets. It works well for classroom examples, customer groups, measurement sets, and trial feature tables. For very large files, sample the data first. Then test the same settings on the full dataset. Save the CSV report for audits. Save the PDF view for simple sharing.
FAQs
What is DCA clustering?
DCA means divisive clustering analysis. It starts with all data points in one group. Then it repeatedly splits the least compact group until the selected cluster count is reached.
When should I use this calculator?
Use it when you want to explore natural groups in numeric data. It is useful for learning, segmentation, pattern review, and quick cluster comparisons.
What data format is accepted?
Enter one point per line. You can write labels with values, such as A, 2.4, 5.1. Every line must have the same number of numeric columns.
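A minimal sketch of how input in this format might be parsed is shown below. The function name `parse_observations` is hypothetical, and comma-separated tokens are an assumption based on the example above. The calculator may accept other delimiters. Any token that does not parse as a number is treated as part of the label.

```python
def parse_observations(text):
    """Parse lines like 'A, 2.4, 5.1' into (labels, rows of floats)."""
    labels, rows = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        tokens = [t.strip() for t in line.split(",") if t.strip()]
        values, label_parts = [], []
        for token in tokens:
            try:
                values.append(float(token))
            except ValueError:
                label_parts.append(token)  # non-numeric token: label text
        if not values:
            raise ValueError(f"line {line_no}: no numeric values found")
        if rows and len(values) != len(rows[0]):
            raise ValueError(
                f"line {line_no}: expected {len(rows[0])} numeric columns, "
                f"got {len(values)}"
            )
        # Fall back to a generated label when the line has none.
        labels.append(" ".join(label_parts) or f"row{len(rows) + 1}")
        rows.append(values)
    return labels, rows
```

Note that `float` also accepts strings such as `nan` and `inf`, so a stricter parser would reject those explicitly.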
Which distance metric is best?
Euclidean works well for geometric data. Manhattan works well for grid movement. Cosine is useful when direction matters more than magnitude.
Should I scale the data?
Scale the data when columns use different units or ranges. Z-score scaling is common. Min-max scaling keeps values within a shared range.
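Both scaling options can be sketched per column as follows. The function names are hypothetical. The z-score version assumes the population standard deviation (dividing by n), and the min-max version assumes a target range of 0 to 1. The calculator may differ on both points. Constant columns are mapped to 0 to avoid dividing by zero.

```python
import math

def zscore_scale(rows):
    """Center each column on its mean and divide by its standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

def minmax_scale(rows):
    """Rescale each column linearly so its values span 0 to 1."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Without scaling, a column measured in thousands would dominate a column measured in fractions in every Euclidean or Manhattan distance.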
What does SSE mean?
SSE means sum of squared errors. It adds up the squared distances from points to their cluster center. Lower values usually mean tighter clusters.
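A small worked example of SSE is given below, assuming Euclidean distance and a total that sums the per-cluster values. The names `cluster_sse` and `total_sse` are hypothetical.

```python
def cluster_sse(points):
    """Squared distances from each point to the cluster mean, summed."""
    n = len(points)
    dims = len(points[0])
    center = [sum(p[k] for p in points) / n for k in range(dims)]
    return sum(
        sum((p[k] - center[k]) ** 2 for k in range(dims)) for p in points
    )

def total_sse(clusters):
    """Total SSE across all clusters."""
    return sum(cluster_sse(c) for c in clusters)

# Points (0, 0) and (2, 0) have center (1, 0); each lies 1 away,
# so the cluster SSE is 1² + 1² = 2.
example = cluster_sse([(0, 0), (2, 0)])
```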
What does silhouette score show?
Silhouette compares how close a point is to its own cluster against its distance to the nearest other cluster. Higher values usually mean better assignment and clearer separation.
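The silhouette formula s = (b - a) / max(a, b) can be sketched per point as follows. The function names are hypothetical, and Euclidean distance is assumed. Points in single-member clusters are given a score of 0, which is a common convention and not necessarily what the calculator does.

```python
import math

def silhouette_scores(clusters):
    """Per-point silhouette values; needs at least two non-empty clusters."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for pi, p in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other points in the same cluster.
            a = sum(
                math.dist(p, q) for qi, q in enumerate(cluster) if qi != pi
            ) / (len(cluster) - 1)
            # b: smallest mean distance to the points of any other cluster.
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for oi, other in enumerate(clusters)
                if oi != ci
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def mean_silhouette(clusters):
    """Average silhouette over all points."""
    scores = silhouette_scores(clusters)
    return sum(scores) / len(scores)
```

Scores range from -1 to 1. A value near 1 means a point sits much closer to its own cluster than to any other.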
Can I export the result?
Yes. Use the CSV button for spreadsheet work. Use the PDF button for a simple printable summary of scores and assignments.