Siegel-Tukey Test Calculator

Calculator Input

Sample A Values

Use commas, spaces, or new lines between values.

Sample B Values

Enter a second independent sample for comparison.

Confidence Level

Example Data Table

Observation	Sample A	Sample B
1	12	9
2	15	14
3	16	17
4	18	19
5	20	28
6	22	31

This example contrasts a tighter first sample against a more dispersed second sample, making the variability comparison easy to interpret.

Formula Used

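The Siegel-Tukey test converts pooled observations into ranks that alternate from the smallest extreme to the largest extreme, then toward the center. These special ranks emphasize spread rather than location, because values far from the center in either direction receive low ranks while central values receive high ranks.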

Step 1: Combine both samples and sort ascending.

Step 2: Assign ranks in this order: smallest, largest, second smallest, second largest, and so on.

Step 3: Average ranks for ties when duplicate values occur.

Step 4: Sum the assigned ranks for one sample, then compute the Mann-Whitney style statistic on the transformed ranks:

U = W - n(n + 1) / 2

Here, W is the rank sum for the chosen sample and n is the size of that sample. A normal approximation is then used for larger samples:

Z = (W - μ_W - c) / √Var(W)

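where μ_W = n(N + 1) / 2 is the expected rank sum under the null hypothesis, N is the pooled sample size, and c is the continuity correction (0.5, carrying the sign of W - μ_W). Without ties, Var(W) = n(N - n)(N + 1) / 12; tie adjustments refine this variance.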
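
As a rough illustration, Steps 1 to 4 above can be sketched in Python. This is a minimal sketch, not the calculator's own code: the function names (alternating_ranks, siegel_tukey_rank_sums, u_statistic) are hypothetical, and the ranking follows the strict alternation described in Step 2. Some textbook presentations instead assign ranks in pairs after the first one, which can produce different individual ranks.

```python
from collections import defaultdict


def alternating_ranks(n_total):
    """Rank for each sorted position: smallest, largest, second smallest, ..."""
    ranks = [0] * n_total
    low, high = 0, n_total - 1
    for rank in range(1, n_total + 1):
        if rank % 2 == 1:      # odd ranks go to the low end
            ranks[low] = rank
            low += 1
        else:                  # even ranks go to the high end
            ranks[high] = rank
            high -= 1
    return ranks


def siegel_tukey_rank_sums(sample_a, sample_b):
    """Return (W_a, W_b) using alternating ranks with ties averaged."""
    # Step 1: pool both samples and sort ascending, remembering the source.
    pooled = sorted([(x, "A") for x in sample_a] + [(x, "B") for x in sample_b])
    # Step 2: assign the alternating ranks by sorted position.
    raw = alternating_ranks(len(pooled))

    # Step 3: every observation sharing a value gets the mean of its raw ranks.
    by_value = defaultdict(list)
    for (value, _), rank in zip(pooled, raw):
        by_value[value].append(rank)
    mean_rank = {v: sum(r) / len(r) for v, r in by_value.items()}

    # Step 4: sum the (possibly averaged) ranks per sample.
    sums = {"A": 0.0, "B": 0.0}
    for value, label in pooled:
        sums[label] += mean_rank[value]
    return sums["A"], sums["B"]


def u_statistic(rank_sum, n):
    """U = W - n(n + 1) / 2 for the sample whose rank sum is W."""
    return rank_sum - n * (n + 1) / 2


a = [12, 15, 16, 18, 20, 22]
b = [9, 14, 17, 19, 28, 31]
w_a, w_b = siegel_tukey_rank_sums(a, b)
print(w_a, w_b)                   # rank sums for Sample A and Sample B
print(u_statistic(w_b, len(b)))   # U for Sample B
```

With the example table this gives rank sums of 45 for Sample A and 33 for Sample B, so U for Sample B is 12.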

Why the Test Targets Spread

The Siegel-Tukey procedure evaluates whether two independent samples differ in variability without assuming normality. It reorders pooled observations so the smallest and largest values receive early ranks, followed by the next smallest and next largest. This ranking pattern makes the test sensitive to dispersion rather than central tendency. Analysts use it when distributions are skewed, contain outliers, or when spread matters more than average level.

When Ranked Extremes Add Value

Traditional variance comparisons can become unstable when data depart from bell-shaped behavior. By ranking extremes first, the method reduces reliance on distributional assumptions and supports robust decisions in quality monitoring, teaching experiments, environmental measurements, and operational benchmarking. If one sample contributes many extreme observations, its transformed rank sum shifts accordingly. That movement provides evidence that one group is more dispersed. The comparison is cleanest when the two medians are similar, because a shift in location can also move the rank sums.

Reading the Main Outputs

This calculator reports sample size, median, mean, standard deviation, transformed rank sums, the U statistic, the normal-approximation Z value, and the p value. A small p value indicates that a difference in spread as large as the one observed would be unlikely under the null hypothesis of equal variability. The interpretation statement summarizes which sample appears more variable based on the direction of the transformed ranks: because extreme values receive the lowest ranks, the sample whose rank sum falls below its expected value is the more dispersed one.
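
The Z value and p value can be approximated along the following lines. This is a sketch under stated assumptions: rank_sum_z_test is a hypothetical helper, the p value is two-sided, and the variance is computed directly from the full set of transformed ranks. That general formula covers averaged tie ranks and reduces to n(N - n)(N + 1) / 12 when there are no ties, but it may differ from the tie adjustment the calculator itself applies.

```python
import math


def rank_sum_z_test(w, n, all_ranks):
    """Return (z, two-sided p) for rank sum w of a sample of size n.

    all_ranks holds the transformed ranks of every pooled observation.
    """
    total = len(all_ranks)
    m = total - n                                  # size of the other sample
    mean_rank = sum(all_ranks) / total             # (N + 1) / 2 for full ranks
    mean_w = n * mean_rank                         # expected rank sum
    # Variance of a sum of n ranks drawn without replacement from all_ranks.
    ss = sum((r - mean_rank) ** 2 for r in all_ranks)
    var_w = n * m * ss / (total * (total - 1))
    # Continuity correction: shrink the difference toward zero by up to 0.5.
    diff = w - mean_w
    c = math.copysign(min(0.5, abs(diff)), diff)
    z = (diff - c) / math.sqrt(var_w)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Example table: Sample B has rank sum 33, six values per sample, no ties.
z, p = rank_sum_z_test(33, 6, list(range(1, 13)))
print(round(z, 2), round(p, 2))
```

For the example table this yields a Z value of about -0.88 and a two-sided p value of about 0.38, so the difference in spread does not reach conventional significance thresholds with six observations per group.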

Example Pattern in the Table

In the included example, Sample A ranges from 12 to 22, while Sample B ranges from 9 to 31. The wider range in Sample B suggests greater variability before testing begins. After pooled ordering and alternating extreme ranks, the second sample holds the smallest value and the two largest values, so it collects the lowest transformed ranks and ends up with the smaller rank sum. This rank structure shows how a difference in spread registers in the statistic, although with only six observations per group the test has limited power.
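
The rank assignment for the example table can be traced step by step with a short script. This is a sketch only: it follows the strict alternation from Step 2, skips tie handling because the example has no duplicate values, and the variable names are illustrative.

```python
sample_a = [12, 15, 16, 18, 20, 22]
sample_b = [9, 14, 17, 19, 28, 31]

# Pool the samples and sort ascending, keeping track of the source sample.
pooled = sorted([(v, "A") for v in sample_a] + [(v, "B") for v in sample_b])

# Walk inward from both ends: odd ranks to the low end, even ranks to the high end.
rows = []
low, high = 0, len(pooled) - 1
for rank in range(1, len(pooled) + 1):
    if rank % 2:
        value, label = pooled[low]
        low += 1
    else:
        value, label = pooled[high]
        high -= 1
    rows.append((rank, value, label))

for rank, value, label in rows:
    print(f"rank {rank:2d}: value {value:2d} from Sample {label}")

# Add up the ranks collected by each sample.
rank_sums = {"A": 0, "B": 0}
for rank, _, label in rows:
    rank_sums[label] += rank
print(rank_sums)  # {'A': 45, 'B': 33}
```

Ranks 1, 2, and 4 go to Sample B (values 9, 31, and 28), which is why its rank sum of 33 falls below Sample A's 45.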

How Analysts Use the Result

A statistically significant result supports follow-up decisions such as tightening process controls, segmenting populations, or reviewing measurement consistency. In academic settings, the test helps compare laboratory precision, classroom score consistency, or simulation stability. In business analysis, it can flag whether one supplier, machine, or channel produces outcomes with less predictable spread, which matters for planning.

Good Practice for Interpretation

Always inspect the raw values alongside the test result. Very small samples, many ties, or obvious dependence between observations can limit reliability. It is wise to review visual evidence, such as the chart included here, and compare summary statistics before drawing conclusions. The strongest workflow combines numerical output, ranked evidence, and subject-matter context so variability decisions remain statistically sound and operationally useful.
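
The summary statistics mentioned above can be checked with Python's standard statistics module. This is a minimal sketch: summarize is a hypothetical helper, and the inputs are the values from the example table.

```python
import statistics


def summarize(values):
    """Basic location and spread measures for one sample."""
    return {
        "n": len(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),   # sample standard deviation
        "range": max(values) - min(values),
    }


summary_a = summarize([12, 15, 16, 18, 20, 22])
summary_b = summarize([9, 14, 17, 19, 28, 31])
print(summary_a)
print(summary_b)
```

Here the medians are 17 and 18 while the sample standard deviations are about 3.6 and 8.4, which fits the picture of similar centers and different spreads.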

Frequently Asked Questions

1. What does the Siegel-Tukey test measure?

It compares the variability of two independent samples. The procedure emphasizes extreme values through special ranks, making it useful when spread differences matter more than mean differences.

2. When should I use this test instead of an F test?

Use it when data are non-normal, skewed, or contain outliers. It is a rank-based alternative that does not rely on the normality assumption behind the F test.

3. Can the calculator handle tied observations?

Yes. When duplicate values occur, the calculator averages the assigned transformed ranks and applies a tie correction in the variance used for the Z approximation.

4. What does a small p value mean here?

A small p value suggests the two samples likely do not share the same variability. The interpretation also indicates which sample appears more dispersed.

5. Does the test compare sample means?

No. The transformed ranking scheme is designed to detect differences in spread, not to test whether the samples have different averages or medians.

6. Why is the chart useful alongside the numeric output?

The chart reveals where extreme observations sit after ranking. It helps you visually confirm whether one sample contributes more tail behavior and broader dispersion.