Basic Statistics Guide

Descriptive statistics turn a list of values into a compact explanation. Use center measures to describe typical values, spread measures to describe variation, percentiles to describe rank, and z-scores to standardize distance from the mean.

Example dataset

2, 4, 4, 6, 9

For this dataset, the mean is 5, the median is 4, the mode is 4 (the only repeated value), and the range is 7.

Formula Table

Core descriptive statistics formulas

Measure | Formula or rule | Best use
Mean | sum of values / n | Averages when every value should influence the result.
Median | middle sorted value, or average of two middle values | Center when outliers or skew may pull the mean.
Mode | most frequent value | Repeated categories, scores, or common observations.
Sample variance | sum((x - xbar)^2) / (n - 1) | Spread estimate from a sample dataset.
Population variance | sum((x - mu)^2) / N | Spread when the dataset is the complete population.
Standard deviation | square root of variance | Spread in the same unit as the original values.
Z-score | (value - mean) / standard deviation | Unitless distance from the mean.

Center

Mean, median, and mode describe what is typical, but they react differently to skew and repeated values.
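
A minimal sketch of how the three center measures react to an outlier, using Python's standard `statistics` module. The second list is a hypothetical variation of this guide's dataset, with 9 replaced by 90 purely for illustration; the helper name `center` is also an assumption.

```python
import statistics

data = [2, 4, 4, 6, 9]           # dataset used in this guide
with_outlier = [2, 4, 4, 6, 90]  # hypothetical: 9 replaced by an outlier

def center(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

print(center(data))          # mean 5, median 4, mode 4
print(center(with_outlier))  # mean 21.2, median 4, mode 4
```

Only the mean moves when the outlier is introduced; the median and mode stay at 4.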

Spread

Range, variance, and standard deviation describe how tightly or widely values cluster around the center.
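
As an illustration, the sketch below compares this guide's dataset with a second, hypothetical dataset that has the same mean but sits closer to it. The list `tighter` and the helper `spread` are assumptions introduced for the example, and the population formulas are used for simplicity.

```python
import statistics

data = [2, 4, 4, 6, 9]     # dataset used in this guide
tighter = [4, 5, 5, 5, 6]  # hypothetical dataset with the same mean of 5

def spread(values):
    """Return (range, population variance, population standard deviation)."""
    value_range = max(values) - min(values)
    variance = statistics.pvariance(values)
    return value_range, variance, statistics.pstdev(values)

for name, values in [("guide data", data), ("tighter data", tighter)]:
    value_range, variance, std_dev = spread(values)
    print(f"{name}: mean={statistics.mean(values)}, range={value_range}, "
          f"variance={variance:.2f}, std dev={std_dev:.2f}")
```

Both lists average 5, yet every spread measure is smaller for the tighter list.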

Rank

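Percentiles locate a value within the sorted data: the pth percentile is the point at or below which roughly p percent of the values fall. Quartiles are the 25th, 50th, and 75th percentiles, so they can be compared directly with other percentile thresholds.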
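
Quartile values depend on the interpolation convention, which matters most for small datasets. The sketch below uses `statistics.quantiles` from the Python standard library with both of its methods; which one suits a given dataset is a judgment call, not something this guide specifies.

```python
import statistics

data = [2, 4, 4, 6, 9]  # dataset used in this guide

# "exclusive" (the default) treats the data as a sample from a larger population
exclusive = statistics.quantiles(data, n=4, method="exclusive")
# "inclusive" treats the data as already containing the population's extremes
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(exclusive)  # [3.0, 4.0, 7.5]
print(inclusive)  # [4.0, 4.0, 6.0]
```

Both methods agree on the median (4) but place the first and third quartiles differently, so it helps to state the method alongside any reported quartile.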

Standardization

Z-scores express raw distance in standard deviations, making values easier to compare across scales.
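
One way to see why z-scores compare across scales: rescaling the data leaves them unchanged. This sketch assumes the population standard deviation as the divisor; the `rescaled` list and the helper `z_scores` are hypothetical names used only for illustration.

```python
import statistics

def z_scores(values):
    """Return the population z-score of every value in the list."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)  # assumption: population standard deviation
    return [(x - mean) / std_dev for x in values]

data = [2, 4, 4, 6, 9]              # dataset used in this guide
rescaled = [x * 100 for x in data]  # hypothetical: same data in a unit 100 times smaller

for original, scaled in zip(z_scores(data), z_scores(rescaled)):
    print(f"{original:+.3f}  {scaled:+.3f}")  # the two columns match
```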

Worked Example

How the numbers change by method

For the dataset 2, 4, 4, 6, 9, the sum is 25 and n is 5, so the mean is 25 / 5 = 5. The list is already sorted, and with five values the median is the third one, 4. The value 4 appears twice and every other value appears once, so the mode is 4. The minimum is 2 and the maximum is 9, so the range is 9 - 2 = 7.
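
The same steps can be reproduced by hand in a few lines of Python. This is a sketch that follows the arithmetic above rather than a recommended implementation; the variable names are chosen for illustration.

```python
from collections import Counter

data = [2, 4, 4, 6, 9]

total = sum(data)  # 25
n = len(data)      # 5
mean = total / n   # 5.0

ordered = sorted(data)
middle = n // 2
if n % 2 == 1:
    median = ordered[middle]  # odd n: the single middle value
else:
    median = (ordered[middle - 1] + ordered[middle]) / 2  # even n: average of the two

# most_common(1) returns the value with the highest count and that count
mode, frequency = Counter(data).most_common(1)[0]  # (4, 2)

value_range = max(data) - min(data)  # 9 - 2 = 7

print(mean, median, mode, value_range)  # 5.0 4 4 7
```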

Spread starts by comparing every value with the mean. The deviations from the mean of 5 are -3, -1, -1, 1, and 4, so the squared deviations are 9, 1, 1, 1, and 16, for a total of 28. Population variance divides 28 by 5, giving 5.6. Sample variance divides 28 by 4, giving 7. The corresponding standard deviations are the square roots of those variances: about 2.37 for the population and about 2.65 for the sample. The sample figure is larger because dividing by n - 1 instead of n compensates for a sample's tendency to understate the spread of the population it was drawn from.
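
A short sketch of that calculation, with a cross-check against Python's standard `statistics` module. The variable names are illustrative.

```python
import math
import statistics

data = [2, 4, 4, 6, 9]
mean = sum(data) / len(data)  # 5.0

squared_deviations = [(x - mean) ** 2 for x in data]  # [9.0, 1.0, 1.0, 1.0, 16.0]
total = sum(squared_deviations)                       # 28.0

population_variance = total / len(data)    # 28 / 5 = 5.6
sample_variance = total / (len(data) - 1)  # 28 / 4 = 7.0

population_sd = math.sqrt(population_variance)  # about 2.37
sample_sd = math.sqrt(sample_variance)          # about 2.65

# Cross-check the hand calculation against the standard library
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))

print(f"{population_sd:.2f} {sample_sd:.2f}")
```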

Percentiles and z-scores answer different questions. A percentile describes position inside the sorted dataset. A z-score describes how many standard deviations a raw value is above or below the mean. Use actual dataset percentiles when rank in the observed values matters, and use z-scores when distance from the mean in standard-deviation units matters. Converting a z-score into a percentile is only reasonable when a normal-distribution model is appropriate for the data.
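
To make the contrast concrete, the sketch below takes the value 6 from the example dataset and computes its rank in the observed data, its z-score, and the percentile a normal model would assign to that z-score. The "at or below" definition of percentile rank and the use of the population standard deviation are both assumptions; other conventions exist.

```python
import statistics

data = [2, 4, 4, 6, 9]
value = 6

# Percentile rank: share of observed values at or below `value` (assumed convention)
percentile_rank = 100 * sum(x <= value for x in data) / len(data)

# Z-score using the population standard deviation (also an assumption)
z = (value - statistics.mean(data)) / statistics.pstdev(data)

# Percentile implied by a standard normal model for that z-score
normal_percentile = 100 * statistics.NormalDist().cdf(z)

print(f"percentile rank in data: {percentile_rank:.0f}")
print(f"z-score: {z:.2f}")
print(f"percentile under a normal model: {normal_percentile:.0f}")
```

The observed rank is 80, while the normal model gives a noticeably lower figure, which is expected for a small, right-skewed dataset like this one.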

Source Notes

References used for formula context

OpenStax Introductory Statistics

OpenStax covers measures of center, spread, percentiles, and z-score notation in its introductory statistics materials.

NIST statistics handbook

The NIST/SEMATECH Engineering Statistics Handbook provides applied statistics references for scientists and engineers.
