
Statistics & Probability

Descriptive statistics, probability rules, conditional probability and Bayes' theorem, counting principles, binomial and normal distributions, and expected value — the complete guide.

Measures of Center

Measures of center summarize a dataset with a single representative value. The three main measures are mean, median, and mode.

Mean (Average)

μ = (Σxᵢ) / n

Sum all values and divide by the count. Sensitive to outliers — a single extreme value can pull the mean far from the center of most data.

Median

Middle value (sorted)

Sort the data. For odd n, the median is the middle value. For even n, average the two middle values. Resistant to outliers — preferred for skewed data like incomes.

Mode

Most frequent value

The value that appears most often. A dataset can be unimodal, bimodal, or have no mode. Essential for categorical data where averaging is meaningless.

Quick Example: Dataset {2, 4, 4, 6, 8, 8, 8, 10}

Mean: (2+4+4+6+8+8+8+10)/8 = 50/8 = 6.25

Median (n = 8): the middle two values are 6 and 8, so the median is (6+8)/2 = 7

Mode: 8 appears 3 times, so the mode is 8
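The three measures can be computed directly from their definitions. The sketch below is a minimal Python version; the function names `mean`, `median`, and `modes` are our own, not from a library. `modes` returns every value tied for the highest count. It also assumes one common convention: when every value appears exactly once, it returns an empty list ("no mode").

```python
from collections import Counter


def mean(xs):
    # Sum all values and divide by the count.
    return sum(xs) / len(xs)


def median(xs):
    # Sort first; odd n -> middle value, even n -> average of the two middle values.
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(xs):
    # Every value tied for the highest count, so bimodal data returns two values.
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # assumed convention: all values appear once -> "no mode"
    return [x for x, c in counts.items() if c == top]


data = [2, 4, 4, 6, 8, 8, 8, 10]
print(mean(data), median(data), modes(data))  # 6.25 7.0 [8]
```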

Measures of Spread

Measures of spread (dispersion) describe how scattered data values are around the center.

Range and Interquartile Range (IQR)

Range

Range = Max − Min

Simple but very sensitive to outliers — a single extreme value changes it dramatically.

IQR

IQR = Q3 − Q1

The range of the middle 50% of data. Resistant to outliers. Q1 is the 25th percentile; Q3 is the 75th percentile.
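Quartile conventions differ between textbooks, calculators, and software, so IQR values can disagree slightly. The sketch below assumes one common textbook rule: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the overall median left out when n is odd. Adding a single outlier shows that the range jumps while the IQR barely moves.

```python
def median(values):
    # Middle value of the sorted list, or the average of the two middle values.
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def data_range(xs):
    # Range = Max - Min
    return max(xs) - min(xs)


def iqr(xs):
    # Assumed convention: Q1 = median of lower half, Q3 = median of upper half,
    # excluding the middle value when n is odd.
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(upper) - median(lower)


data = [2, 4, 4, 6, 8, 8, 8, 10]
with_outlier = data + [100]

print(data_range(data), iqr(data))                  # 8 4.0
print(data_range(with_outlier), iqr(with_outlier))  # 98 5.0
```

Python's `statistics.quantiles(data, n=4)` supports `method='exclusive'` (the default) and `method='inclusive'`, which interpolate between values and may give quartiles that differ from this halves rule.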

Variance and Standard Deviation

Population (entire dataset)

σ² = Σ(xᵢ − μ)² / n
σ = √[Σ(xᵢ − μ)² / n]

Sample (subset of a population)

s² = Σ(xᵢ − x̄)² / (n − 1)
s = √[Σ(xᵢ − x̄)² / (n − 1)]

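Why n − 1 for samples? Sample deviations are measured from the sample mean x̄, and x̄ is the value c that makes Σ(xᵢ − c)² as small as possible, so Σ(xᵢ − x̄)² is never larger than Σ(xᵢ − μ)². Dividing by n would therefore underestimate the population variance on average; dividing by n − 1 (Bessel's correction) compensates and makes s² an unbiased estimate of σ².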
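The effect of Bessel's correction can be checked exactly on a tiny example. The sketch below assumes a made-up population {1, 2, 3, 4}. It lists all 16 equally likely samples of size 2 drawn with replacement and averages each estimator over them using exact fractions. On average, dividing by n comes out too small, while dividing by n − 1 matches σ² exactly.

```python
from fractions import Fraction
from itertools import product


def pop_variance(xs):
    # Population formula: squared deviations from the mean, divided by n.
    n = len(xs)
    mu = Fraction(sum(xs), n)
    return sum((x - mu) ** 2 for x in xs) / n


def sample_variance(xs):
    # Sample formula: squared deviations from the sample mean, divided by n - 1.
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


population = [1, 2, 3, 4]          # hypothetical population
sigma2 = pop_variance(population)  # true variance

# All 4 * 4 = 16 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))
avg_divide_by_n = sum(pop_variance(s) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(sample_variance(s) for s in samples) / len(samples)

print(sigma2, avg_divide_by_n, avg_divide_by_n_minus_1)  # 5/4 5/8 5/4
```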

Core Probability Rules

All probabilities are between 0 and 1. P(certain event) = 1. P(impossible event) = 0. The complement rule: P(Aᶜ) = 1 − P(A).

Addition Rule

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

For mutually exclusive events: P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B)

P(King or Heart) = 4/52 + 13/52 − 1/52 = 16/52 ≈ 0.308
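One way to see why the overlap is subtracted is to count cards directly. The sketch below builds a 52-card deck (the rank and suit labels are our own choice) and computes P(King or Heart) twice: once by counting the union and once with the addition rule. Exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) pairs


def prob(event):
    # Classical probability: favorable outcomes / total outcomes.
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_king(card):
    return card[0] == "K"


def is_heart(card):
    return card[1] == "Hearts"


# Count the union directly...
direct = prob(lambda c: is_king(c) or is_heart(c))
# ...and with the addition rule, removing the double-counted King of Hearts.
by_rule = prob(is_king) + prob(is_heart) - prob(lambda c: is_king(c) and is_heart(c))

print(direct, by_rule)  # 4/13 4/13  (16/52 in lowest terms)
```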

Multiplication Rule (Independent)

P(A ∩ B) = P(A) × P(B)

Only valid when A and B are independent (one outcome does not affect the other)

P(two heads) = P(H) × P(H) = 0.5 × 0.5 = 0.25

Multiplication Rule (Dependent)

P(A ∩ B) = P(A) × P(B | A)

P(B | A) is the conditional probability of B given A has occurred

P(2 Aces, no replacement) = (4/52) × (3/51) ≈ 0.0045
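Enumerating every ordered two-card draw confirms the dependent multiplication rule. The sketch below keeps only what matters for this event (4 aces, 48 other cards) and compares the count against P(A) × P(B | A). For contrast, it also shows the independent answer you would get if the first card were replaced.

```python
from fractions import Fraction
from itertools import permutations

# Only "ace or not" matters here: 4 aces and 48 other cards.
deck = ["A"] * 4 + ["other"] * 48

# All ordered two-card draws without replacement: 52 * 51 = 2652 of them.
draws = list(permutations(range(len(deck)), 2))
both_aces = sum(1 for i, j in draws if deck[i] == "A" and deck[j] == "A")

p_enumerated = Fraction(both_aces, len(draws))
p_rule = Fraction(4, 52) * Fraction(3, 51)  # P(A) x P(B | A)
p_if_replaced = Fraction(4, 52) ** 2        # independent draws, for contrast

print(p_enumerated, p_rule, p_if_replaced)  # 1/221 1/221 1/169
```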

Complement Rule

P(Aᶜ) = 1 − P(A)

If it's easier to find the probability of the opposite event, use the complement

P(at least one head in 4 flips) = 1 − P(all tails) = 1 − (0.5)⁴ = 0.9375
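Checking the complement rule by brute force is straightforward for four flips. The sketch below lists all 16 equally likely head/tail sequences and counts those with at least one head. It then compares that count with 1 − P(all tails).

```python
from fractions import Fraction
from itertools import product

# All 2**4 = 16 equally likely sequences of four fair coin flips.
outcomes = list(product("HT", repeat=4))

# Direct count: sequences containing at least one head.
direct = Fraction(sum(1 for seq in outcomes if "H" in seq), len(outcomes))

# Complement rule: 1 - P(all tails) = 1 - (1/2)**4.
complement = 1 - Fraction(1, 2) ** 4

print(direct, complement, float(complement))  # 15/16 15/16 0.9375
```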

Conditional Probability & Bayes' Theorem

Conditional Probability

P(B | A) is the probability of B given that A has already occurred. It restricts the sample space to only outcomes where A is true.

P(B | A) = P(A ∩ B) / P(A)

Example: A card is drawn from a standard deck. Given that it's a face card, what's the probability it's a King? There are 12 face cards and 4 Kings. P(King | Face) = 4/12 = 1/3.
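Restricting the sample space, as described above, can be made literal. The sketch below filters a 52-card deck down to the face cards and counts the Kings among them. It then checks the result against the formula P(B | A) = P(A ∩ B) / P(A). The deck representation is our own choice.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(RANKS, SUITS))

# Conditioning on "face card" shrinks the sample space to the 12 face cards.
face_cards = [card for card in deck if card[0] in ("J", "Q", "K")]
kings_among_faces = [card for card in face_cards if card[0] == "K"]
restricted = Fraction(len(kings_among_faces), len(face_cards))

# Same answer from the formula: every King is a face card, so P(King and Face) = 4/52.
p_face = Fraction(len(face_cards), len(deck))
p_king_and_face = Fraction(sum(1 for c in deck if c[0] == "K"), len(deck))
by_formula = p_king_and_face / p_face

print(restricted, by_formula)  # 1/3 1/3
```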

Bayes' Theorem

Bayes' theorem reverses the conditioning — it lets you find P(A | B) when you know P(B | A). Essential for diagnostic testing, spam filtering, and machine learning.

P(A | B) = [P(B | A) × P(A)] / P(B)
P(B) = P(B|A)·P(A) + P(B|Aᶜ)·P(Aᶜ) ← total probability rule

Classic medical test example:

Disease prevalence: P(D) = 0.01

Test sensitivity P(+ | D) = 0.99

False positive rate P(+ | Dᶜ) = 0.05

P(+) = P(+ | D)·P(D) + P(+ | Dᶜ)·P(Dᶜ) = 0.99 × 0.01 + 0.05 × 0.99 = 0.0099 + 0.0495 = 0.0594

P(D | +) = (0.99 × 0.01) / 0.0594 ≈ 0.167

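Even with a positive test, there is only about a 17% chance of having the disease. Because the disease is rare, false positives from the 99% of people who are healthy (0.0495) outnumber true positives (0.0099) five to one.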
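The calculation above fits in a small function. In the sketch below (the name `posterior` is our own), the denominator P(+) comes from the total probability rule and the result is P(D | +). The second call uses a hypothetical 10% prevalence, not part of the original example, to show how strongly the answer depends on the prior.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(D | +) by Bayes' theorem, with P(+) from the total probability rule."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


print(round(posterior(0.01, 0.99, 0.05), 2))  # 0.17
# Hypothetical comparison: the same test where 10% of people have the disease.
print(round(posterior(0.10, 0.99, 0.05), 2))  # 0.69
```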

Counting Principles

Fundamental Counting Principle

If event A can occur in m ways and event B can occur in n ways, then A followed by B can occur in m × n ways.

3 shirts × 4 pants × 2 shoes = 24 different outfits

Permutations (Order Matters)

P(n, r) = n! / (n − r)!

Count ordered arrangements of r items chosen from n distinct items.

Arrange 3 of 5 books: P(5,3) = 5!/(5−3)! = 60

Combinations (Order Doesn't Matter)

C(n, r) = n! / [r!(n − r)!]

Count unordered subsets of r items chosen from n distinct items. Also written ⁿCᵣ or (n choose r).

Choose 3 of 5 for a team: C(5,3) = 10

Permutations vs. Combinations: Quick Reference

Scenario | Order matters? | Formula | Answer
Top-3 finishers from 8 runners | Yes | P(8,3) | 336
5-card hand from 52-card deck | No | C(52,5) | 2,598,960
4-digit PIN (no repeats) | Yes | P(10,4) | 5,040
Committee of 4 from 10 people | No | C(10,4) | 210
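The counting formulas can be checked against brute-force enumeration. The sketch below assumes Python 3.8 or later, which provides `math.perm` and `math.comb`. It lists the actual arrangements and subsets for the 5-book example, confirms that P(n, r) = C(n, r) × r!, and reproduces the quick-reference table.

```python
import math
from itertools import combinations, permutations, product

# Fundamental counting principle: 3 shirts x 4 pants x 2 shoes
outfits = list(product(range(3), range(4), range(2)))
print(len(outfits))  # 24

# Closed-form counts (math.perm and math.comb need Python 3.8+)
print(math.perm(5, 3), math.comb(5, 3))  # 60 10

# Brute-force check: list the ordered arrangements and unordered subsets of 5 books
books = "ABCDE"
print(len(list(permutations(books, 3))), len(list(combinations(books, 3))))  # 60 10

# Each 3-book subset can be ordered 3! = 6 ways, so P(5,3) = C(5,3) * 3!
print(math.perm(5, 3) == math.comb(5, 3) * math.factorial(3))  # True

# Rows of the quick-reference table
table = [
    ("Top-3 finishers from 8 runners", math.perm(8, 3)),
    ("5-card hand from 52-card deck", math.comb(52, 5)),
    ("4-digit PIN (no repeats)", math.perm(10, 4)),
    ("Committee of 4 from 10 people", math.comb(10, 4)),
]
for scenario, count in table:
    print(f"{scenario}: {count:,}")
```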

Discrete Probability Distributions

Binomial Distribution

Models the number of successes in n independent trials, each with success probability p.

Conditions (BINS)

  • Binary outcomes (success/failure)
  • Independent trials
  • Number of trials fixed
  • Same probability p each trial

Key Formulas

P(X=k) = C(n,k) · pᵏ · (1−p)ⁿ⁻ᵏ
μ = np
σ = √(np(1−p))

Example: P(exactly 3 heads in 6 coin flips)

n=6, k=3, p=0.5

P(X=3) = C(6,3) · (0.5)³ · (0.5)³ = 20 · 0.125 · 0.125

= 20 × 0.015625 = 0.3125
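The binomial formulas translate directly into code. The sketch below uses our own function names, `binomial_pmf` and `binomial_mean_sd`. It reproduces the coin-flip example, then uses n = 10 and p = 0.3 (an arbitrary choice) to check two properties: the probabilities over k = 0..n sum to 1, and Σ k·P(X = k) equals np.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_sd(n, p):
    # mu = np, sigma = sqrt(np(1 - p))
    return n * p, sqrt(n * p * (1 - p))


print(binomial_pmf(3, 6, 0.5))  # 0.3125

# Sanity checks on an arbitrary case: total probability 1, and mean equal to np.
n, p = 10, 0.3
dist = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(round(sum(dist), 10), round(sum(k * q for k, q in enumerate(dist)), 10))  # 1.0 3.0
print(binomial_mean_sd(n, p))
```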

Geometric Distribution

Models the number of trials needed to achieve the first success, where each trial has success probability p.

Key Formulas

P(X=k) = (1−p)ᵏ⁻¹ · p
μ = 1/p
σ = √((1−p)/p²)

Example

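A basketball player makes free throws with probability 0.70, so each shot is a miss with probability 0.30. Treating a miss as the "success" being waited for (p = 0.30), what is the probability that the first miss occurs on the 4th shot?

P(X=4) = (0.7)³ · 0.3 ≈ 0.103 (three makes, then a miss)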
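In the sketch below (`geometric_pmf` is our own name), the free-throw example treats a miss as the event being waited for, so p = 0.30. Summing the pmf over the first four shots gives the same result as the complement 1 − 0.7⁴, because "no miss in four shots" means four makes in a row.

```python
def geometric_pmf(k, p):
    # P(X = k) = (1 - p)^(k - 1) * p: k - 1 failures, then the first success.
    return (1 - p) ** (k - 1) * p


p_miss = 0.30  # the "success" we wait for is a miss

print(round(geometric_pmf(4, p_miss), 3))  # 0.103
print(round(1 / p_miss, 2))                # 3.33 shots on average until the first miss

# P(first miss within 4 shots): summing the pmf matches the complement 1 - 0.7^4.
within_4 = sum(geometric_pmf(k, p_miss) for k in range(1, 5))
print(round(within_4, 4), round(1 - 0.7**4, 4))  # 0.7599 0.7599
```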

Normal Distribution & Z-Scores

Properties of the Normal Distribution

  • Bell-shaped and symmetric about the mean
  • Mean = Median = Mode (all equal)
  • Defined by two parameters: μ (mean) and σ (standard deviation)
  • Total area under the curve = 1
  • Tails approach but never touch the x-axis

68-95-99.7 Empirical Rule

About 68% of data within μ ± 1σ
About 95% of data within μ ± 2σ
About 99.7% of data within μ ± 3σ
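The rule's percentages are rounded. The sketch below computes the exact areas using `statistics.NormalDist` from Python's standard library (Python 3.8+). The area within k standard deviations is the CDF at +k minus the CDF at −k.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    # Area within k standard deviations of the mean
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} sigma: {area:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```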

Z-Scores (Standardization)

A z-score converts a value from any normal distribution to the standard normal distribution (μ = 0, σ = 1), so a single z-table works for every normal distribution.

z = (x − μ) / σ

z = 0: x equals the mean

z = 1: x is 1 standard deviation above the mean

z = −2: x is 2 standard deviations below the mean

Example: SAT scores μ = 1000, σ = 200

Score of 1300: z = (1300 − 1000) / 200 = 1.5

Score of 800: z = (800 − 1000) / 200 = −1.0

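A z-score of 1.5 means the score is higher than about 93.3% of test-takers (the area to the left of z = 1.5 is about 0.9332), assuming scores follow this normal distribution.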
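The sketch below repeats the SAT example in code, assuming (as the example does) that scores follow a normal distribution with μ = 1000 and σ = 200. `z_score` is our own helper. `NormalDist.cdf` gives the proportion of test-takers below each score.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    # How many standard deviations x lies from the mean
    return (x - mu) / sigma


sat = NormalDist(mu=1000, sigma=200)  # assumes scores are exactly normal
for score in (1300, 800):
    z = z_score(score, sat.mean, sat.stdev)
    # cdf(score) = proportion of test-takers scoring below this score
    print(score, z, round(sat.cdf(score), 4))
# 1300 1.5 0.9332
# 800 -1.0 0.1587
```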

Common Z-Score Benchmarks

Z-Score | Area to Left | Interpretation
z = −2.00 | 0.0228 | 2.28% below
z = −1.00 | 0.1587 | 15.87% below
z = 0.00 | 0.5000 | 50% below (median)
z = 1.00 | 0.8413 | 84.13% below
z = 1.645 | 0.9500 | 95% below (top 5%)
z = 1.960 | 0.9750 | 97.5% below (top 2.5%)
z = 2.00 | 0.9772 | 97.72% below
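Instead of a printed z-table, the standard normal CDF can be evaluated directly. The sketch below regenerates the benchmark areas with `statistics.NormalDist` (Python 3.8+). It then uses `inv_cdf` to go the other way, recovering the cutoffs 1.645 and 1.96 from the areas 0.95 and 0.975.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mu = 0, sigma = 1 by default

for z in (-2.00, -1.00, 0.00, 1.00, 1.645, 1.960, 2.00):
    # cdf(z) is the area to the left of z
    print(f"z = {z:6.3f}   area to left = {std_normal.cdf(z):.4f}")

# Inverse lookup: which z leaves 95% (or 97.5%) of the area to its left?
print(round(std_normal.inv_cdf(0.95), 3), round(std_normal.inv_cdf(0.975), 3))  # 1.645 1.96
```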

Expected Value

Expected value E(X) is the theoretical long-run average of a random variable — what you would expect on average across many trials.

Discrete Expected Value Formula

E(X) = Σ [xᵢ · P(xᵢ)] = x₁P(x₁) + x₂P(x₂) + ⋯ + xₙP(xₙ)

Properties

  • E(aX + b) = a·E(X) + b
  • E(X + Y) = E(X) + E(Y), whether or not X and Y are independent
  • E(c) = c for any constant c
  • E(X) need not be a value X can actually take: a fair die roll has E(X) = 3.5

Decision Rule

E(X) > 0 — favorable, take it

E(X) = 0 — fair game (break even)

E(X) < 0 — unfavorable, avoid it

Expected Value: Probability Distribution Table

Game: Roll a die. Win $5 for a 6, win $1 for a 4 or 5, lose $2 for 1, 2, or 3.

Outcome (x) | P(x) | x · P(x)
$5 (roll a 6) | 1/6 | $5 × 1/6 = $0.833
$1 (roll 4 or 5) | 2/6 | $1 × 2/6 = $0.333
−$2 (roll 1, 2, or 3) | 3/6 | −$2 × 3/6 = −$1.000
Total | ΣP(x) = 1 | E(X) = $0.833 + $0.333 − $1.000 = $0.167

E(X) ≈ $0.17 per game — the game slightly favors the player. Over 100 games, expect to profit about $17.
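A probability distribution table maps naturally to a list of (value, probability) pairs. The sketch below uses our own function name, `expected_value`, and exact fractions so the result is exactly 1/6 rather than a rounded decimal. It also rejects tables whose probabilities do not sum to 1. The same function reproduces the insurer's expected payout from Worked Example 5.

```python
from fractions import Fraction


def expected_value(distribution):
    """E(X) = sum of x * P(x) over (x, P(x)) pairs; probabilities must total 1."""
    if sum(p for _, p in distribution) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution)


# Die game from the table: (winnings in dollars, probability)
die_game = [(5, Fraction(1, 6)), (1, Fraction(2, 6)), (-2, Fraction(3, 6))]
ev = expected_value(die_game)
print(ev, round(float(100 * ev), 2))  # 1/6 16.67

# Insurer's payout distribution from Worked Example 5
payouts = [(10_000, Fraction(2, 100)), (1_000, Fraction(8, 100)), (0, Fraction(90, 100))]
print(expected_value(payouts))  # 280
```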

Worked Examples

Example 1 — Find Mean, Median, Mode, and Standard Deviation

Dataset: 3, 7, 7, 9, 12, 12, 12, 15, 18

Mean = (3+7+7+9+12+12+12+15+18)/9 = 95/9 ≈ 10.56

Median: n=9, middle is 5th value → sorted: 3,7,7,9,12,12,12,15,18 → 12

Mode: 12 appears 3 times → Mode = 12

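Population variance (all 9 values treated as the whole population, so divide by n = 9): σ² = Σ(xᵢ − 10.56)² / 9

= [(3−10.56)²+(7−10.56)²+…+(18−10.56)²] / 9 ≈ 166.22 / 9 ≈ 18.47

σ = √18.47 ≈ 4.30

If the same values were a sample, divide by n − 1 = 8 instead: s² ≈ 166.22 / 8 ≈ 20.78 and s ≈ 4.56.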
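Python's standard `statistics` module can double-check Example 1. `pvariance` and `pstdev` divide by n (population formulas), while `variance` and `stdev` divide by n − 1 (sample formulas). Printing both shows how much the choice matters with only nine values.

```python
import statistics

data = [3, 7, 7, 9, 12, 12, 12, 15, 18]

print(round(statistics.mean(data), 2))       # 10.56
print(statistics.median(data))               # 12
print(statistics.mode(data))                 # 12
print(round(statistics.pvariance(data), 2))  # 18.47  (divide by n)
print(round(statistics.pstdev(data), 2))     # 4.3
print(round(statistics.stdev(data), 2))      # 4.56   (divide by n - 1)
```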

Example 2 — Binomial Probability

A student guesses randomly on a 10-question true/false quiz (p = 0.5).

Find: P(exactly 7 correct)

n = 10, k = 7, p = 0.5

P(X=7) = C(10,7) × (0.5)⁷ × (0.5)³

= 120 × (0.5)¹⁰ = 120 / 1024

≈ 0.117 (about 11.7%)

μ = np = 10 × 0.5 = 5  |  σ = √(np(1 − p)) = √(10 × 0.5 × 0.5) = √2.5 ≈ 1.58

Example 3 — Normal Distribution with Z-Scores

IQ scores are normally distributed: μ = 100, σ = 15.

What percent of people have IQ between 85 and 115?

z₁ = (85 − 100) / 15 = −1.00

z₂ = (115 − 100) / 15 = +1.00

P(−1 < z < 1) = P(z < 1) − P(z < −1)

= 0.8413 − 0.1587 = 0.6826

About 68.26% of people have IQ between 85 and 115 (the 68% rule)

Example 4 — Conditional Probability with a Two-Way Table

Survey of 200 students: sport preference × grade level

SoccerBasketballTotal
Junior453580
Senior5565120
Total100100200

P(Soccer | Junior) = 45/80 = 0.5625

P(Junior | Soccer) = 45/100 = 0.45

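Note: in general P(A|B) ≠ P(B|A), so order matters in conditional probability! The condition sets the denominator: the 80 juniors in the first case, the 100 soccer players in the second.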
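A two-way table can be stored as counts keyed by (row, column). In the sketch below, the dictionary layout and the helper `conditional` are our own illustration. The "given" predicate chooses which cells form the denominator, which is exactly why swapping the condition changes the answer.

```python
# Two-way table from the survey: (grade, sport) -> number of students
counts = {
    ("Junior", "Soccer"): 45,
    ("Junior", "Basketball"): 35,
    ("Senior", "Soccer"): 55,
    ("Senior", "Basketball"): 65,
}


def conditional(target, given):
    """P(target | given), where each argument is a predicate on a (grade, sport) cell."""
    in_given = sum(n for cell, n in counts.items() if given(cell))
    in_both = sum(n for cell, n in counts.items() if given(cell) and target(cell))
    return in_both / in_given


def is_junior(cell):
    return cell[0] == "Junior"


def is_soccer(cell):
    return cell[1] == "Soccer"


print(conditional(is_soccer, given=is_junior))  # 0.5625  (45/80)
print(conditional(is_junior, given=is_soccer))  # 0.45    (45/100)
```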

Example 5 — Expected Value: Insurance Decision

An insurance policy costs $300/year.

P(major claim of $10,000) = 0.02

P(minor claim of $1,000) = 0.08

P(no claim) = 0.90

Expected payout (from company's perspective):

E(payout) = 10000(0.02) + 1000(0.08) + 0(0.90)

= 200 + 80 + 0 = $280

Company collects $300, expects to pay $280 → E(profit) = $20/policy

For the buyer: expected benefit $280 < cost $300 → expected net = −$20. Insurance can still make sense for a risk-averse buyer: you give up $20 per year on average to protect against a potential $10,000 loss.

Frequently Asked Questions

What is the difference between mean, median, and mode?

Mean is the arithmetic average — add all values and divide by the count. Median is the middle value when data is sorted; for an even number of values, average the two middle values. Mode is the value that appears most frequently; a dataset can have no mode, one mode, or multiple modes. Use mean for symmetric distributions without outliers. Use median when data is skewed or contains outliers, because the median is resistant to extreme values. Use mode for categorical data or when the most common value matters.

How do you calculate standard deviation?

Standard deviation measures how spread out values are from the mean. Steps: (1) Find the mean μ. (2) Subtract the mean from each data value and square the result: (xᵢ − μ)². (3) Average those squared differences — this is the variance σ². For a population use n in the denominator; for a sample use n − 1. (4) Take the square root of the variance to get the standard deviation σ. A small standard deviation means data clusters tightly around the mean; a large value means data is widely spread.

What is the addition rule of probability?

The general addition rule states: P(A or B) = P(A) + P(B) − P(A and B). You subtract P(A and B) to avoid counting the overlap twice. For mutually exclusive events (events that cannot both occur), P(A and B) = 0, so the rule simplifies to P(A or B) = P(A) + P(B). Example: P(King or Heart) = 4/52 + 13/52 − 1/52 = 16/52. You subtract 1/52 for the King of Hearts, which was counted in both groups.

What is the multiplication rule for independent events?

For independent events (where the outcome of one does not affect the other), P(A and B) = P(A) × P(B). For dependent events, the general multiplication rule uses conditional probability: P(A and B) = P(A) × P(B | A), where P(B | A) is the probability of B given that A has already occurred. Example independent: flipping heads twice — P(H and H) = 0.5 × 0.5 = 0.25. Example dependent: drawing two aces without replacement — P(Ace then Ace) = (4/52) × (3/51) ≈ 0.0045.

How does Bayes' theorem work?

Bayes' theorem updates a probability when new evidence arrives: P(A | B) = [P(B | A) × P(A)] / P(B). In words: the probability of A given B equals the probability of B given A, times the prior probability of A, divided by the overall probability of B. P(B) is often expanded using the total probability rule: P(B) = P(B | A) × P(A) + P(B | Aᶜ) × P(Aᶜ). Classic use: a medical test has 99% sensitivity and 95% specificity (a 5% false positive rate) for a disease with 1% prevalence. Bayes' theorem shows that a patient who tests positive has only about a 17% chance of actually having the disease, because false positives among the many healthy people outnumber true positives.

What is the difference between permutations and combinations?

Permutations count arrangements where order matters: P(n, r) = n! / (n − r)!. Combinations count selections where order does not matter: C(n, r) = n! / [r! × (n − r)!]. Memory trick: 'Permutations Pick Positions' — the order of seats matters. 'Combinations are Clumps' — the order in a group doesn't matter. Example: choosing 3 from 5 people for a committee (order doesn't matter) = C(5,3) = 10. Arranging 3 of 5 people in first/second/third place (order matters) = P(5,3) = 60.

What is the binomial distribution formula?

The binomial distribution applies when you have n independent trials, each with success probability p, and you want exactly k successes. The formula is P(X = k) = C(n, k) × pᵏ × (1 − p)ⁿ⁻ᵏ. The mean is μ = np and the standard deviation is σ = √(np(1 − p)). Conditions for a binomial experiment: fixed number of trials n, each trial is independent, each trial has only two outcomes (success or failure), and p is constant across trials. Example: probability of exactly 3 heads in 5 coin flips = C(5,3) × (0.5)³ × (0.5)² = 10 × 0.125 × 0.25 = 0.3125.

How do you use z-scores with the normal distribution?

A z-score tells how many standard deviations a value x is from the mean: z = (x − μ) / σ. A z-score of 0 means x equals the mean; z = 1 means x is one standard deviation above the mean; z = −2 means two standard deviations below. Once you have the z-score, use a z-table (or the 68-95-99.7 rule) to find probabilities. The 68-95-99.7 rule: about 68% of data falls within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ. Example: SAT scores have μ = 1000, σ = 200. For x = 1300, z = (1300 − 1000)/200 = 1.5, meaning the score is 1.5 standard deviations above average.

What is expected value and how do you calculate it?

Expected value E(X) is the long-run average outcome of a random variable. For a discrete distribution: E(X) = Σ [xᵢ × P(xᵢ)], the sum of each value multiplied by its probability. If E(X) > 0 in a gambling context, the game favors you; if E(X) < 0, it favors the house. Example: a lottery ticket costs $2 and pays $10 with probability 0.1 and $0 with probability 0.9. E(payout) = 10(0.1) + 0(0.9) = $1.00. Net expected value = $1.00 − $2.00 = −$1.00 per ticket. On average you lose $1 per ticket.
