Descriptive statistics, probability rules, conditional probability and Bayes' theorem, counting principles, binomial and normal distributions, and expected value — the complete guide.
Measures of center summarize a dataset with a single representative value. The three main measures are mean, median, and mode.
μ = (Σxᵢ) / n
Sum all values and divide by the count. Sensitive to outliers — a single extreme value can pull the mean far from the center of most data.
Middle value (sorted)
Sort the data. For odd n, the median is the middle value. For even n, average the two middle values. Resistant to outliers — preferred for skewed data like incomes.
Most frequent value
The value that appears most often. A dataset can be unimodal, bimodal, or have no mode. Essential for categorical data where averaging is meaningless.
Mean
(2+4+4+6+8+8+8+10)/8
= 50/8 = 6.25
Median (n=8)
Middle two: 6 and 8
= (6+8)/2 = 7
Mode
8 appears 3 times
= 8
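The worked example above can be checked with Python's standard `statistics` module; a minimal sketch using the same eight-value dataset:

```python
from statistics import mean, median, mode

data = [2, 4, 4, 6, 8, 8, 8, 10]

print(mean(data))    # 50/8 = 6.25
print(median(data))  # average of the middle two (6 and 8) = 7.0
print(mode(data))    # 8, which appears 3 times
```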
Measures of spread (dispersion) describe how scattered data values are around the center.
Range (max − min)
Simple but very sensitive to outliers — a single extreme value changes it dramatically.
IQR (Q3 − Q1)
The range of the middle 50% of data. Resistant to outliers. Q1 is the 25th percentile; Q3 is the 75th percentile.
Population (entire dataset): σ² = Σ(xᵢ − μ)² / n
Sample (subset of a population): s² = Σ(xᵢ − x̄)² / (n − 1)
Why n − 1 for samples? Using n − 1 (Bessel's correction) prevents underestimating the true population variance. A sample tends to have values closer to its own mean than the population mean, so dividing by n − 1 compensates.
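Both denominators are available in Python's `statistics` module; a sketch reusing the eight-value dataset from the mean/median example (any numeric list works):

```python
from statistics import pvariance, variance

data = [2, 4, 4, 6, 8, 8, 8, 10]

pop_var = pvariance(data)   # population: divides by n     -> 51.5 / 8 = 6.4375
samp_var = variance(data)   # sample: divides by n - 1     -> 51.5 / 7 ≈ 7.357
print(pop_var, samp_var)
```

The sample estimate is always a bit larger, which is exactly what Bessel's correction is compensating for.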
All probabilities are between 0 and 1. P(certain event) = 1. P(impossible event) = 0. The complement rule: P(Aᶜ) = 1 − P(A).
For mutually exclusive events: P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B)
P(King or Heart) = 4/52 + 13/52 − 1/52 = 16/52 ≈ 0.308
Only valid when A and B are independent (one outcome does not affect the other)
P(two heads) = P(H) × P(H) = 0.5 × 0.5 = 0.25
P(B | A) is the conditional probability of B given A has occurred
P(2 Aces, no replacement) = (4/52) × (3/51) ≈ 0.0045
If it's easier to find the probability of the opposite event, use the complement
P(at least one head in 4 flips) = 1 − P(all tails) = 1 − (0.5)⁴ = 0.9375
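The four worked examples above can be verified numerically; a sketch in plain Python with the fractions from the card, deck, and coin setups:

```python
king_or_heart = 4/52 + 13/52 - 1/52   # addition rule; subtract the King of Hearts overlap
two_heads = 0.5 * 0.5                 # multiplication rule for independent flips
two_aces = (4/52) * (3/51)            # dependent draws without replacement
at_least_one_head = 1 - 0.5**4        # complement rule over 4 flips

print(round(king_or_heart, 3), two_heads, round(two_aces, 4), at_least_one_head)
```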
P(B | A) is the probability of B given that A has already occurred. It restricts the sample space to only outcomes where A is true.
Example: A card is drawn from a standard deck. Given that it's a face card, what's the probability it's a King? There are 12 face cards and 4 Kings. P(King | Face) = 4/12 = 1/3.
Bayes' theorem reverses the conditioning — it lets you find P(A | B) when you know P(B | A). Essential for diagnostic testing, spam filtering, and machine learning.
Classic medical test example:
Disease prevalence: P(D) = 0.01
Test sensitivity P(+ | D) = 0.99
False positive rate P(+ | Dᶜ) = 0.05
P(+) = P(+ | D) · P(D) + P(+ | Dᶜ) · P(Dᶜ) = 0.99 × 0.01 + 0.05 × 0.99 = 0.0099 + 0.0495 = 0.0594
P(D | +) = (0.99 × 0.01) / 0.0594 ≈ 0.167
Even with a positive test, there is only a ~17% chance of having the disease — because the disease is rare.
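The medical-test numbers above drop straight into Bayes' theorem; a minimal sketch:

```python
p_d = 0.01        # prevalence P(D)
p_pos_d = 0.99    # sensitivity P(+ | D)
p_pos_nd = 0.05   # false positive rate P(+ | not D)

# Total probability of a positive test
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)   # 0.0594
# Bayes' theorem: probability of disease given a positive test
p_d_pos = (p_pos_d * p_d) / p_pos              # ≈ 0.1667
print(round(p_pos, 4), round(p_d_pos, 4))
```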
If event A can occur in m ways and event B can occur in n ways, then A followed by B can occur in m × n ways.
Count ordered arrangements of r items chosen from n distinct items.
Arrange 3 of 5 books: P(5,3) = 5!/(5−3)! = 60
Count unordered subsets of r items chosen from n distinct items. Also written ⁿCᵣ or (n choose r).
Choose 3 of 5 for a team: C(5,3) = 10
| Scenario | Order? | Formula | Answer |
|---|---|---|---|
| Top-3 finishers from 8 runners | Yes | P(8,3) | 336 |
| 5-card hand from 52-card deck | No | C(52,5) | 2,598,960 |
| 4-digit PIN (no repeats) | Yes | P(10,4) | 5,040 |
| Committee of 4 from 10 people | No | C(10,4) | 210 |
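Since Python 3.8 the standard library computes both counts directly; a sketch checking every row of the table above:

```python
from math import comb, perm

print(perm(8, 3))    # top-3 finishers from 8 runners: 336
print(comb(52, 5))   # 5-card hands from a 52-card deck: 2,598,960
print(perm(10, 4))   # 4-digit PINs without repeats: 5,040
print(comb(10, 4))   # committees of 4 from 10 people: 210
```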
Models the number of successes in n independent trials, each with success probability p.
Conditions (BINS): Binary outcomes (success/failure), Independent trials, fixed Number of trials n, Same success probability p on every trial
Key Formulas: P(X = k) = C(n, k) · pᵏ · (1 − p)ⁿ⁻ᵏ, with μ = np and σ = √(np(1 − p))
Example: P(exactly 3 heads in 6 coin flips)
n=6, k=3, p=0.5
P(X=3) = C(6,3) · (0.5)³ · (0.5)³ = 20 · 0.125 · 0.125
= 20 × 0.015625 = 0.3125
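The binomial PMF is short enough to write by hand; a sketch reproducing the coin-flip example (the helper name `binom_pmf` is mine, not a library function):

```python
from math import comb

def binom_pmf(n: int, k: int, p: float) -> float:
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(6, 3, 0.5))   # 0.3125
```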
Models the number of trials needed to achieve the first success, where each trial has success probability p.
Key Formulas: P(X = k) = (1 − p)ᵏ⁻¹ · p, with μ = 1/p
Example
A basketball player makes free throws with p = 0.70. What is the probability the first miss occurs on the 4th shot? Treating a miss (probability 0.30) as the "success": P = (0.70)³ × 0.30 = 0.343 × 0.30 = 0.1029.
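A sketch of the geometric PMF applied to the free-throw question (`geom_pmf` is a hypothetical helper name):

```python
def geom_pmf(k: int, p: float) -> float:
    """P(first success occurs on trial k), success probability p."""
    return (1 - p) ** (k - 1) * p

# Here the "success" is a miss, so p = 0.30
print(round(geom_pmf(4, 0.30), 4))   # first miss on shot 4: 0.1029
```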
68-95-99.7 Empirical Rule
A z-score converts any normal distribution value to the standard normal distribution (μ = 0, σ = 1), enabling the use of z-tables.
z = 0
x equals the mean
z = 1
x is 1 std dev above mean
z = −2
x is 2 std devs below mean
Example: SAT scores μ = 1000, σ = 200
Score of 1300: z = (1300 − 1000) / 200 = 1.5
Score of 800: z = (800 − 1000) / 200 = −1.0
A z of 1.5 means the score beats about 93.3% of test-takers
| Z-Score | Area to Left | Interpretation |
|---|---|---|
| z = −2.00 | 0.0228 | 2.28% below |
| z = −1.00 | 0.1587 | 15.87% below |
| z = 0.00 | 0.5000 | 50% below (median) |
| z = 1.00 | 0.8413 | 84.13% below |
| z = 1.645 | 0.9500 | 95% below (top 5%) |
| z = 1.960 | 0.9750 | 97.5% below (top 2.5%) |
| z = 2.00 | 0.9772 | 97.72% below |
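The table's left-tail areas come from the standard normal CDF, which `statistics.NormalDist` (Python 3.8+) exposes directly; a sketch:

```python
from statistics import NormalDist

std = NormalDist(mu=0, sigma=1)
for z in (-2.0, -1.0, 0.0, 1.0, 1.645, 1.960, 2.0):
    # cdf(z) is the area under the standard normal curve to the left of z
    print(f"z = {z:6.3f}   area to left = {std.cdf(z):.4f}")
```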
Expected value E(X) is the theoretical long-run average of a random variable — what you would expect on average across many trials.
Properties
Decision Rule
E(X) > 0 — favorable, take it
E(X) = 0 — fair game (break even)
E(X) < 0 — unfavorable, avoid it
Game: Roll a die. Win $5 for a 6, win $1 for a 4 or 5, lose $2 for 1, 2, or 3.
| Outcome (x) | P(x) | x · P(x) |
|---|---|---|
| $5 (roll a 6) | 1/6 | $5 × 1/6 = $0.833 |
| $1 (roll 4 or 5) | 2/6 | $1 × 2/6 = $0.333 |
| −$2 (roll 1, 2, or 3) | 3/6 | −$2 × 3/6 = −$1.000 |
| E(X) | Σ P(x) = 1 | $0.833 + $0.333 − $1.000 ≈ $0.167 |
E(X) ≈ $0.17 per game — the game slightly favors the player. Over 100 games, expect to profit about $17.
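The expected-value sum is a one-liner; a sketch using the payout table above:

```python
# (payout, probability) pairs from the dice game above
outcomes = [(5, 1/6), (1, 2/6), (-2, 3/6)]

ev = sum(x * p for x, p in outcomes)
print(round(ev, 4))   # 1/6 ≈ 0.1667 dollars per game
```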
Dataset: 3, 7, 7, 9, 12, 12, 12, 15, 18
Mean = (3+7+7+9+12+12+12+15+18)/9 = 95/9 ≈ 10.56
Median: n=9, middle is 5th value → sorted: 3,7,7,9,12,12,12,15,18 → 12
Mode: 12 appears 3 times → Mode = 12
Variance: Σ(xᵢ − 10.56)² / 9
= [(3−10.56)²+(7−10.56)²+…+(18−10.56)²] / 9 ≈ 18.47
σ = √18.47 ≈ 4.30
A student guesses randomly on a 10-question true/false quiz (p = 0.5).
Find: P(exactly 7 correct)
n = 10, k = 7, p = 0.5
P(X=7) = C(10,7) × (0.5)⁷ × (0.5)³
= 120 × (0.5)¹⁰ = 120 / 1024
≈ 0.117 (about 11.7%)
μ = np = 10 × 0.5 = 5 | σ = √(npq) = √2.5 ≈ 1.58
IQ scores are normally distributed: μ = 100, σ = 15.
What percent of people have IQ between 85 and 115?
z₁ = (85 − 100) / 15 = −1.00
z₂ = (115 − 100) / 15 = +1.00
P(−1 < z < 1) = P(z < 1) − P(z < −1)
= 0.8413 − 0.1587 = 0.6826
About 68.26% of people have IQ between 85 and 115 (the 68% rule)
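The same answer comes from the normal CDF without a z-table; a sketch (the exact value rounds to 0.6827, since the table entries 0.8413 and 0.1587 are themselves rounded):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
p_between = iq.cdf(115) - iq.cdf(85)   # P(85 < X < 115)
print(round(p_between, 4))             # 0.6827
```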
Survey of 200 students: sport preference × grade level
| | Soccer | Basketball | Total |
|---|---|---|---|
| Junior | 45 | 35 | 80 |
| Senior | 55 | 65 | 120 |
| Total | 100 | 100 | 200 |
P(Soccer | Junior) = 45/80 = 0.5625
P(Junior | Soccer) = 45/100 = 0.45
Note: P(A|B) ≠ P(B|A) — order matters in conditional probability!
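Both conditional probabilities read straight off the two-way table; a sketch with the survey counts hard-coded:

```python
# Counts from the 200-student survey
junior_soccer, junior_basketball = 45, 35
senior_soccer, senior_basketball = 55, 65

juniors = junior_soccer + junior_basketball   # row total: 80
soccer = junior_soccer + senior_soccer        # column total: 100

p_soccer_given_junior = junior_soccer / juniors   # 45/80 = 0.5625
p_junior_given_soccer = junior_soccer / soccer    # 45/100 = 0.45
print(p_soccer_given_junior, p_junior_given_soccer)
```

Dividing the same cell by two different totals is exactly why P(A|B) ≠ P(B|A).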
An insurance policy costs $300/year.
P(major claim of $10,000) = 0.02
P(minor claim of $1,000) = 0.08
P(no claim) = 0.90
Expected payout (from company's perspective):
E(payout) = 10000(0.02) + 1000(0.08) + 0(0.90)
= 200 + 80 + 0 = $280
Company collects $300, expects to pay $280 → E(profit) = $20/policy
For the buyer: expected benefit $280 < cost $300 → expected net = −$20. Insurance still makes sense for risk aversion — you pay $20 to avoid a potential $10,000 loss.
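A sketch computing both perspectives on the policy:

```python
premium = 300
# (payout, probability) pairs from the claim distribution above
payouts = [(10_000, 0.02), (1_000, 0.08), (0, 0.90)]

expected_payout = sum(x * p for x, p in payouts)   # 280.0
company_profit = premium - expected_payout         # 20.0 per policy
buyer_net = expected_payout - premium              # -20.0 per policy
print(expected_payout, company_profit, buyer_net)
```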
Mean is the arithmetic average — add all values and divide by the count. Median is the middle value when data is sorted; for an even number of values, average the two middle values. Mode is the value that appears most frequently; a dataset can have no mode, one mode, or multiple modes. Use mean for symmetric distributions without outliers. Use median when data is skewed or contains outliers, because the median is resistant to extreme values. Use mode for categorical data or when the most common value matters.
Standard deviation measures how spread out values are from the mean. Steps: (1) Find the mean μ. (2) Subtract the mean from each data value and square the result: (xᵢ − μ)². (3) Average those squared differences — this is the variance σ². For a population use n in the denominator; for a sample use n − 1. (4) Take the square root of the variance to get the standard deviation σ. A small standard deviation means data clusters tightly around the mean; a large value means data is widely spread.
The general addition rule states: P(A or B) = P(A) + P(B) − P(A and B). You subtract P(A and B) to avoid counting the overlap twice. For mutually exclusive events (events that cannot both occur), P(A and B) = 0, so the rule simplifies to P(A or B) = P(A) + P(B). Example: P(King or Heart) = 4/52 + 13/52 − 1/52 = 16/52. You subtract 1/52 for the King of Hearts, which was counted in both groups.
For independent events (where the outcome of one does not affect the other), P(A and B) = P(A) × P(B). For dependent events, the general multiplication rule uses conditional probability: P(A and B) = P(A) × P(B | A), where P(B | A) is the probability of B given that A has already occurred. Example independent: flipping heads twice — P(H and H) = 0.5 × 0.5 = 0.25. Example dependent: drawing two aces without replacement — P(Ace then Ace) = (4/52) × (3/51) ≈ 0.0045.
Bayes' theorem updates a probability when new evidence arrives: P(A | B) = [P(B | A) × P(A)] / P(B). In words: the probability of A given B equals the probability of B given A, times the prior probability of A, divided by the overall probability of B. P(B) is often expanded using the total probability rule: P(B) = P(B | A) × P(A) + P(B | Aᶜ) × P(Aᶜ). Classic use: a medical test has 99% sensitivity and 95% specificity for a disease with 1% prevalence. Even with a positive test, you can use Bayes' theorem to find the true probability a patient has the disease.
Permutations count arrangements where order matters: P(n, r) = n! / (n − r)!. Combinations count selections where order does not matter: C(n, r) = n! / [r! × (n − r)!]. Memory trick: 'Permutations Pick Positions' — the order of seats matters. 'Combinations are Clumps' — the order in a group doesn't matter. Example: choosing 3 from 5 people for a committee (order doesn't matter) = C(5,3) = 10. Arranging 3 of 5 people in first/second/third place (order matters) = P(5,3) = 60.
The binomial distribution applies when you have n independent trials, each with success probability p, and you want exactly k successes. The formula is P(X = k) = C(n, k) × pᵏ × (1 − p)ⁿ⁻ᵏ. The mean is μ = np and the standard deviation is σ = √(np(1 − p)). Conditions for a binomial experiment: fixed number of trials n, each trial is independent, each trial has only two outcomes (success or failure), and p is constant across trials. Example: probability of exactly 3 heads in 5 coin flips = C(5,3) × (0.5)³ × (0.5)² = 10 × 0.125 × 0.25 = 0.3125.
A z-score tells how many standard deviations a value x is from the mean: z = (x − μ) / σ. A z-score of 0 means x equals the mean; z = 1 means x is one standard deviation above the mean; z = −2 means two standard deviations below. Once you have the z-score, use a z-table (or the 68-95-99.7 rule) to find probabilities. The 68-95-99.7 rule: about 68% of data falls within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ. Example: SAT scores have μ = 1000, σ = 200. For x = 1300, z = (1300 − 1000)/200 = 1.5, meaning the score is 1.5 standard deviations above average.
Expected value E(X) is the long-run average outcome of a random variable. For a discrete distribution: E(X) = Σ [xᵢ × P(xᵢ)], the sum of each value multiplied by its probability. If E(X) > 0 in a gambling context, the game favors you; if E(X) < 0, it favors the house. Example: a lottery ticket costs $2 and pays $10 with probability 0.1 and $0 with probability 0.9. E(payout) = 10(0.1) + 0(0.9) = $1.00. Net expected value = $1.00 − $2.00 = −$1.00 per ticket. On average you lose $1 per ticket.