Data types, measures of central tendency and spread, quartiles, variance and standard deviation, stem-and-leaf diagrams, box-and-whisker plots, probability rules, conditional probability, tree diagrams, and Venn diagrams.
Section 4 contributes 5 items to Paper 01 and one 20-mark question to Paper 02. The Paper 02 question typically combines statistical calculation with interpretation and a probability problem using diagrams.
Qualitative (categorical) data describes attributes without numerical value: eye colour, nationality, type of vehicle.
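Quantitative data is numerical and divides into two types: discrete data, which takes separate countable values (number of siblings, goals scored), and continuous data, which is measured on a scale and can take any value in an interval (height, mass, time).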
| Measure | Definition | Best used when |
|---|---|---|
| Mean | Sum of all values divided by the count | Data is symmetric with no extreme outliers |
| Median | Middle value when data is ordered | Data is skewed or has outliers |
| Mode | Most frequent value | Identifying the most common category |
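
To see why the table recommends the median for data with outliers, the sketch below computes all three measures with Python's standard `statistics` module. The `scores` list is a hypothetical dataset chosen only for illustration; it is not taken from the syllabus.

```python
from statistics import mean, median, multimode

# Hypothetical scores with one extreme value (39), for illustration only.
scores = [4, 7, 7, 8, 9, 10, 39]

score_mean = mean(scores)        # sum of values divided by the count
score_median = median(scores)    # middle value of the ordered data
score_modes = multimode(scores)  # list of the most frequent value(s)

print("mean:", score_mean)
print("median:", score_median)
print("mode(s):", score_modes)
```

The single large value pulls the mean up to 12, while the median stays at 8, closer to where most of the scores sit.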
The simplest measure of spread is the range: $\text{range} = \text{maximum} - \text{minimum}$. It is easy to compute but sensitive to a single extreme value.
Ordered data is divided into four equal parts by the quartiles $Q_1$, $Q_2$ (median), and $Q_3$.
The interquartile range is $\text{IQR} = Q_3 - Q_1$. It measures the spread of the middle 50% of the data. Unlike the range, it is not distorted by a single extreme value.
Data (already ordered): .
Median (): 5th value .
Lower half: .
Upper half: .
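
The same procedure (find the median, then take the median of each half) can be sketched in Python. The nine values below are hypothetical and chosen only for illustration, and the helper name `quartiles` is an assumption. The sketch follows the convention of leaving the median out of both halves when the count is odd.

```python
from statistics import median

def quartiles(ordered):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumes `ordered` is already sorted. When the count is odd, the
    median itself is excluded from both halves.
    """
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

# Hypothetical ordered dataset of nine values (illustration only).
data = [3, 5, 6, 8, 10, 12, 15, 17, 20]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
data_range = data[-1] - data[0]

print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("IQR:", iqr, "range:", data_range)
```

With nine values the median is the 5th value (10), the lower half is the first four values and the upper half is the last four, giving $Q_1 = 5.5$, $Q_3 = 16$ and an IQR of 10.5.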
The $p$th percentile is the value below which $p\%$ of the data lies. Quartiles are specific percentiles: $Q_1 = P_{25}$, $Q_2 = P_{50}$, $Q_3 = P_{75}$.
Variance measures average squared distance from the mean. Standard deviation is its square root, restoring the original units.
A small standard deviation means values cluster closely around the mean (consistent). A large standard deviation means values are widely spread (variable).
Two datasets can have the same mean but very different standard deviations. When comparing distributions, state both the measure of centre and the measure of spread, for example: "Group A has a higher mean but Group B is more consistent because its standard deviation is smaller."
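
As a minimal sketch of that comparison, the code below uses two hypothetical groups with the same mean. It assumes the population formulas (dividing by $n$), which is what `statistics.pvariance` and `statistics.pstdev` compute; if a question calls for the sample versions, `variance` and `stdev` divide by $n - 1$ instead.

```python
from statistics import mean, pvariance, pstdev

# Hypothetical groups with the same mean but different spread.
group_a = [48, 49, 50, 51, 52]
group_b = [30, 40, 50, 60, 70]

for name, group in (("A", group_a), ("B", group_b)):
    # pvariance: average squared distance from the mean (divide by n)
    # pstdev: square root of the variance, back in the original units
    print(name, "mean:", mean(group),
          "variance:", pvariance(group),
          "sd:", round(pstdev(group), 2))
```

Both groups have mean 50, but the variance is 2 for Group A and 200 for Group B, so Group A is far more consistent.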
A stem-and-leaf diagram keeps the original data values while showing the distribution shape.
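Constructing one: split each value into a stem (here the tens digit) and a leaf (the units digit), list the stems in a column, write the leaves in ascending order beside their stem, and include a key.
Data: 17, 19, 22, 31, 33, 38, 44, 47, 49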
1 | 7 9
2 | 2
3 | 1 3 8
4 | 4 7 9
Key: 1 | 7 means 17
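
The construction can be sketched in a few lines of Python. The values are the ones read off the diagram above using its key; the helper name `stem_and_leaf` is an assumption made for this example, and it handles two-digit values only.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in order."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 17 -> stem 1, leaf 7
        rows[stem].append(leaf)
    return dict(rows)

# Values read from the diagram above.
data = [17, 19, 22, 31, 33, 38, 44, 47, 49]

rows = stem_and_leaf(data)
for stem in sorted(rows):
    print(f"{stem} | {' '.join(str(leaf) for leaf in rows[stem])}")
print("Key: 1 | 7 means 17")
```

Sorting the values first guarantees that the leaves on each row appear in ascending order, which is what makes the median and quartiles easy to read off the diagram.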
A back-to-back stem-and-leaf diagram places two datasets side by side sharing a common stem, allowing direct visual comparison.
| Advantages of stem-and-leaf | Disadvantages |
|---|---|
| Original values are preserved | Impractical for large datasets |
| Shape of distribution is visible | Difficult to compare non-overlapping ranges |
A box plot summarises a dataset using five values: minimum, $Q_1$, median ($Q_2$), $Q_3$, maximum.
Min Q₁ Q₂ Q₃ Max
|------[======|======]------|
The box spans from $Q_1$ to $Q_3$ and contains the middle 50% of the data. The whiskers extend to the minimum and maximum.
| Pattern | Skew |
|---|---|
| $Q_2 - Q_1 \approx Q_3 - Q_2$ and whiskers roughly equal | Symmetric |
| $Q_3 - Q_2 > Q_2 - Q_1$ or right whisker longer | Positive skew (tail to right) |
| $Q_2 - Q_1 > Q_3 - Q_2$ or left whisker longer | Negative skew (tail to left) |
For a positively skewed distribution: Mode $<$ Median $<$ Mean. For a negatively skewed distribution: Mean $<$ Median $<$ Mode.
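
The following sketch computes a five-number summary and then applies the box-width comparison to classify the skew. The dataset and both function names are hypothetical, and the quartiles use the median-of-halves convention (median excluded from both halves when the count is odd), which is an assumption about the method expected.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def box_skew(q1, q2, q3):
    """Classify skew by comparing the two parts of the box."""
    left, right = q2 - q1, q3 - q2
    if right > left:
        return "positive"
    if left > right:
        return "negative"
    return "symmetric"

# Hypothetical data with a long tail to the right.
values = [2, 3, 4, 5, 6, 9, 14, 20, 31]

minimum, q1, q2, q3, maximum = five_number_summary(values)
skew = box_skew(q1, q2, q3)
print(minimum, q1, q2, q3, maximum, skew)
```

Here $Q_2 - Q_1 = 2.5$ and $Q_3 - Q_2 = 11$, so the right part of the box is wider and the data is positively skewed.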
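An experiment produces outcomes. The set of all possible outcomes is the sample space $S$. An event $A$ is a subset of the sample space, and its probability is

$$P(A) = \frac{n(A)}{n(S)} = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } S}$$

This is classical probability, valid when all outcomes are equally likely.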
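Relative frequency gives an experimental estimate of probability when theoretical equally-likely outcomes cannot be assumed:

$$P(A) \approx \frac{\text{number of trials in which } A \text{ occurs}}{\text{total number of trials}}$$

As the number of trials increases, the relative frequency tends to settle closer to the true probability.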
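
A small simulation illustrates this behaviour. The sketch below rolls a simulated fair die and reports the relative frequency of a six for increasing numbers of trials. The function name and the fixed seed are assumptions made so the run is repeatable; the exact figures depend on the random number generator.

```python
import random

def relative_frequency(trials, seed=0):
    """Estimate P(six) as (number of sixes) / (number of rolls)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials

for n in (10, 100, 1000, 100000):
    print(n, relative_frequency(n))
```

The estimates for small numbers of trials can be well away from $\frac{1}{6} \approx 0.167$, while the estimate from 100000 rolls should be very close to it.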
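The addition rule for any two events is $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. For mutually exclusive events ($A$ and $B$ cannot both occur): $P(A \cap B) = 0$, so:

$$P(A \cup B) = P(A) + P(B)$$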
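A card is drawn from a standard deck. $P(A) = \frac{4}{52}$ and $P(B) = \frac{4}{52}$, where $A$ = King and $B$ = Ace. Since these are mutually exclusive:

$$P(A \cup B) = \frac{4}{52} + \frac{4}{52} = \frac{8}{52} = \frac{2}{13}$$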
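The conditional probability of $A$ given that $B$ has occurred:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$

Rearranging: $P(A \cap B) = P(A \mid B) \times P(B)$.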
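Events $A$ and $B$ are independent if knowing that $B$ occurred gives no information about $A$:

$$P(A \mid B) = P(A), \quad \text{equivalently} \quad P(A \cap B) = P(A) \times P(B)$$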
Do not confuse "mutually exclusive" with "independent." Mutually exclusive events ($P(A \cap B) = 0$) cannot both occur. Independent events can both occur, but neither influences the other's probability. Two events with non-zero probabilities cannot be both mutually exclusive and independent.
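
The distinction can be checked by counting cards in a standard 52-card deck. In the sketch below, the choice of events (King and Heart as an independent pair, King and Ace as a mutually exclusive pair) and the helper `prob` are assumptions made for illustration.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = {(rank, suit) for rank in ranks for suit in suits}

def prob(event):
    """Classical probability: outcomes in the event / outcomes in the deck."""
    return Fraction(len(event), len(deck))

kings = {card for card in deck if card[0] == "K"}
aces = {card for card in deck if card[0] == "A"}
hearts = {card for card in deck if card[1] == "hearts"}

# King and Heart: independent, and they can occur together.
p_king_given_heart = prob(kings & hearts) / prob(hearts)
kings_hearts_independent = prob(kings & hearts) == prob(kings) * prob(hearts)

# King and Ace: mutually exclusive, and therefore not independent.
kings_aces_exclusive = prob(kings & aces) == 0
kings_aces_independent = prob(kings & aces) == prob(kings) * prob(aces)

print(p_king_given_heart, prob(kings))
print(kings_hearts_independent, kings_aces_exclusive, kings_aces_independent)
```

Knowing the card is a Heart leaves the probability of a King at $\frac{1}{13}$, so those events are independent. Knowing the card is an Ace drops the probability of a King to 0, so King and Ace are mutually exclusive but not independent.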
A possibility space (sample space) diagram lists all outcomes in a grid. It is useful for two-stage experiments such as rolling two dice.
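Two fair dice are rolled. The sample space has $6 \times 6 = 36$ equally likely outcomes.
$P(\text{sum} = 7) = \frac{6}{36} = \frac{1}{6}$ (the six pairs that sum to 7: $(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)$).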
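
The grid for two dice can be enumerated directly. This sketch lists every ordered pair with `itertools.product` and counts the pairs that sum to 7; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Every ordered pair (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# The event "sum is 7" as a subset of the sample space.
sevens = [pair for pair in outcomes if sum(pair) == 7]

p_seven = Fraction(len(sevens), len(outcomes))
print(len(outcomes), sevens, p_seven)
```

Changing the condition inside the list comprehension gives the probability of any other event on the same grid, such as a sum greater than 9 or a double.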
Tree diagrams show sequential outcomes. Branches show each possible outcome at each stage; multiply along branches for intersection probabilities.
The syllabus restricts tree diagrams to two initial branches.
A bag contains 3 red and 5 blue balls. A ball is drawn, its colour noted, then a second ball is drawn without replacement.
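$P(\text{both red}) = \frac{3}{8} \times \frac{2}{7} = \frac{6}{56} = \frac{3}{28}$
$P(\text{one of each}) = \frac{3}{8} \times \frac{5}{7} + \frac{5}{8} \times \frac{3}{7} = \frac{30}{56} = \frac{15}{28}$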
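
The four paths through this tree can be checked with exact fractions. In the sketch below, each path probability is the product along its branches, and the second-draw fractions use one ball fewer because the first ball is not replaced. The variable names are illustrative.

```python
from fractions import Fraction

red, blue = 3, 5
total = red + blue

# Multiply along each path; the second draw is from total - 1 balls.
p_rr = Fraction(red, total) * Fraction(red - 1, total - 1)
p_rb = Fraction(red, total) * Fraction(blue, total - 1)
p_br = Fraction(blue, total) * Fraction(red, total - 1)
p_bb = Fraction(blue, total) * Fraction(blue - 1, total - 1)

# Add the paths that make up the event "one of each colour".
p_one_of_each = p_rb + p_br

print(p_rr, p_one_of_each, p_bb)
print(p_rr + p_rb + p_br + p_bb)  # all four paths together
```

The four path probabilities add to 1, which is a quick check that no branch has been mislabelled.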
Venn diagrams with two sets partition the sample space into four regions: $A$ only, $B$ only, $A \cap B$, and neither. The sum of all regions must equal $1$ (or the total count $n(S)$ for counts).
The syllabus restricts Venn diagrams to two sets.
In a class of 30 students, 18 study Physics, 15 study Chemistry, and 8 study both.
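$P(\text{Physics only}) = \frac{18 - 8}{30} = \frac{10}{30} = \frac{1}{3}$
$P(\text{at least one}) = \frac{18 + 15 - 8}{30} = \frac{25}{30} = \frac{5}{6}$
$P(\text{neither}) = \frac{30 - 25}{30} = \frac{5}{30} = \frac{1}{6}$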
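
The four regions of this Venn diagram can be filled in from the counts given, starting with the intersection and working outwards. The sketch below does that and checks that the regions add up to the class size; the variable names are illustrative.

```python
from fractions import Fraction

total, physics, chemistry, both = 30, 18, 15, 8

# Fill the intersection first, then subtract it from each set.
physics_only = physics - both
chemistry_only = chemistry - both
at_least_one = physics_only + chemistry_only + both
neither = total - at_least_one

p_physics_only = Fraction(physics_only, total)
p_at_least_one = Fraction(at_least_one, total)
p_neither = Fraction(neither, total)

print(physics_only, chemistry_only, both, neither)
print(p_physics_only, p_at_least_one, p_neither)
```

The regions are 10, 7, 8 and 5, which total 30, confirming that every student has been counted exactly once.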