Cambridge IGCSE Mathematics 0580 – Statistics (C9)
Learning Objective
Students will be able to collect, classify and represent data using appropriate tables and diagrams, to interpret the information displayed, and to calculate and compare measures of central tendency and spread. They will also describe the strength and direction of relationships between two quantitative variables and, where required, calculate a line of best fit and the correlation coefficient.
Syllabus Links (0580)
- C9.1 – Classification & tabulation (one‑way, two‑way, two‑way frequency tables).
- C9.2 – Interpretation of statistical diagrams, including recognising limitations.
- C9.3 – Averages and range (mean, median, mode, range, inter‑quartile range).
- C9.4 – Bar charts, dual/stacked bar charts, pie charts, pictograms, histograms, stem‑and‑leaf diagrams, box‑and‑whisker plots (extended).
- C9.5 – Scatter diagrams, correlation (positive, negative, zero), line of best fit (draw‑by‑eye and calculated).
- C9.6 – Standard deviation (extended).
- C9.7 – Comparative interpretation of two data sets (extended).
1. Collecting & Classifying Data
- Surveys – questionnaires, interviews.
- Experiments – controlled measurements.
- Observations – natural or systematic recording.
- Data are qualitative (categorical) or quantitative (numerical). Choose the representation that matches the data type.
2. Organising Data in Tables
2.1 One‑way tables
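Used for a single variable: list each category or value with its frequency under clear column headers. Before tallying, raw data can be recorded one row per item, as in the table below.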
| Student | Test Score | Attendance (%) |
| A | 78 | 95 |
| B | 85 | 88 |
| C | 62 | 92 |
2.2 Two‑way tables (cross‑tabulation)
Show the relationship between two categorical variables. Percentages (row, column or overall) are often required for AO2. The example gives row percentages, so each row sums to 100 %.
| Gender | Favourite sport: Football | Favourite sport: Basketball |
| Male | 12 (60 %) | 8 (40 %) |
| Female | 5 (33 %) | 10 (67 %) |
2.3 Two‑way frequency tables (with percentages)
Useful when the question asks for “percentage of the total”.
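| Gender | Favourite sport: Football | Favourite sport: Basketball | Total |
| Male | 12 (34.3 %) | 8 (22.9 %) | 20 (57.1 %) |
| Female | 5 (14.3 %) | 10 (28.6 %) | 15 (42.9 %) |
| Total | 17 (48.6 %) | 18 (51.4 %) | 35 (100 %) |
Each percentage is the count divided by the grand total of 35, given to 1 decimal place.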
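The difference between row percentages and percentages of the total can be checked with a short Python sketch. It uses the counts from the favourite-sport table (grand total 35); the function names `overall_percentages` and `row_percentages` are illustrative choices, not part of the syllabus.

```python
# Counts from the favourite-sport table (grand total = 35).
counts = {
    "Male": {"Football": 12, "Basketball": 8},
    "Female": {"Football": 5, "Basketball": 10},
}


def overall_percentages(table):
    """Return each cell as a percentage of the grand total, to 1 d.p."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return {
        name: {col: round(100 * n / grand_total, 1) for col, n in row.items()}
        for name, row in table.items()
    }


def row_percentages(table):
    """Return each cell as a percentage of its own row total, to 1 d.p."""
    return {
        name: {col: round(100 * n / sum(row.values()), 1) for col, n in row.items()}
        for name, row in table.items()
    }


overall = overall_percentages(counts)
by_row = row_percentages(counts)
print(overall["Male"])   # {'Football': 34.3, 'Basketball': 22.9}
print(by_row["Female"])  # {'Football': 33.3, 'Basketball': 66.7}
```

The same count gives a different percentage depending on the denominator, so always state which total is being used.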
3. Simple Frequency Distributions
A frequency distribution lists each distinct value (or class interval) together with its frequency.
| Number of Books Read | Frequency |
| 0–2 | 4 |
| 3–5 | 9 |
| 6–8 | 5 |
| 9–11 | 2 |
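For grouped data like this, the mean can only be estimated, usually by treating every value in a class as the class midpoint. The sketch below applies that idea to the table above; the helper names `estimated_mean` and `modal_class` are assumptions made for illustration.

```python
# Class intervals and frequencies from the "Number of Books Read" table.
classes = [(0, 2), (3, 5), (6, 8), (9, 11)]
frequencies = [4, 9, 5, 2]


def estimated_mean(classes, frequencies):
    """Estimate the mean of grouped data using class midpoints."""
    midpoints = [(low + high) / 2 for low, high in classes]
    weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
    return weighted_sum / sum(frequencies)


def modal_class(classes, frequencies):
    """Return the class interval with the highest frequency."""
    return classes[frequencies.index(max(frequencies))]


print(estimated_mean(classes, frequencies))  # 4.75
print(modal_class(classes, frequencies))     # (3, 5)
```

Here the midpoints are 1, 4, 7 and 10, so the estimate is (4 + 36 + 35 + 20) ÷ 20 = 4.75 books.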
3.1 Stem‑and‑Leaf Diagrams (Core)
Compact way of showing the shape of a distribution while retaining the original data values.
Data: 42, 45, 46, 48, 51, 53, 54, 57, 59, 62, 64, 68
4 | 2 5 6 8
5 | 1 3 4 7 9
6 | 2 4 8
Key: 4 | 2 represents 42 (stem = tens, leaf = units). The diagram can be used to read off the median and the mode, and to construct a box‑and‑whisker plot.
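The grouping into stems and leaves can be sketched in Python as a check on a hand-drawn diagram. This is a minimal version that assumes whole-number data with the tens digit as the stem; `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict

data = [42, 45, 46, 48, 51, 53, 54, 57, 59, 62, 64, 68]


def stem_and_leaf(values):
    """Group values by tens (stem), listing units digits (leaves) in order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


diagram = stem_and_leaf(data)
for stem, leaves in sorted(diagram.items()):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting the data first guarantees that the leaves in each row appear in ascending order, which is what an ordered stem-and-leaf diagram requires.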
4. Graphical Representations
4.1 Bar Charts (Discrete / Categorical)
- Bars are separated; height (or length) is proportional to frequency.
- Axes must be clearly labelled; the y‑axis should start at zero.
4.2 Dual / Stacked Bar Charts (Core)
Show two related sets of data side‑by‑side (dual) or on top of each other (stacked) to facilitate comparison.
4.3 Pie Charts (Parts of a Whole)
Sector angle: \(\displaystyle \theta = \frac{\text{frequency}}{\text{total}}\times360^\circ\).
| Favourite Colour | Number of Students |
| Blue | 12 |
| Red | 8 |
| Green | 5 |
| Other | 5 |
Total = 30. Angles: Blue 144°, Red 96°, Green 60°, Other 60°.
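The sector-angle formula can be applied to the whole table at once. The following sketch reuses the colour data above; `sector_angles` is a name chosen here for illustration.

```python
# Frequencies from the favourite-colour table.
colour_counts = {"Blue": 12, "Red": 8, "Green": 5, "Other": 5}


def sector_angles(frequencies):
    """Return the pie-chart angle in degrees for each category."""
    total = sum(frequencies.values())
    return {label: f * 360 / total for label, f in frequencies.items()}


angles = sector_angles(colour_counts)
print(angles)  # {'Blue': 144.0, 'Red': 96.0, 'Green': 60.0, 'Other': 60.0}
```

A quick check on any pie chart is that the angles add up to 360°.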
4.4 Pictograms
Each picture (icon) represents a fixed number of units.
| Fruit | Number | Pictogram (1 icon = 2 fruit) |
| Apples | 8 | 🍎🍎🍎🍎 |
| Bananas | 6 | 🍌🍌🍌 |
| Oranges | 10 | 🍊🍊🍊🍊🍊 |
4.5 Histograms (Continuous Data)
- Bars touch because the class intervals are adjacent.
- If class widths differ, use frequency density = frequency ÷ width.
| Height, h (cm) | Frequency |
| 150 ≤ h < 155 | 1 |
| 155 ≤ h < 160 | 3 |
| 160 ≤ h < 165 | 5 |
| 165 ≤ h < 170 | 4 |
| 170 ≤ h < 175 | 3 |
| 175 ≤ h < 180 | 2 |
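Frequency density can be computed for each class as in the sketch below, which uses the height table above. The merged 170 to 180 class at the end is an illustration only (it is not in the notes) to show what happens when one class is wider than the others.

```python
# (lower bound, upper bound, frequency) for each class in the height table.
height_classes = [
    (150, 155, 1), (155, 160, 3), (160, 165, 5),
    (165, 170, 4), (170, 175, 3), (175, 180, 2),
]


def frequency_densities(classes):
    """Return frequency divided by class width for each class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


densities = frequency_densities(height_classes)
print(densities)  # [0.2, 0.6, 1.0, 0.8, 0.6, 0.4]

# Illustration only: merge the last two classes into one class of width 10.
merged = height_classes[:4] + [(170, 180, 3 + 2)]
merged_densities = frequency_densities(merged)
print(merged_densities[-1])  # 0.5
```

The merged class holds 5 students but has a density of only 0.5, so its bar is lower than the 165 to 170 bar (density 0.8) even though its frequency is higher. This is why bar height must show frequency density, not frequency, when widths differ.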
4.6 Box‑and‑Whisker Plots (Extended)
Summarise the five‑number summary: minimum, \(Q_1\), median, \(Q_3\), maximum. Useful for comparing two data sets.
5. Scatter Diagrams & Line of Best Fit
5.1 Scatter Diagrams
Plot two quantitative variables (x‑axis, y‑axis). Look for the overall direction of the cloud of points.
- Positive correlation – points rise from left to right.
- Negative correlation – points fall from left to right.
- Zero correlation – points are scattered with no upward or downward trend.
5.2 Drawing a Line of Best Fit (draw‑by‑eye)
- Identify the general direction (upward, downward, flat).
- Place the line so that roughly equal numbers of points lie above and below it.
- Use the line for simple predictions.
5.3 Calculating a Line of Best Fit (Extended)
For a quick calculation use two points that appear to lie on the line.
Gradient: \(m = \dfrac{y_2-y_1}{x_2-x_1}\). Intercept: \(c = y_1 - m x_1\).
Equation of the line: \(\displaystyle y = mx + c\).
Example: points (2, 5) and (6, 13).
\(m = \dfrac{13-5}{6-2} = \dfrac{8}{4} = 2\)
\(c = 5 - 2 \times 2 = 1\)
Equation: \(y = 2x + 1\)
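The two-point method can be written as a small Python function to check hand calculations. This is a sketch; `line_through` and `predict` are illustrative names, and the x-value used for the prediction is chosen only as an example.

```python
def line_through(p1, p2):
    """Return gradient m and intercept c of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("The two points must have different x-values.")
    m = (y2 - y1) / (x2 - x1)
    c = y1 - m * x1
    return m, c


def predict(m, c, x):
    """Use y = mx + c to estimate y for a given x."""
    return m * x + c


m, c = line_through((2, 5), (6, 13))
print(m, c)              # 2.0 1.0
print(predict(m, c, 4))  # 9.0
```

Predictions are most reliable for x-values inside the range of the data; extending the line beyond that range is extrapolation and should be treated with caution.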
5.4 Correlation Coefficient (r) – Extended
Provides a numerical measure of the strength and direction of a linear relationship.
Formula (sample data):
\[
r = \frac{n\sum xy - (\sum x)(\sum y)}
{\sqrt{\bigl[n\sum x^{2}-(\sum x)^{2}\bigr]
\bigl[n\sum y^{2}-(\sum y)^{2}\bigr]}}
\]
Interpretation:
- \(|r| \approx 1\) – strong linear relationship.
- \(|r| \approx 0\) – weak or no linear relationship.
- Positive \(r\) → upward trend; negative \(r\) → downward trend.
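The formula can be evaluated step by step in Python. The sketch below uses the points from the scatter-diagram sample problem in section 10; `pearson_r` is an illustrative name, and the reversed y-values are used only to show the negative case.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r computed from the summation formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("Need two lists of equal length with at least 2 values.")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant.")
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r = pearson_r(xs, ys)
r_reversed = pearson_r(xs, ys[::-1])
print(r)           # 1.0
print(r_reversed)  # -1.0
```

These points lie exactly on a straight line, so \(r\) takes its extreme values; real data almost always give a value strictly between −1 and 1.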
6. Measures of Central Tendency
- Mean (average) – Core & Extended
\[
\bar{x}= \frac{1}{n}\sum_{i=1}^{n}x_i
\]
- Median – Core & Extended
- Order the data.
- If \(n\) is odd, median = middle value.
- If \(n\) is even, median = \(\dfrac{\text{sum of the middle two values}}{2}\).
- Mode – Core & Extended
Value(s) occurring most frequently. A set may have no mode or more than one.
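The three averages can be sketched in Python as follows. The first data set is the one used in the worked example of section 7.2; the short list passed to `modes` at the end is hypothetical. Treating a set in which no value repeats as having no mode is the convention assumed here.

```python
from collections import Counter


def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value, or the mean of the middle two when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Every value with the highest frequency; [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


data = [4, 7, 9, 12, 13, 15, 18, 21, 24]
print(round(mean(data), 2))    # 13.67
print(median(data))            # 13
print(median([4, 7, 9, 12]))   # 8.0
print(modes(data))             # []
print(modes([2, 3, 3, 5, 5]))  # [3, 5]  (hypothetical data with two modes)
```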
7. Measures of Spread
7.1 Range
\(\displaystyle \text{Range}= \max(x_i)-\min(x_i)\)
7.2 Inter‑Quartile Range (IQR) – Core
Steps:
- Order the data.
- Find the median – this splits the data into a lower and an upper half. If \(n\) is odd, leave the median out of both halves.
- Median of the lower half = \(Q_1\).
- Median of the upper half = \(Q_3\).
- IQR = \(Q_3 - Q_1\).
Worked Example (data: 4, 7, 9, 12, 13, 15, 18, 21, 24)
- Median = 13.
- Lower half = 4, 7, 9, 12 → \(Q_1 = \dfrac{7+9}{2}=8\).
- Upper half = 15, 18, 21, 24 → \(Q_3 = \dfrac{18+21}{2}=19.5\).
- IQR = \(19.5-8 = 11.5\).
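The steps above can be sketched in Python, which also yields the five-number summary needed for a box-and-whisker plot. This follows the median-of-halves method described in these notes; calculators and software sometimes use other quartile conventions and may give slightly different values. The names `quartiles` and `five_number_summary` are illustrative.

```python
def median(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, the median itself is left out of both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quartiles(values)
    return min(values), q1, q2, q3, max(values)


data = [4, 7, 9, 12, 13, 15, 18, 21, 24]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)                 # 8.0 13 19.5
print(q3 - q1)                    # 11.5
print(five_number_summary(data))  # (4, 8.0, 13, 19.5, 24)
```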
7.3 Standard Deviation (Extended)
Sample standard deviation:
\[
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^{2}}
\]
It measures how far, typically, the data values lie from the mean: it is the square root of the average squared deviation, with \(n-1\) as the divisor. A larger \(s\) indicates greater spread.
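The formula can be followed term by term in Python. The sketch below reuses the data from the worked example in 7.2 and cross-checks the result against `statistics.stdev` from the Python standard library, which also divides by \(n-1\); `sample_sd` is an illustrative name.

```python
import math
import statistics

data = [4, 7, 9, 12, 13, 15, 18, 21, 24]


def sample_sd(values):
    """Sample standard deviation with divisor n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("At least two values are needed.")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


s = sample_sd(data)
print(round(s, 3))                       # 6.557
print(round(statistics.stdev(data), 3))  # 6.557
```

Here the squared deviations sum to 344, so \(s = \sqrt{344/8} = \sqrt{43} \approx 6.56\). Some courses and calculator modes divide by \(n\) instead (available in Python as `statistics.pstdev`), which gives a slightly smaller value, so check which version a question expects.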
8. Correlation (Strength & Direction)
- Positive – both variables increase together.
- Negative – one variable increases while the other decreases.
- Zero – no discernible linear pattern.
The slope of the line of best fit shows direction; the closeness of points to the line (or the absolute value of \(r\)) indicates strength.
9. Interpretation, Comparison & Limitations (C9.2)
When analysing any diagram, ask the following checklist:
- Is the sample size stated? Any possible bias?
- Are axes labelled with units and do they start at zero where required?
- For histograms, are class widths equal? If not, has frequency density been used?
- For pie charts, do the percentages total 100 % (and the angles 360°)? Are sectors proportional to the frequencies?
- For bar/dual‑bar charts, are the bars equally spaced?
- For scatter diagrams, is a line of best fit drawn? Is the correlation coefficient (if given) consistent with the pattern of the points?
- When comparing two data sets, consider:
- Differences in means, medians and modes.
- Spread – range, IQR, or standard deviation.
- Shape – symmetry, skewness, presence of outliers (visible in box‑plots).
10. Sample Problems
- Frequency table & bar chart – Data: 3, 5, 3, 2, 5, 4, 3, 5, 2, 4.
- Construct a one‑way frequency table.
- Draw a bar chart (label axes, start y‑axis at 0).
- Mean from a histogram – Use the histogram in 4.5.
- Assume every height in a class equals the midpoint of its class interval.
- Calculate an estimate of the mean height.
- Stem‑and‑leaf diagram – Data: 42, 45, 46, 48, 51, 53, 54, 57, 59, 62, 64, 68.
- Draw the diagram.
- State the median and mode from the diagram.
- Scatter diagram & correlation – Points: (1, 2), (2, 4), (3, 6), (4, 8), (5, 10).
- Plot the points.
- Identify the type of correlation.
- Draw a line of best fit (draw‑by‑eye) and calculate its equation using points (1, 2) and (5, 10).
- Estimate the y‑value when \(x=7\).
- Mode and range – Data: 12, 15, 12, 18, 20, 15, 12.
- IQR (Core) – Data: 4, 7, 9, 12, 13, 15, 18, 21, 24.
- Find \(Q_1\), \(Q_3\) and the IQR.
- Standard deviation (Extended) – Data: 5, 7, 9, 11, 13.
- Calculate the mean.
- Find the sample standard deviation.
- Box‑and‑whisker comparison (Extended) – Two sets of test scores are given.
- Construct box‑and‑whisker plots for each set.
- Comment on differences in median, IQR and any outliers.
11. Assessment Criteria (summarised)
- Correct construction of one‑way, two‑way and two‑way frequency tables (including percentages).
- Accurate drawing of bar charts, dual/stacked bar charts, pie charts, pictograms, histograms, stem‑and‑leaf diagrams, and box‑and‑whisker plots with appropriate scales and labels.
- Correct calculation of mean, median, mode, range, IQR (Core) and standard deviation (Extended).
- Clear, justified interpretation of diagrams, explicitly noting any limitations.
- Identification of correlation type; drawing a line of best fit (draw‑by‑eye) and, where required, calculating its equation and the correlation coefficient \(r\).
- Logical comparison of two data sets using both graphical (e.g., side‑by‑side bar charts, box‑plots) and numerical information.
12. Suggested Classroom Activities
- Data‑collection project – Survey classmates on favourite sport, gender and age; produce a two‑way table with percentages, a dual‑bar chart, and a pie chart.
- Live histogram construction – Measure heights of students, group into class intervals, and build a histogram on the board. Discuss why bars touch.
- Stem‑and‑leaf relay – Teams receive a shuffled list of numbers; each team must arrange them into a correct stem‑and‑leaf diagram as quickly as possible.
- Scatter‑plot prediction game – Provide a set of (x, y) points, ask students to draw a line of best fit, calculate its gradient, and make predictions for unseen x‑values.
- Standard deviation challenge (Extended) – Using a calculator, students compute \(s\) for a data set and then discuss what the value tells them about the spread.
- Interpretation debate – Present two bar charts that appear to show a difference; students must argue which differences are genuine and which could be artefacts of scale or sample size.