Measures of central tendency: mean, median, mode

Statistics – Measures of Central Tendency (Cambridge IGCSE Mathematics 0580)

This note follows the 2025‑2027 Cambridge IGCSE Mathematics (0580) syllabus. It covers the core statistics content (C9.1–C9.5) and the optional extended content (C9.6–C9.7) that may appear in the Core and Extended papers.


Contents

  1. C9.1 – Classifying data & basic terminology
  2. C9.2 – Measures of central tendency (mean, median, mode)
  3. C9.2 (2‑3) – Bounds of accuracy, relative & expected frequencies
  4. C9.3 – Grouped data (class intervals)
  5. C9.4 – Charts & diagrams
  6. C9.5 – Scatter diagrams, correlation & line of best fit
  7. C9.6‑C9.7 – Extended content (quartiles, IQR, box plot)
  8. Quick revision checklist
  9. Common pitfalls & how to avoid them

1. Classifying Data & Basic Terminology (C9.1)

  • Qualitative (categorical) data – non‑numerical attributes (e.g. favourite colour, type of transport).
  • Quantitative data – numerical values that can be counted or measured.
    • Discrete – countable whole numbers (e.g. number of books read).
    • Continuous – can take any value within a range (e.g. height, time).
  • Population vs. sample – the whole set of observations vs. a subset used for analysis.
  • Class interval – a range of continuous values grouped together (e.g. 150 – 159 cm). It is introduced here so learners see the link to later sections on grouped data.
  • Frequency table – lists each distinct value (or class interval) and its frequency (f).
    Example: Colours of cars in a car‑park (raw data → frequency table).
Car colour | Frequency (f)
Red | 4
Blue | 7
Black | 5
White | 2
Other | 2

2. Measures of Central Tendency (C9.2)

2.1 Mean (arithmetic average)

The mean is the sum of all observations divided by the number of observations.

Raw data formula

$$\displaystyle \bar{x}= \frac{\sum_{i=1}^{n}x_i}{n}$$

Frequency‑table formula

$$\displaystyle \bar{x}= \frac{\sum (x\cdot f)}{\sum f}$$

Both give the same result; the second is quicker when a frequency table is already available.
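The agreement between the two formulas can be checked with a short Python sketch. The data values and the function names `mean_raw` and `mean_from_table` are illustrative assumptions, not part of the syllabus.

```python
from collections import Counter

def mean_raw(values):
    """Raw-data formula: sum of the observations divided by how many there are."""
    return sum(values) / len(values)

def mean_from_table(table):
    """Frequency-table formula: sum of x*f divided by sum of f.

    `table` maps each distinct value x to its frequency f.
    """
    total_xf = sum(x * f for x, f in table.items())
    total_f = sum(table.values())
    return total_xf / total_f

raw = [2, 3, 3, 5, 7]        # illustrative data, not from the note
table = Counter(raw)         # builds {2: 1, 3: 2, 5: 1, 7: 1}

print(mean_raw(raw))         # 20 / 5
print(mean_from_table(table))
```

Both calls work from the same total (20) and the same count (5), so they return the same mean.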

2.2 Median

The median is the middle value when the data are ordered from smallest to largest.

  • If n is odd, the median is the value at position \((n+1)/2\).
  • If n is even, the median is the average of the values at positions \(n/2\) and \(n/2+1\).

For grouped data the median is found using cumulative frequencies (see Section 4.2).
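The position rules for un‑grouped data can be sketched in Python as follows. The function name `median_by_position` and the sample lists are assumptions made for illustration; note that Python lists are indexed from 0, so position k in the rule corresponds to index k − 1.

```python
def median_by_position(values):
    """Median of un-grouped data using the (n+1)/2 and n/2, n/2+1 position rules."""
    ordered = sorted(values)          # the data must be ordered first
    n = len(ordered)
    if n % 2 == 1:
        # odd n: the single middle value at position (n+1)/2
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the values at positions n/2 and n/2 + 1
    lower_middle = ordered[n // 2 - 1]
    upper_middle = ordered[n // 2]
    return (lower_middle + upper_middle) / 2

print(median_by_position([7, 3, 9, 1, 5]))       # odd n = 5
print(median_by_position([7, 3, 9, 1, 5, 11]))   # even n = 6
```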

2.3 Mode

The mode is the value(s) that occur most frequently.

  • No mode – all frequencies are equal.
  • Unimodal – one value with highest frequency.
  • Bimodal – two values share the highest frequency.
  • Multimodal – three or more values share the highest frequency.
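The four cases above can be told apart by counting frequencies. The following is a minimal Python sketch; the function name `modes` and the convention of returning an empty list for "no mode" are assumptions chosen for illustration.

```python
from collections import Counter

def modes(values):
    """Return the list of modal values, or [] when all frequencies are equal."""
    counts = Counter(values)
    highest = max(counts.values())
    # "No mode": more than one distinct value and every frequency is the same
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]))       # unimodal
print(modes([1, 1, 2, 2, 3]))    # bimodal
print(modes([1, 2, 3]))          # no mode
```

The length of the returned list tells you which case applies: 0 (no mode), 1 (unimodal), 2 (bimodal), 3 or more (multimodal).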

2.4 Worked Example – Un‑grouped data

Number of books read by 12 students:

Student | Books (x)
1 | 0
2 | 1
3 | 1
4 | 2
5 | 2
6 | 2
7 | 3
8 | 3
9 | 3
10 | 3
11 | 4
12 | 4
  1. Mean $$\displaystyle \bar{x}= \frac{0+1+1+2+2+2+3+3+3+3+4+4}{12}= \frac{28}{12}\approx 2.33\text{ books}$$
  2. Median (even n = 12) – average of the 6th and 7th values: $$\displaystyle \text{Median}= \frac{2+3}{2}=2.5\text{ books}$$
  3. Mode – value with highest frequency = 3 books (frequency = 4).
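The three answers can be cross‑checked with Python's standard `statistics` module. This is only a verification sketch; the variable name `books` is an assumption, and `statistics.multimode` needs Python 3.8 or later.

```python
import statistics

# Books read by the 12 students in the worked example
books = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4]

mean_books = statistics.mean(books)
median_books = statistics.median(books)
modal_values = statistics.multimode(books)   # list of all most-frequent values

print(round(mean_books, 2), median_books, modal_values)
```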

2.5 Worked Example – Grouped data (mean only)

Test scores of 40 pupils are grouped as follows:

Score interval | Frequency (f)
0 – 9 | 2
10 – 19 | 5
20 – 29 | 8
30 – 39 | 12
40 – 49 | 9
50 – 59 | 4

Steps:

  1. Find the midpoint of each class (used as x):
    • 0 – 9 → 4.5
    • 10 – 19 → 14.5
    • 20 – 29 → 24.5
    • 30 – 39 → 34.5
    • 40 – 49 → 44.5
    • 50 – 59 → 54.5
  2. Calculate \(\sum (x\cdot f)\): $$4.5(2)+14.5(5)+24.5(8)+34.5(12)+44.5(9)+54.5(4)=1\,310$$
  3. Total frequency \(\sum f = 40\).
  4. Mean: $$\displaystyle \bar{x}= \frac{1\,310}{40}=32.75\text{ marks}$$
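The four steps translate directly into a short Python sketch. The function name `grouped_mean` and the `(lower limit, upper limit, frequency)` tuple layout are assumptions made for illustration.

```python
# (lower limit, upper limit, frequency) for each score interval in the table
score_classes = [
    (0, 9, 2), (10, 19, 5), (20, 29, 8),
    (30, 39, 12), (40, 49, 9), (50, 59, 4),
]

def grouped_mean(classes):
    """Estimate the mean of grouped data using class midpoints as x."""
    total_xf = 0.0
    total_f = 0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2   # step 1: midpoint of the class
        total_xf += midpoint * f         # step 2: accumulate x * f
        total_f += f                     # step 3: accumulate f
    return total_xf / total_f            # step 4: mean

print(grouped_mean(score_classes))
```

Because midpoints stand in for the unknown individual scores, the result is an estimate of the mean rather than its exact value.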

2.6 Check‑your‑understanding

Q1. A data set has 15 values. Which position gives the median?
Answer: Position \((15+1)/2 = 8\).

Q2. For an even‑sized data set of 20 values, the median is the average of which positions?
Answer: Positions 10 and 11.


3. Bounds of Accuracy, Relative & Expected Frequencies (C9.2 (2‑3))

3.1 Bounds of accuracy

  • For a value x recorded to the nearest whole unit, the lower bound is x − 0.5 and the upper bound is x + 0.5.
  • For a class interval whose limits are recorded to the nearest whole unit, lower bound = lower class limit − 0.5 and upper bound = upper class limit + 0.5.
  • When calculating a grouped mean, use the class midpoint (average of the lower and upper bounds) as the representative value.

Numeric example (bounds → grouped mean)

Class (cm) | Lower limit | Upper limit | Lower bound | Upper bound | Mid‑point | f
140‑149 | 140 | 149 | 139.5 | 149.5 | 144.5 | 4
150‑159 | 150 | 159 | 149.5 | 159.5 | 154.5 | 8
160‑169 | 160 | 169 | 159.5 | 169.5 | 164.5 | 12

Mean calculation (as in Section 2.5) uses the mid‑points 144.5, 154.5, 164.5.
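The bound and mid‑point columns can be reproduced with a small helper. This is a sketch only: the function name `class_bounds` and its `precision` parameter (the unit the measurements were rounded to, assumed here to be 1 cm) are illustrative assumptions.

```python
def class_bounds(lower_limit, upper_limit, precision=1):
    """Return (lower bound, upper bound, midpoint) for one class interval.

    `precision` is the rounding unit of the data; half of it is
    subtracted from the lower limit and added to the upper limit.
    """
    half = precision / 2
    lower_bound = lower_limit - half
    upper_bound = upper_limit + half
    midpoint = (lower_bound + upper_bound) / 2
    return lower_bound, upper_bound, midpoint

for limits in [(140, 149), (150, 159), (160, 169)]:
    print(limits, class_bounds(*limits))
```

The mid‑point is the same whether it is taken from the limits or from the bounds, because the two 0.5 adjustments cancel.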

3.2 Relative frequency

Relative frequency = \(\displaystyle \frac{f}{\sum f}\). It may be expressed as a fraction, decimal, or percentage and shows the proportion of the total that each class/value represents.
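As an illustration, the sketch below converts the car‑colour frequencies from Section 1 into relative frequencies in all three forms. Using `fractions.Fraction` to keep exact fractions is a choice made for this example, and the variable names are assumptions.

```python
from fractions import Fraction

# Car-colour frequencies from the table in Section 1
frequencies = {"Red": 4, "Blue": 7, "Black": 5, "White": 2, "Other": 2}
total = sum(frequencies.values())

# Relative frequency = f / (sum of f), kept as an exact fraction
relative = {colour: Fraction(f, total) for colour, f in frequencies.items()}

for colour, r in relative.items():
    print(f"{colour}: {r} = {float(r):.2f} = {float(r) * 100:.0f}%")
```

A useful check is that the relative frequencies always add up to 1 (or 100 %).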

3.3 Expected frequency

Only required when a theoretical distribution is supplied (e.g., “if the data were uniformly distributed”).

Formula: \[ E = N \times p_{\text{theoretical}} \] where \(N\) = total observations, \(p_{\text{theoretical}}\) = expected proportion for the class.

Use the expected frequency to compare with the observed frequency – a useful check for “reasonable” data.
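A minimal sketch of the comparison, assuming (purely for illustration) that the five car colours from Section 1 were equally likely, so that \(p_{\text{theoretical}} = 1/5\) for each colour. The uniform model and the variable names are assumptions, not something the car‑park data supports.

```python
# Observed car-colour frequencies from Section 1
observed = {"Red": 4, "Blue": 7, "Black": 5, "White": 2, "Other": 2}

N = sum(observed.values())              # total observations
p_theoretical = 1 / len(observed)       # ASSUMPTION: all colours equally likely
expected = N * p_theoretical            # E = N x p, the same for every colour here

for colour, f in observed.items():
    print(f"{colour}: observed {f}, expected {expected}, difference {f - expected}")
```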


4. Grouped Data – Class Intervals (C9.3)

4.1 Example data set (heights of 30 students)

Class interval (cm) | Frequency (f)
140 – 149 | 4
150 – 159 | 8
160 – 169 | 12
170 – 179 | 5
180 – 189 | 1

4.2 Step‑by‑step calculations

  1. Find the midpoint of each class (used as \(x\)):
    • 140 – 149 → 144.5
    • 150 – 159 → 154.5
    • 160 – 169 → 164.5
    • 170 – 179 → 174.5
    • 180 – 189 → 184.5
  2. Compute \(\sum (x\cdot f)\) and \(\sum f\): \[ \begin{aligned} \sum (x\cdot f) &=144.5(4)+154.5(8)+164.5(12)+174.5(5)+184.5(1)\\ &=578+1\,236+1\,974+872.5+184.5=4\,845\\ \sum f &=30 \end{aligned} \]
  3. Mean \[ \displaystyle \bar{x}= \frac{4\,845}{30}=161.5\text{ cm} \]
  4. Median – locate the \(\frac{n}{2}=15^{\text{th}}\) observation using cumulative frequencies:
    • Cumulative frequencies: 4, 12, 24, 29, 30.
    • The 15th lies in the 160 – 169 class.
    • Grouped‑median formula: \[ \text{Median}=L+\left(\frac{\frac{n}{2}-F}{f}\right)C \] where \(L=159.5\) (lower bound of the median class, i.e. lower limit 160 − 0.5), \(F=12\) (cumulative freq before the median class), \(f=12\) (freq of median class), \(C=10\) (class width, 169.5 − 159.5). \[ \text{Median}=159.5+\left(\frac{15-12}{12}\right)10=162.0\text{ cm} \]
  5. Mode – modal class = 160 – 169 (highest frequency). Grouped‑mode formula: \[ \text{Mode}=L+\frac{(f_m-f_{m-1})}{(2f_m-f_{m-1}-f_{m+1})}\,C \] with \(L=159.5,\;C=10,\;f_m=12,\;f_{m-1}=8,\;f_{m+1}=5\): \[ \text{Mode}=159.5+\frac{12-8}{(24-8-5)}\times10 =159.5+\frac{4}{11}\times10\approx163.1\text{ cm} \]
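The grouped‑median and grouped‑mode formulas can be evaluated with the sketch below. It assumes \(L\) is taken as the lower class bound 159.5 from Section 3.1; if the lower class limit 160 is used instead, both results are 0.5 higher. The function and parameter names are illustrative assumptions.

```python
def grouped_median(L, n, F, f, C):
    """Median = L + ((n/2 - F) / f) * C for the median class."""
    return L + ((n / 2 - F) / f) * C

def grouped_mode(L, f_m, f_prev, f_next, C):
    """Mode = L + (f_m - f_prev) / (2*f_m - f_prev - f_next) * C for the modal class."""
    return L + (f_m - f_prev) / (2 * f_m - f_prev - f_next) * C

# Height data: median class and modal class are both 160 - 169
median_height = grouped_median(L=159.5, n=30, F=12, f=12, C=10)
mode_height = grouped_mode(L=159.5, f_m=12, f_prev=8, f_next=5, C=10)

print(round(median_height, 1), round(mode_height, 1))
```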

4.3 Quick checklist for grouped data

  • ✓ Identify class limits and write lower/upper bounds.
  • ✓ Compute class mid‑points.
  • ✓ Multiply each mid‑point by its frequency → \(\sum (x\cdot f)\).
  • ✓ Add all frequencies → \(\sum f\).
  • ✓ Apply the appropriate formula for mean, median or mode.

4.4 Visual link to histograms

When you draw a histogram, the same class limits and bounds are used. Bars touch because the data are continuous.

Histogram (frequency) for the height data (placeholder – draw bars using the class bounds above).

5. Charts & Diagrams (C9.4)

5.1 Bar chart (categorical data)

  • Each category has its own bar; bars do **not** touch.
  • Height (or length) of a bar ∝ frequency or relative frequency.
  • Best for comparing quantities across distinct categories.

Example: Favourite sports of 20 pupils.

Sport | Frequency
Football | 8
Basketball | 5
Swimming | 3
Running | 2
Other | 2

5.2 Pie chart (proportional data)

  • Whole circle = 100 % of the data.
  • Sector angle = \(360^\circ \times\) relative frequency.
  • Use when there are ≤ 6 categories for clarity.
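For example, the sector angles for the favourite‑sports data in Section 5.1 could be worked out as in this sketch (variable names are assumptions):

```python
# Favourite sports of 20 pupils (Section 5.1)
sports = {"Football": 8, "Basketball": 5, "Swimming": 3, "Running": 2, "Other": 2}
total = sum(sports.values())

# Sector angle = 360 degrees x relative frequency
angles = {sport: 360 * f / total for sport, f in sports.items()}

for sport, angle in angles.items():
    print(f"{sport}: {angle:.0f} degrees")
```

The angles should always add up to 360°, which is a quick way to catch arithmetic slips.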

5.3 Pictogram

  • Each picture (symbol) represents a fixed number of units – the “key”.
  • All symbols are the same size; count the symbols in a column to obtain the frequency.

5.4 Stem‑and‑leaf diagram

  • Retains the original data while showing the overall shape.
  • Stem = leading digit(s); leaf = trailing digit(s).
  • Easy to read off median, mode and range.

Example (leaf values 0–9):

Stem | Leaves
  4  | 2 5 7
  5  | 0 1 3 8
  6  | 2 4 4 9
  7  | 1 5
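The source gives no key for this diagram, so the sketch below assumes 4 | 2 means 42. Under that assumption, the original values can be rebuilt and the median, mode and range read off; the variable names are illustrative.

```python
import statistics

# Stem -> leaves, copied from the diagram (ASSUMED key: 4 | 2 means 42)
stem_and_leaf = {4: [2, 5, 7], 5: [0, 1, 3, 8], 6: [2, 4, 4, 9], 7: [1, 5]}

# Rebuild the ordered data: value = 10 * stem + leaf
data = sorted(10 * stem + leaf
              for stem, leaves in stem_and_leaf.items()
              for leaf in leaves)

median_value = statistics.median(data)   # 13 values, so the 7th value
modal_value = statistics.mode(data)      # the only repeated leaf is 6 | 4
data_range = max(data) - min(data)

print(data)
print(median_value, modal_value, data_range)
```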

5.5 Histogram (grouped continuous data)

  • Bars **touch** because the data are continuous.
  • Bar width = class width; bar height = frequency density (frequency ÷ class width). When all classes have the same width, plotting frequency itself gives the same shape.
  • Use the lower and upper bounds from the class intervals.
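As an illustration of "frequency per unit width", the sketch below computes the bar heights for the height data of Section 4.1, using class bounds found by the ±0.5 rule. The tuple layout and variable names are assumptions.

```python
# (lower bound, upper bound, frequency) for the height data in Section 4.1
height_classes = [
    (139.5, 149.5, 4), (149.5, 159.5, 8), (159.5, 169.5, 12),
    (169.5, 179.5, 5), (179.5, 189.5, 1),
]

# Bar height = frequency / class width
densities = [f / (upper - lower) for lower, upper, f in height_classes]

# The area of each bar (height x width) gives back the frequency
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, height_classes)]

print(densities)
print(sum(areas))   # total area equals the total frequency
```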

5.6 Summary of key differences

Chart type | Data type | Bars touch? | Typical use
Bar chart | Qualitative / discrete | No | Compare categories
Histogram | Grouped continuous | Yes | Show distribution of intervals
Pie chart | Proportional | N/A | Show parts of a whole

6. Scatter Diagrams, Correlation & Line of Best Fit (C9.5)

6.1 Scatter diagram

  • Plots ordered pairs \((x, y)\) as points on a Cartesian plane.
  • Reveals the type of relationship:
    • Positive correlation – points rise from left to right.
    • Negative correlation – points fall from left to right.
    • No correlation – points appear random.

6.2 Drawing a straight line of best fit (by eye)

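  1. Judge the overall direction of the cloud of points; do not simply join the two extreme points, which may be unrepresentative.
  2. Draw a straight line through the middle of the cloud (ideally through the mean point \((\bar{x}, \bar{y})\)), adjusting it so that roughly equal numbers of points lie above and below.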
  3. Label the line “Best fit” or write its equation \(y = mx + c\) if you wish to estimate gradient and intercept.

6.3 Using the line of best fit

  • Estimate a missing or future value by reading the corresponding \(y\) (or \(x\)) on the line.
  • State the estimate with appropriate units and note that it is an approximation.

6.4 Example – Study time vs. test score

Student | Study time (h) | Score (out of 100)
1 | 2 | 58
2 | 3 | 65
3 | 4 | 71
4 | 5 | 78
5 | 6 | 84
6 | 7 | 88
7 | 8 | 92
8 | 9 | 95

When plotted, the points form an upward‑sloping cloud. A straight line drawn through the centre gives an approximate equation \(y \approx 5.5x + 48\). Using the line, a student who studies 5.5 h would be expected to score about \(5.5(5.5)+48 = 78.25 \approx 78\) marks.
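The syllabus only asks for a line drawn by eye, but a by‑eye line can be sanity‑checked against a least‑squares line. The sketch below swaps in that technique (it is not the method described above); the function name `least_squares_line` is an assumption.

```python
hours = [2, 3, 4, 5, 6, 7, 8, 9]
scores = [58, 65, 71, 78, 84, 88, 92, 95]

def least_squares_line(xs, ys):
    """Return gradient m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    c = y_bar - m * x_bar      # forces the line through the mean point
    return m, c

m, c = least_squares_line(hours, scores)
estimate = m * 5.5 + c         # predicted score for 5.5 h of study

print(f"y = {m:.2f}x + {c:.2f}")
print(f"estimate at 5.5 h: {estimate:.1f}")
```

The computed gradient (about 5.4) and intercept (about 49) are close to a by‑eye line such as \(y \approx 5.5x + 48\), and both give an estimate near 78 to 79 marks at 5.5 h.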

6.5 Short exercise (try it yourself)

Data: Number of books read (x) vs. reading enjoyment rating (y, out of 10) for 6 pupils.

Pupil | Books (x) | Rating (y)
1 | 1 | 4
2 | 2 | 5
3 | 3 | 6
4 | 4 | 7
5 | 5 | 8
6 | 6 | 9

Task: Sketch a scatter diagram, draw a line of best fit, and estimate the enjoyment rating for a pupil who reads 7 books.

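Solution outline: The points lie exactly on a straight line, so the line of best fit is \(y = x + 3\). For 7 books the line gives \(7 + 3 = 10\), the top of the rating scale. Because 7 books lies outside the range of the data (1 to 6 books), this is an extrapolation and should be treated with caution.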


7. Extended Content – Quartiles, Inter‑Quartile Range & Box Plot (C9.6‑C9.7)

  • Quartiles – values that divide an ordered data set into four equal parts.
    • Q1 = lower quartile (25 % of data ≤ Q1)
    • Q2 = median (50 % of data ≤ Q2)
    • Q3 = upper quartile (75 % of data ≤ Q3)
  • Inter‑quartile range (IQR) – \( \text{IQR}=Q_3-Q_1\). Measures the spread of the middle 50 % of the data.
  • Box plot – visual summary showing minimum, Q1, median, Q3, and maximum (or “whiskers”).

These concepts are optional but appear in the extended paper. The same steps used for median (ordered data) apply to Q1 and Q3.
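One common way to apply the median steps to Q1 and Q3 is to take the median of the lower half and of the upper half of the ordered data. The sketch below assumes that convention (others exist and can give slightly different values), needs at least two data values, and uses the books data from Section 2.4; the function name `five_number_summary` is an assumption.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]        # values below the middle
    upper_half = ordered[n - half:]    # values above the middle (middle value excluded if n is odd)
    q1 = statistics.median(lower_half)
    q2 = statistics.median(ordered)
    q3 = statistics.median(upper_half)
    return ordered[0], q1, q2, q3, ordered[-1]

books = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4]
minimum, q1, q2, q3, maximum = five_number_summary(books)
iqr = q3 - q1

print(minimum, q1, q2, q3, maximum, iqr)
```

The five returned values are exactly the ones marked on a box plot: the ends of the whiskers, the edges of the box and the median line.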


Quick Revision Checklist

  • ✓ Distinguish qualitative vs. quantitative data; discrete vs. continuous.
  • ✓ Construct a frequency table (raw data → table; grouped data → class intervals, bounds, mid‑points).
  • ✓ Calculate the mean:
    • Raw data: \(\displaystyle \bar{x}= \frac{\sum x}{n}\)
    • Grouped data: \(\displaystyle \bar{x}= \frac{\sum (x\cdot f)}{\sum f}\) using class mid‑points.
  • ✓ Find the median:
    • Odd n → position \((n+1)/2\).
    • Even n → average of positions \(n/2\) and \(n/2+1\).
    • Grouped → use cumulative frequencies and the grouped‑median formula.
  • ✓ Identify the mode (single, bimodal, multimodal, or no mode).
  • ✓ Apply bounds of accuracy (±0.5) when required.
  • ✓ Convert frequencies to relative frequencies (or percentages) where appropriate.
  • ✓ Recognise when an expected frequency is needed (theoretical distribution given).
  • ✓ Choose the correct chart type and remember the key differences (bar vs. histogram).
  • ✓ Plot a scatter diagram, describe the correlation, and draw a line of best fit.
  • ✓ (Extended) Compute Q1, Q3, IQR and sketch a box plot.

Common Pitfalls & How to Avoid Them

  • Mixing up class limits and bounds – always subtract 0.5 from the lower limit and add 0.5 to the upper limit before finding mid‑points.
  • Using the wrong formula for median with even n – remember to average the two middle values, not just pick one.
  • Forgetting to order data before finding median or quartiles – a quick sort step prevents errors.
  • Applying the grouped‑mean formula to un‑grouped data – use the simple \(\sum x / n\) formula instead.
  • Drawing a histogram with gaps between bars – bars must touch because the data are continuous.
  • Interpreting a line of best fit as an exact prediction – always state that the value is an estimate and give appropriate units.
  • Ignoring the “key” in a pictogram – the key tells you how many units each symbol represents; without it the diagram is meaningless.
