Representation of data: diagrams, measures of central tendency and dispersion

Representation of Data – Diagrams, Central Tendency & Dispersion (Cambridge 9709)

1. Organising Raw Data

Before calculating anything or drawing any graph, arrange the observations into a frequency distribution. This makes the data easier to interpret and provides the basis for most diagrams.

Class / Value | Frequency (f)
--------------|--------------
0 – 4         | 3
5 – 9         | 7
10 – 14       | 5
15 – 19       | 2
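
As a rough sketch of how such a table can be built from raw observations, the Python below counts whole-number values into classes of width 5. The function name `frequency_table` and the list `raw` are hypothetical; the raw values were invented so that the counts agree with the table above.

```python
from collections import Counter

def frequency_table(data, width=5, start=0):
    """Count how many observations fall in each class of equal width.

    Classes run start..start+width-1, start+width..start+2*width-1, and so on,
    which suits whole-number data such as the 0-4, 5-9, ... classes above.
    Assumes every observation is at least `start`.
    """
    counts = Counter((x - start) // width for x in data)
    table = []
    for k in range(max(counts) + 1):
        low = start + k * width
        high = low + width - 1
        table.append((low, high, counts.get(k, 0)))
    return table

# Hypothetical raw observations, invented so the counts match the table above.
raw = [1, 3, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14, 16, 18]

for low, high, f in frequency_table(raw):
    print(f"{low:>2} - {high:<2} | {f}")
```

Empty classes between the smallest and largest value are kept with a frequency of 0, so gaps in the data remain visible.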

2. Choosing the Most Appropriate Diagram

The syllabus requires you to choose and justify one of four mandatory diagrams. The decision table below links the nature of the data to the diagram that best meets the requirement.

Data Situation | Diagram to Use | Why it is appropriate
---------------|----------------|----------------------
Small‑to‑moderate raw quantitative set; you need to retain every value. | Stem‑and‑Leaf Plot | Shows each observation, makes reading the median, quartiles and mode straightforward.
Continuous or grouped data with equal class widths; you need to see the overall shape (symmetry, skewness, modality). | Histogram | Bars represent frequencies; the visual shape of the distribution is clear.
Any quantitative data when a concise summary of centre, spread, outliers and possible asymmetry is required. | Box‑and‑Whisker Plot (Box Plot) | Displays the five‑number summary and highlights outliers in a single picture.
Data (discrete or continuous) where specific percentiles, medians or comparisons between several sets are needed. | Cumulative‑Frequency Polygon (Ogive) | Allows easy extraction of any percentile; useful for comparing cumulative patterns.

3. Mandatory Diagrams – Construction & Interpretation

3.1 Stem‑and‑Leaf Plot

  • When to use: raw data, same number of digits, moderate size (≤ 30‑40 values).
  • Construction steps
    1. Order the data from smallest to largest.
    2. Choose a stem (all but the last digit) and a leaf (the final digit).
    3. Write each stem in ascending order; list the leaves in ascending order beside the appropriate stem.
  • Interpretation tips
    • Median and quartiles are read directly from the ordered leaves.
    • Mode is the value whose leaf is repeated most often within a single stem.
    • Outliers appear as isolated leaves far from the main cluster.
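
The construction steps can be sketched in Python as follows. This is only an illustration for non-negative whole numbers; the function name `stem_and_leaf` and the list `marks` are hypothetical and the marks were invented for the example.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return the rows of a stem-and-leaf plot for non-negative whole numbers."""
    rows = defaultdict(list)
    for x in sorted(data):              # step 1: order the data
        stem, leaf = divmod(x, 10)      # step 2: stem = all but the last digit, leaf = last digit
        rows[stem].append(leaf)
    lines = []
    # step 3: stems in ascending order; stems with no leaves are still listed
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical marks, invented for illustration.
marks = [23, 31, 35, 35, 42, 47, 48, 60]
print("\n".join(stem_and_leaf(marks)))
print("Key: 3 | 1 means 31")
```

Listing the empty stem keeps the horizontal scale honest, so an isolated value such as 60 stands out as a possible outlier.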

3.2 Histogram

  • When to use: grouped continuous data with equal class widths.
  • Construction steps
    1. Decide on class width (all classes must be equal).
    2. Draw the class boundaries on the horizontal axis.
    3. For each class, draw a rectangle whose height equals the class frequency (or relative frequency).
    4. Ensure that adjoining rectangles touch; with equal class widths, the area of each bar is proportional to its class frequency, so the total area represents the total number of observations.
  • Interpretation tips
    • Shape of the distribution: symmetric, positively skewed, negatively skewed, unimodal, bimodal, etc.
    • Identify gaps (possible outliers) and clusters.
    • Compare several histograms by keeping the same class width and scale.

3.3 Box‑and‑Whisker Plot

  • When to use: any quantitative data where a concise visual summary of centre, spread and outliers is needed.
  • Key components
    • Minimum, \(Q_{1}\), Median, \(Q_{3}\), Maximum.
    • Outliers: any observation below \(Q_{1}-1.5\times\text{IQR}\) or above \(Q_{3}+1.5\times\text{IQR}\).
  • Construction steps
    1. Find the five‑number summary.
    2. Draw a number line that comfortably includes the minimum and maximum.
    3. Mark the five summary values; draw a box from \(Q_{1}\) to \(Q_{3}\) and a line at the median.
    4. Draw whiskers from the box to the smallest and largest non‑outlier values.
    5. Plot any outliers as individual points.
  • Interpretation tips
    • Length of the box = IQR (spread of the middle 50 %).
    • Position of the median within the box indicates skewness.
    • One whisker much longer than the other, or outliers on only one side, suggests a skewed distribution.
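
A minimal sketch of the five-number summary and the outlier rule is given below. It assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data (the convention used in the worked example in Section 7); other textbooks and software use slightly different quartile rules. The function names are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Min, Q1, median, Q3, max, with quartiles taken as medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]            # values before the median position
    upper = xs[(n + 1) // 2 :]      # values after the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outliers(data):
    """Values strictly outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

values = [5, 7, 8, 8, 10, 12, 13, 13, 13, 15]   # data from the worked example in Section 7
print(five_number_summary(values))
print(outliers(values))
print(outliers(values + [30]))   # 30 is a made-up extra value to show a flagged outlier
```

The strict inequalities mean that a value lying exactly on a fence is not reported as an outlier.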

3.4 Cumulative‑Frequency Polygon (Ogive)

  • When to use: to read any percentile, to compare several data sets, or when the data are cumulative by nature.
  • Construction steps
    1. Prepare a frequency table (class intervals and frequencies).
    2. Calculate cumulative frequencies (add each frequency to the total of all previous ones).
    3. Plot points at the upper class boundary against the cumulative frequency.
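    4. Join the points with straight lines; start the polygon at the lower boundary of the first class, where the cumulative frequency is 0 (this is the origin only when the first class starts at 0).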
  • Interpretation tips
    • The median is where the ogive crosses the \(\frac{n}{2}\) line.
    • The 25 % and 75 % percentiles are read at \(\frac{n}{4}\) and \(\frac{3n}{4}\) respectively.
    • Steeper sections indicate a high concentration of data in that interval.
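
Reading a value off an ogive drawn with straight lines is the same as linear interpolation between the plotted points. The sketch below uses the frequencies from the table in Section 1; the class boundaries −0.5, 4.5, 9.5, 14.5 and 19.5 are an assumption (whole-number data recorded to the nearest unit), and the function names are hypothetical.

```python
def cumulative_points(lower_boundary, upper_boundaries, frequencies):
    """Points (boundary, cumulative frequency) of the ogive, starting at cumulative frequency 0."""
    points = [(lower_boundary, 0)]
    total = 0
    for boundary, f in zip(upper_boundaries, frequencies):
        total += f
        points.append((boundary, total))
    return points

def read_ogive(points, target):
    """Value at which the straight-line ogive reaches the cumulative frequency `target`."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the range of the cumulative frequencies")

# Frequencies from the table in Section 1; the boundaries are an assumption.
points = cumulative_points(-0.5, [4.5, 9.5, 14.5, 19.5], [3, 7, 5, 2])
n = points[-1][1]                              # total frequency
median_estimate = read_ogive(points, n / 2)
q1_estimate = read_ogive(points, n / 4)
q3_estimate = read_ogive(points, 3 * n / 4)
print(median_estimate, q1_estimate, q3_estimate)
```

These are estimates only: grouping discards the individual values, so the ogive assumes observations are spread evenly within each class.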

4. Working with Grouped Data

Often the syllabus asks you to calculate the mean and standard deviation when the data are presented in class form. Use the class mid‑point (\(x\)) as a representative value.

  • Mean (sample) \[ \bar{x}= \frac{\displaystyle\sum f\,x}{\displaystyle\sum f} \]
  • Variance (sample) \[ s^{2}= \frac{\displaystyle\sum f\,x^{2} - \frac{(\displaystyle\sum f\,x)^{2}}{\displaystyle\sum f}}{n-1} \] where \(n=\sum f\).
    Tip: calculate \(\sum f\,x\) and \(\sum f\,x^{2}\) in a separate column of the frequency table.
  • Standard deviation \(s=\sqrt{s^{2}}\).

Example – Grouped Data

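Class | Mid‑point \(x\) | Frequency \(f\) | \(f\,x\) | \(f\,x^{2}\)
------|-----------------|-----------------|----------|-------------
0‑4   | 2               | 3               | 6        | 12
5‑9   | 7               | 7               | 49       | 343
10‑14 | 12              | 5               | 60       | 720
15‑19 | 17              | 2               | 34       | 578
Totals|                 | 17              | 149      | 1653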

Mean: \(\displaystyle\bar{x}= \frac{149}{17}=8.76\).
Variance: \(\displaystyle s^{2}= \frac{1653-\frac{149^{2}}{17}}{16}=\frac{347.06}{16}\approx 21.69\).
Standard deviation: \(s=\sqrt{21.69}\approx4.66\).
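
The column totals and the three results can be checked with a short Python sketch that applies the grouped formulas from this section to the mid-points and frequencies in the example table. The function name `grouped_stats` is hypothetical.

```python
from math import sqrt

def grouped_stats(midpoints, frequencies):
    """Mean, sample variance and sample standard deviation from class mid-points."""
    n = sum(frequencies)                                               # n = sum of f
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))        # sum of f*x
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))   # sum of f*x^2
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)                   # n - 1 denominator
    return mean, variance, sqrt(variance)

mean, variance, sd = grouped_stats([2, 7, 12, 17], [3, 7, 5, 2])
print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```

Because mid-points stand in for the actual observations, the output is an estimate of the mean and standard deviation of the underlying raw data.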

5. Measures of Central Tendency

  • Mean \(\bar{x}\) (sample) – arithmetic average. Use \(\displaystyle\bar{x}= \frac{\sum x_i}{n}\) for raw data or the grouped formula above.
  • Median – middle value after ordering.
    • Odd \(n\): median = \(x_{(k)}\) where \(k=\frac{n+1}{2}\).
    • Even \(n\): median = \(\dfrac{x_{(k)}+x_{(k+1)}}{2}\) where \(k=\frac{n}{2}\).
  • Mode – value(s) occurring most frequently. May be unimodal, bimodal or multimodal.
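
The odd/even median rule and the idea of several modes can be illustrated with the sketch below. The data sets are hypothetical and `median_by_position` is an illustrative name; `statistics.multimode` (Python 3.8 or later) returns every value tied for the highest frequency.

```python
from statistics import multimode

def median_by_position(data):
    """Median using the odd/even rule, with 1-based positions as in the formulas above."""
    xs = sorted(data)
    n = len(xs)
    if n % 2 == 1:
        k = (n + 1) // 2
        return xs[k - 1]                    # x_(k)
    k = n // 2
    return (xs[k - 1] + xs[k]) / 2          # (x_(k) + x_(k+1)) / 2

# Hypothetical small data sets, invented for illustration.
odd_set = [4, 9, 2, 7, 5]
even_set = [4, 9, 2, 7, 5, 7]
print(median_by_position(odd_set), median_by_position(even_set))
print(multimode(even_set))           # a single mode
print(multimode([1, 1, 2, 2, 3]))    # a bimodal set
```

The `k - 1` index appears because Python lists count from 0 while the formulas count from 1.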

6. Measures of Dispersion

  • Range \(=x_{\max}-x_{\min}\).
  • Inter‑Quartile Range (IQR) \(=Q_{3}-Q_{1}\) – spread of the middle 50 %.
  • Variance
    • Sample variance \(s^{2}= \dfrac{\sum (x_i-\bar{x})^{2}}{n-1}\).
    • Population variance \(\sigma^{2}= \dfrac{\sum (x_i-\mu)^{2}}{N}\) (used rarely in the syllabus but good to recognise).
  • Standard Deviation
    • Sample \(s=\sqrt{s^{2}}\).
    • Population \(\sigma=\sqrt{\sigma^{2}}\).
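
The only difference between the sample and population formulas is the denominator. As a quick illustration, Python's standard `statistics` module provides both: `variance` and `stdev` divide by \(n-1\), while `pvariance` and `pstdev` divide by \(N\). The data are the ten values from the worked example in Section 7.

```python
from statistics import variance, pvariance, stdev, pstdev

values = [5, 7, 8, 8, 10, 12, 13, 13, 13, 15]   # data from the worked example in Section 7

s2 = variance(values)         # sample variance: divides by n - 1
sigma2 = pvariance(values)    # population variance: divides by N
print(s2, stdev(values))
print(sigma2, pstdev(values))
```

The sample figures are always slightly larger than the population figures for the same data, and the gap shrinks as \(n\) grows.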

7. Worked Example – All Concepts Together (Raw Data)

Data (ordered): \[5,\;7,\;8,\;8,\;10,\;12,\;13,\;13,\;13,\;15\] \(n=10\)

Quantity | Computation | Result
---------|-------------|-------
Mean \(\bar{x}\) | \(\displaystyle\bar{x}= \frac{5+7+8+8+10+12+13+13+13+15}{10}=\frac{104}{10}\) | 10.4
Median | Even \(n\Rightarrow\) \(\dfrac{x_{5}+x_{6}}{2}= \dfrac{10+12}{2}\) | 11
Mode | Most frequent value | 13 (three occurrences)
Range | \(15-5\) | 10
Q₁ | Median of lower half \(\{5,7,8,8,10\}\) | 8
Q₃ | Median of upper half \(\{12,13,13,13,15\}\) | 13
IQR | \(Q_{3}-Q_{1}\) | 5
Variance \(s^{2}\) | \(\displaystyle s^{2}= \frac{\sum (x_i-10.4)^{2}}{9}= \frac{96.4}{9}\) | 10.71
Standard Deviation \(s\) | \(\sqrt{10.71}\) | ≈ 3.27

7.1 Box‑and‑Whisker Plot (raw data)

Five‑number summary: Min = 5, \(Q_{1}=8\), Median = 11, \(Q_{3}=13\), Max = 15. With \(1.5\times\text{IQR}=7.5\), the outlier fences are \(8-7.5=0.5\) and \(13+7.5=20.5\); every value lies between them, so there are no outliers.

7.2 Stem‑and‑Leaf Plot (raw data)

Stem | Leaf
-----|----------------
 0   | 5 7 8 8
 1   | 0 2 3 3 3 5

Key: 1 | 2 means 12

Median (11) and quartiles (Q₁ = 8, Q₃ = 13) can be read directly from the ordered leaves.

8. Choosing the Most Appropriate Summary

  • Symmetric distribution, no outliers: Report mean ± standard deviation (or variance) for a concise description.
  • Skewed distribution or outliers present: Use median and IQR; a box plot visualises the asymmetry and highlights outliers.
  • Small raw data set: Stem‑and‑leaf or a simple list is often more informative than a histogram.
  • Need for percentiles or comparison of several data sets: An ogive is the most efficient tool.
  • Data already grouped: Compute mean and standard deviation from class mid‑points (see Section 4).

9. Summary Checklist (What You Must Be Able to Do)

  1. Construct a frequency table from raw observations.
  2. Select the correct mandatory diagram, draw it accurately, and justify the choice.
  3. Calculate mean, median and mode for both raw and grouped data (including identification of unimodal, bimodal or multimodal sets).
  4. Calculate range, IQR, variance and standard deviation, using the \(n-1\) denominator for sample statistics.
  5. Interpret the numerical summaries and diagrams in the context of the problem (shape, skewness, modality, outliers).
  6. State clearly whether you are dealing with a sample (\(\bar{x}, s, s^{2}\)) or a population (\(\mu, \sigma, \sigma^{2}\)).
  7. Comment on the most appropriate concise description (e.g., “mean ± SD” vs. “median and IQR”).

10. Common Pitfalls & Quick Reminders

  • Never use unequal class widths when bar height represents frequency; the histogram method in Section 3.2 relies on equal class widths.
  • Remember to use the upper class boundary (not the lower) when plotting an ogive.
  • When calculating variance from grouped data, keep the denominator as \(n-1\) (sample) – the syllabus does not require population variance.
  • Outliers are defined using the IQR rule; points exactly on the boundary are *not* outliers.
  • Always label axes, include units, and state the scale on any diagram you produce in an exam.
