Using Correlation Analysis in Cambridge International AS & A Level Biology (9700)
1. Why Correlation Matters for the Syllabus
- Correlation quantifies the strength and direction of a relationship between two quantitative variables.
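- It is named explicitly in the syllabus and contributes to all three assessment objectives:
  - AO1 – knowledge and understanding of biodiversity, physiology, genetics and other core concepts that correlation data illustrate.
  - AO2 – handling and interpreting data from investigations and past papers.
  - AO3 – planning, analysing and evaluating experiments (Paper 5).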
- In biology the variables are usually a biotic factor (e.g., species abundance, enzyme activity) and an abiotic factor (e.g., temperature, pH, light intensity).
- Correlation helps you:
  - Identify which factor is most strongly associated with distribution, abundance or physiological rate.
  - Distinguish linear relationships (Pearson) from monotonic but non‑linear trends (Spearman).
  - Provide a statistical foundation for further modelling (regression, ANOVA, GLM).
2. Syllabus Mapping – Where Correlation Can Be Applied (Units 12‑19)
| Unit / Topic | Typical Variables for Correlation | Suggested Coefficient |
|---|---|---|
| 12 – Energy & Respiration | Respiration rate vs. temperature; O₂ concentration vs. CO₂ production | Pearson (linear) – after checking normality |
| 13 – Photosynthesis | Photosynthetic rate vs. light intensity, CO₂ level, or leaf temperature | Pearson for the linear portion; Spearman for the whole data set (including saturation) |
| 14 – Homeostasis | Blood glucose vs. insulin concentration; body temperature vs. evaporative water loss | Pearson (after transformation if needed) |
| 15 – Control & Coordination | Nerve‑impulse frequency vs. stimulus intensity; hormone concentration vs. target‑organ response | Pearson (linear) or Spearman (if data are ordinal) |
| 16 – Inheritance | Genotype frequency vs. phenotypic trait score; allele frequency vs. environmental stressor | Spearman (monotonic, often non‑linear) |
| 17 – Selection & Evolution | Survival rate vs. trait size; fitness index vs. environmental gradient | Pearson (if linear) or Spearman (if monotonic) |
| 18 – Classification, Biodiversity & Conservation | Species richness vs. habitat area; diversity index vs. altitude; IUCN status vs. human population density | Pearson after log‑transformation (species‑area); Spearman for rank‑ordered data |
| 19 – Genetic Technology | Allele frequency vs. pesticide concentration; expression level vs. gene‑copy number | Spearman (monotonic, often non‑normal) |
3. Choosing the Right Correlation Coefficient
- Pearson’s linear correlation (r)
  - Data type: interval or ratio.
  - Assumptions:
    - Linear relationship.
    - Both variables approximately normally distributed.
    - Homoscedasticity (similar scatter about the line across the whole range).
    - Independent observations.
  - Best for: temperature vs. abundance, the linear portion of light intensity vs. photosynthetic rate, log‑transformed species‑area data.
- Spearman’s rank correlation (ρ or r_s)
  - Data type: ordinal, non‑normal, or continuous data that are monotonic but not linear.
  - Less sensitive to outliers than Pearson’s r, and needs no transformation (a monotonic transformation such as a log leaves the ranks unchanged).
  - Best for: rank‑ordered habitat quality vs. species richness, allele frequency vs. pesticide concentration, any monotonic trend.
4. Pearson’s Linear Correlation
4.1 Formula
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}
where xᵢ and yᵢ are the i‑th pair of observations, x̄ and ȳ are their means, and n is the number of paired observations.
4.2 Step‑by‑step calculation (exam‑style)
- Calculate the means x̄ and ȳ.
- Compute the deviations (xᵢ − x̄) and (yᵢ − ȳ).
- Multiply each pair of deviations and sum the products → Σ(x − x̄)(y − ȳ).
- Calculate Σ(x − x̄)² and Σ(y − ȳ)², then take the square root of each.
- Insert into the formula to obtain r.
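The five steps can be followed in a short Python sketch that uses only the standard library. The function name `pearson_r` is a hypothetical choice, and the sample data are the 0–200 µmol m⁻² s⁻¹ light readings from Section 6.2; each comment marks which exam step the line performs.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed step by step, mirroring the exam method."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    n = len(x)
    # Step 1: means
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Step 2: deviations from the means
    dx = [xi - x_bar for xi in x]
    dy = [yi - y_bar for yi in y]
    # Step 3: sum of the products of deviations
    s_xy = sum(a * b for a, b in zip(dx, dy))
    # Step 4: sums of squared deviations (square-rooted in step 5)
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    # Step 5: substitute into the formula
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))


light = [0, 50, 100, 150, 200]   # µmol m⁻² s⁻¹
oxygen = [0, 12, 22, 30, 35]     # µmol g⁻¹ h⁻¹
print(round(pearson_r(light, oxygen), 3))  # 0.988
```

Writing out every intermediate sum, rather than calling a library routine, makes it easy to check each line of a hand calculation against the program.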
4.3 Interpreting r
| r value | Strength & direction |
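|---|---|
| ±1.0 | Perfect linear relationship |
| ±0.7 to < ±1.0 | Strong |
| ±0.4 to < ±0.7 | Moderate |
| ±0.1 to < ±0.4 | Weak |
| < ±0.1 (≈ 0) | No linear relationship |
These bands are rules of thumb; whether a given r is meaningful also depends on n, so always apply the significance test in 4.4.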
4.4 Statistical Significance
- Test statistic: t = r\sqrt{\frac{n-2}{1-r^{2}}} with df = n‑2.
- Compare |t| with the critical value from the t‑distribution (α = 0.05 is standard for exams).
- If |t| > t_crit, the correlation is statistically significant (p < 0.05).
- Report the coefficient of determination r² as the proportion of variance explained.
- For small samples (n ≤ 5) be cautious – significance tests have low power.
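A minimal sketch of the test in Python. It assumes you look up the critical value yourself in a t‑table, because the standard library has no t‑distribution; the function names are hypothetical. The value 3.182 is the standard two‑tailed 5 % critical value for df = 3.

```python
import math


def pearson_t(r, n):
    """t statistic for H0: no correlation in the population (df = n - 2)."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def is_significant(r, n, t_crit):
    """True if |t| exceeds a two-tailed critical value taken from a t-table."""
    return abs(pearson_t(r, n)) > t_crit


r, n = 0.988, 5   # r for the 0-200 light segment in Section 6.2
t = pearson_t(r, n)
print(f"t = {t:.2f}, df = {n - 2}, r² = {r ** 2:.3f}")
# 3.182 is the two-tailed 5 % critical value of t for df = 3
print(is_significant(r, n, t_crit=3.182))  # True
```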
4.5 Confidence Interval (optional for extended projects)
Approximate 95 % CI for r using Fisher’s Z‑transformation:
Z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)
SE_Z = \frac{1}{\sqrt{n-3}}
Z_{lower,upper} = Z \pm 1.96\,SE_Z
r_{lower,upper} = \frac{e^{2Z_{lower,upper}}-1}{e^{2Z_{lower,upper}}+1}
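The four Fisher equations translate directly into code. This is a sketch assuming n ≥ 4 (so that n − 3 > 0) and |r| < 1; the function name `fisher_ci` and the values r = 0.9, n = 20 are illustrative, not taken from any data set in this guide.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95 % confidence interval for Pearson's r (Fisher's Z)."""
    if n < 4 or abs(r) >= 1:
        raise ValueError("need n >= 4 and |r| < 1")
    z = 0.5 * math.log((1 + r) / (1 - r))   # Fisher's Z-transformation
    se = 1 / math.sqrt(n - 3)               # standard error of Z
    z_lo, z_hi = z - z_crit * se, z + z_crit * se

    def back(zv):
        # Back-transform a Z value to the r scale
        return (math.exp(2 * zv) - 1) / (math.exp(2 * zv) + 1)

    return back(z_lo), back(z_hi)


lo, hi = fisher_ci(0.9, 20)
print(f"95 % CI for r: {lo:.2f} to {hi:.2f}")  # 0.76 to 0.96
```

The interval is asymmetric about r (0.9 is closer to the upper limit) because the back‑transformation compresses values near ±1.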
5. Spearman’s Rank Correlation
5.1 Procedure
- Rank each variable separately (smallest = 1, largest = n). Tied values receive the average of the ranks they would otherwise occupy.
- For each pair calculate the difference in ranks: dᵢ = R(xᵢ) – R(yᵢ).
- Compute Σdᵢ².
- Apply the formula (exact when there are no ties; with many ties, calculate Pearson’s r on the ranks instead):
r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2}-1)}
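The ranking rule, including averaged ranks for ties, is the step most often done wrongly by hand. A sketch in Python, with hypothetical function names; as noted above, the d² formula it uses is exact only when there are no ties.

```python
def average_ranks(values):
    """Rank from smallest (1) to largest (n); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's r_s from the d² formula (exact only when there are no ties)."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(average_ranks([3, 1, 4, 1, 5]))                   # [3.0, 1.5, 4.0, 1.5, 5.0]
print(spearman_rs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))    # -1.0
```

The two tied 1s would occupy ranks 1 and 2, so each receives (1 + 2) / 2 = 1.5.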
5.2 Significance Test
- For n ≥ 10, use the approximate t‑test: t = r_s\sqrt{\frac{n-2}{1-r_s^{2}}} with df = n‑2 (same as Pearson).
- For n < 10 use critical values from Spearman tables. At α = 0.05 with n = 5, the one‑tailed critical value is 0.900; a two‑tailed test requires |r_s| = 1.0.
- State whether the monotonic trend is statistically significant.
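Small‑sample critical values come from the exact distribution of r_s when there is no association. As a sketch (assuming no ties, with hypothetical helper names), the code enumerates all 120 possible orderings of the ranks for n = 5 and finds the exact probability of obtaining r_s ≥ 0.9 by chance. `Fraction` keeps the arithmetic exact so that r_s = 0.9 is not lost to rounding.

```python
from fractions import Fraction
from itertools import permutations


def rs_from_ranks(rx, ry):
    """Exact r_s from two rank lists using the d² formula."""
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - Fraction(6 * sum_d2, n * (n * n - 1))


def one_tailed_p(rs_observed, n):
    """Exact P(r_s >= observed) under H0, by enumerating all n! rank orders."""
    ranks = list(range(1, n + 1))
    hits = total = 0
    for perm in permutations(ranks):
        total += 1
        if rs_from_ranks(ranks, perm) >= rs_observed:
            hits += 1
    return Fraction(hits, total)


p_one = one_tailed_p(Fraction(9, 10), 5)
print(p_one)                       # 1/24, about 0.042
print(float(2 * p_one) < 0.05)     # False: two-tailed p is about 0.083
```

Only the identity order and the four adjacent swaps give r_s ≥ 0.9, so P = 5/120. That is below 0.05 for a one‑tailed test but not once doubled for a two‑tailed test.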
6. Practical Examples Aligned with the Syllabus
6.1 Species‑Area Relationship (Unit 18)
| Island | Area (km²) | Species (S) |
|---|---|---|
| 1 | 0.5 | 12 |
| 2 | 1.0 | 18 |
| 3 | 2.0 | 25 |
| 4 | 4.0 | 34 |
| 5 | 8.0 | 45 |
- Log‑transform both columns (log A, log S).
- Calculate Pearson’s r for the transformed data (r ≈ 0.997).
- t‑test: t ≈ 24, df = 3 → p < 0.001 (highly significant).
- Interpretation: a very strong linear relationship between log S and log A is consistent with the power law S = cA^z, because taking logs gives log S = log c + z log A.
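The log‑transformation and test for this example can be reproduced with the Python standard library. This is a sketch. Base‑10 logs are an assumption, although any base gives the same r. The slope z is an extra quantity not in the steps above, included because it estimates the exponent in S = cA^z.

```python
import math

area = [0.5, 1.0, 2.0, 4.0, 8.0]   # km²
species = [12, 18, 25, 34, 45]     # species richness, S

log_a = [math.log10(a) for a in area]
log_s = [math.log10(s) for s in species]
n = len(log_a)
mean_a, mean_s = sum(log_a) / n, sum(log_s) / n

# Sums of products and squares of deviations on the log scale
s_as = sum((a - mean_a) * (s - mean_s) for a, s in zip(log_a, log_s))
s_aa = sum((a - mean_a) ** 2 for a in log_a)
s_ss = sum((s - mean_s) ** 2 for s in log_s)

r = s_as / math.sqrt(s_aa * s_ss)
t = r * math.sqrt((n - 2) / (1 - r ** 2))
z = s_as / s_aa   # least-squares slope of log S on log A, an estimate of z

print(f"r = {r:.3f}, t = {t:.1f}, df = {n - 2}, z ≈ {z:.2f}")
```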
6.2 Light Intensity vs. Photosynthetic Rate (Unit 13)
| Light (µmol m⁻² s⁻¹) | O₂ evolution (µmol g⁻¹ h⁻¹) |
|---|---|
| 0 | 0 |
| 50 | 12 |
| 100 | 22 |
| 150 | 30 |
| 200 | 35 |
| 250 | 38 |
| 300 | 40 |
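- Plot the data. The segment 0–200 µmol m⁻² s⁻¹ is approximately linear.
- Pearson’s r for the linear portion (n = 5) ≈ 0.99 (t ≈ 11.2, df = 3, p < 0.01).
- Beyond 200 µmol m⁻² s⁻¹ the curve plateaus. Across all seven points both variables rise monotonically, so their ranks match exactly and Spearman’s ρ = 1.0 (significant: the two‑tailed 5 % critical value for n = 7 is 0.786). Spearman therefore describes the overall monotonic trend, including the non‑linear region.
- Discussion point: why does the relationship saturate? (Light is no longer the limiting factor; another factor, such as CO₂ concentration or temperature, now limits the rate.)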
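Both calculations for this example can be checked in one short Python sketch. The helper names are hypothetical, and `simple_ranks` assumes there are no tied values, which is true for this data set. It computes Spearman’s ρ as Pearson’s r applied to the ranks, which gives the same answer as the d² formula when there are no ties.

```python
import math

light = [0, 50, 100, 150, 200, 250, 300]   # µmol m⁻² s⁻¹
oxygen = [0, 12, 22, 30, 35, 38, 40]       # µmol g⁻¹ h⁻¹


def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def simple_ranks(values):
    # Assumes no tied values (true for this data set)
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


# Pearson on the approximately linear segment (first five points, 0-200)
r_lin = pearson_r(light[:5], oxygen[:5])
t_lin = r_lin * math.sqrt(3 / (1 - r_lin ** 2))   # df = 5 - 2 = 3

# Spearman on all seven points = Pearson's r computed on the ranks
rho = pearson_r(simple_ranks(light), simple_ranks(oxygen))
print(f"r = {r_lin:.3f}, t = {t_lin:.1f}, rho = {rho:.2f}")  # r = 0.988, t = 11.2, rho = 1.00
```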
6.3 Allele Frequency vs. Pesticide Concentration (Unit 19)
| Site | Pesticide (mg L⁻¹) | Allele A frequency |
|---|---|---|
| 1 | 0.1 | 0.12 |
| 2 | 0.3 | 0.18 |
| 3 | 0.5 | 0.25 |
| 4 | 0.7 | 0.33 |
| 5 | 0.9 | 0.42 |
- Data are monotonic but not perfectly linear; use Spearman’s ρ.
- Ranks are identical for both variables → Σd² = 0 → ρ = 1.0.
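- With n = 5, the critical value at α = 0.05 is 0.900 (one‑tailed) or 1.000 (two‑tailed); ρ = 1.0 meets both, so the correlation is significant.
- Interpretation: allele A frequency rises consistently with pesticide concentration, which is consistent with the pesticide selecting for allele A; a controlled study would be needed to show causation.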
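The rank calculation for this example is short enough to check directly. In this sketch `simple_ranks` is a hypothetical helper that assumes no tied values, and the two constants are the n = 5 table values quoted above.

```python
pesticide = [0.1, 0.3, 0.5, 0.7, 0.9]      # mg L⁻¹
allele_a = [0.12, 0.18, 0.25, 0.33, 0.42]  # frequency of allele A


def simple_ranks(values):
    # Assumes no tied values (true for this data set)
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


rx, ry = simple_ranks(pesticide), simple_ranks(allele_a)
n = len(rx)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Table values for n = 5 at alpha = 0.05
CRIT_ONE_TAILED, CRIT_TWO_TAILED = 0.900, 1.000
print(sum_d2, rs, rs >= CRIT_ONE_TAILED, rs >= CRIT_TWO_TAILED)  # 0 1.0 True True
```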
6.4 Additional A‑Level Applications (Units 14‑17)
- Homeostasis (Unit 14) – Pearson’s r between blood glucose (mmol L⁻¹) and insulin concentration (µU mL⁻¹) after a glucose tolerance test.
- Control & Coordination (Unit 15) – Spearman’s ρ for stimulus intensity rank vs. nerve‑impulse frequency rank.
- Inheritance (Unit 16) – Correlate genotype frequency (continuous) with phenotypic score (ordinal) using Spearman.
- Selection & Evolution (Unit 17) – Pearson’s r between beak depth (mm) and seed hardness (N) in a finch population.
7. Planning an Investigation (AO3 Checklist)
- Define a clear hypothesis – e.g., “Increasing temperature will increase the rate of cellular respiration in yeast (below the optimum temperature).”
- Select variables
  - Independent (temperature, °C).
  - Dependent (CO₂ production, mL min⁻¹).
- Design the sampling strategy
  - Choose at least five temperature levels spanning the expected linear range.
  - Replicate each level at least three times to reduce the effect of random error, and run the trials in random order so that time‑related drift does not introduce systematic error.
  - Record other abiotic variables (pH, substrate concentration) so that possible confounding variables can be checked.
- Data collection – use a gas syringe or CO₂ probe; keep all other conditions constant.
- Statistical analysis
  - Check normality of both variables (skewness, Q‑Q plot).
  - If normal → calculate Pearson’s r, t‑test, r² and 95 % CI.
  - If non‑normal or ordinal → rank data and calculate Spearman’s ρ.
  - Report the p‑value and state whether the correlation is significant at the 5 % level.
- Evaluation
  - Identify sources of random error (instrument precision, biological variability).
  - Identify systematic error (temperature gradients, incomplete mixing).
  - Discuss the limitation of correlation (it does not prove causation) and suggest a follow‑up controlled experiment.
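The random run order in the sampling strategy can be planned before any data are collected. A minimal sketch, assuming five illustrative temperatures (15–35 °C) and three replicates; the seed value 42 is arbitrary and only makes the plan reproducible.

```python
import random

temperatures = [15, 20, 25, 30, 35]   # °C, illustrative levels only
replicates = 3

# One trial per (temperature, replicate) pair
trials = [(t, rep) for t in temperatures for rep in range(1, replicates + 1)]

rng = random.Random(42)   # fixed seed so the plan can be reproduced
rng.shuffle(trials)       # random order spreads time-related drift across levels

for run, (temp, rep) in enumerate(trials, start=1):
    print(f"Run {run:2d}: {temp} °C (replicate {rep})")
```

Shuffling the complete list, rather than randomising within each temperature, means no temperature is always measured early or late in the session.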
8. Limitations & Precautions (Exam‑style Points)
- Independence – Spatial or temporal autocorrelation means neighbouring observations are not independent, so the effective sample size is smaller than n and the significance of r is overstated; increase spacing between sampling sites or use time‑averaged data.
- Normality & homoscedasticity – Check normality with histograms or Q‑Q plots and homoscedasticity with a plot of residuals; transform (log, square‑root) if needed.
- Outliers – Inspect scatter plots; a single outlier can dramatically change Pearson’s r but usually has much less effect on Spearman’s ρ.
- Sample size – Small n (≤ 5) yields unreliable coefficients and low statistical power; aim for n ≥ 10 where possible.
- Multiple testing – If several correlations are examined, note the increased chance of Type I error (consider Bonferroni correction in extended projects).
- Causation vs. correlation – Always discuss alternative explanations (confounding variables, reverse causality).
9. Suggested Revision Diagrams
- Figure 1 – Scatter plot of log Area vs. log Species richness with regression line and Pearson’s r (including 95 % CI).
- Figure 2 – Light intensity vs. photosynthetic rate showing the linear region (Pearson) and the saturation plateau (where Spearman’s ρ is indicated).
- Figure 3 – Plot of residuals against fitted values from a least‑squares regression, illustrating homoscedasticity (an even band of scatter around zero).
10. Summary Checklist for Students (AO1‑AO3)
- Identify the two variables (biotic ↔ abiotic or two abiotic factors).
- Predict the shape of the relationship (linear vs. monotonic non‑linear).
- Check data suitability:
  - Linear relationship with approximately normal data → Pearson.
  - Ordinal, non‑normal, or monotonic but non‑linear data → Spearman.
- Perform the appropriate calculation:
  - Pearson – means, deviations, Σ, r, t‑test, r², CI.
  - Spearman – rank, d², r_s, significance test.
- Interpret the sign and magnitude in ecological or physiological terms (e.g., “higher temperature → higher respiration rate”).
- Link the statistical outcome back to the relevant syllabus topic (species‑area, enzyme activity, genotype frequency, etc.).
- Evaluate:
  - Possible sources of error and how they affect the correlation.
  - Whether the correlation can be taken as evidence of causation.
  - Improvements for future work (larger n, control of confounders, alternative statistical tests).