use Spearman’s rank correlation and Pearson’s linear correlation to analyse the relationships between two variables, including how biotic and abiotic factors affect the distribution and abundance of species (the formulae for these correlations will b

Using Correlation Analysis in Cambridge International AS & A Level Biology (9700)

1. Why Correlation Matters for the Syllabus

  • Correlation quantifies the strength and direction of a relationship between two quantitative variables.
  • It is explicitly required for:

    • AO1 – recall of biodiversity, physiology, genetics and other core concepts.
    • AO2 – interpretation of data from investigations and past papers.
    • AO3 – planning, analysing and evaluating experiments (Paper 5).

  • In biology the variables are usually a biotic factor (e.g., species abundance, enzyme activity) and an abiotic factor (e.g., temperature, pH, light intensity).
  • Correlation helps you:

    • Identify which factor most strongly influences distribution, abundance or physiological rate.
    • Distinguish linear relationships (Pearson) from monotonic but non‑linear trends (Spearman).
    • Provide a statistical foundation for further modelling (regression, ANOVA, GLM).

2. Syllabus Mapping – Where Correlation Can Be Applied (Units 12‑19)

| Unit / Topic | Typical Variables for Correlation | Suggested Coefficient |
| --- | --- | --- |
| 12 – Energy & Respiration | Respiration rate vs. temperature; CO₂ production vs. O₂ consumption | Pearson (linear), after checking normality |
| 13 – Photosynthesis | Photosynthetic rate vs. light intensity, CO₂ level, or leaf temperature | Pearson for the linear portion; Spearman for the whole data set (including saturation) |
| 14 – Homeostasis | Blood glucose vs. insulin concentration; body temperature vs. evaporative water loss | Pearson (after transformation if needed) |
| 15 – Control & Coordination | Nerve‑impulse frequency vs. stimulus intensity; hormone concentration vs. target‑organ response | Pearson (linear) or Spearman (if data are ordinal) |
| 16 – Inheritance | Genotype frequency vs. phenotypic trait score; allele frequency vs. environmental stressor | Spearman (monotonic, often non‑linear) |
| 17 – Selection & Evolution | Survival rate vs. trait size; fitness index vs. environmental gradient | Pearson (if linear) or Spearman (if monotonic) |
| 18 – Classification, Biodiversity & Conservation | Species richness vs. habitat area; diversity index vs. altitude; IUCN status vs. human population density | Pearson after log‑transformation (species‑area); Spearman for rank‑ordered data |
| 19 – Genetic Technology | Allele frequency vs. pesticide concentration; expression level vs. gene‑copy number | Spearman (monotonic, often non‑normal) |

3. Choosing the Right Correlation Coefficient

  • Pearson’s linear correlation (r)

    • Data type: interval or ratio.
    • Assumptions:

      • Linear relationship.
      • Both variables approximately normally distributed.
      • Homoscedasticity (equal variances across the range).
      • Independent observations.

    • Best for: temperature vs. abundance, light intensity vs. photosynthetic rate, log‑transformed species‑area data.

  • Spearman’s rank correlation (ρ or r_s)

    • Data type: ordinal, non‑normal, or continuous data that are monotonic but not linear.
    • Robust to outliers and does not require transformation.
    • Best for: rank‑ordered habitat quality vs. species richness, allele frequency vs. pesticide concentration, any monotonic trend.

4. Pearson’s Linear Correlation

4.1 Formula

r = \frac{\displaystyle\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\displaystyle\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\;\sqrt{\displaystyle\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}

where x_i and y_i are the individual observations, x̄ and ȳ are the means of the two variables, and n is the number of paired observations.

4.2 Step‑by‑step calculation (exam‑style)

  1. Calculate the means x̄ and ȳ.
  2. Compute the deviations (x_i − x̄) and (y_i − ȳ).
  3. Multiply each pair of deviations and sum the products → Σ(x − x̄)(y − ȳ).
  4. Calculate Σ(x − x̄)² and Σ(y − ȳ)², then take the square root of each.
  5. Insert into the formula to obtain r.
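
The five steps above can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the function name `pearson_r` is an assumption made here, and the data are the five points from the linear portion of the light‑intensity table in section 6.2.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r following the five exam-style steps (no libraries needed)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    # Step 1: means of x and y
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 2: deviations from the means
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Step 3: sum of the products of deviations
    s_xy = sum(a * b for a, b in zip(dx, dy))
    # Step 4: sums of squared deviations
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    # Step 5: insert into the formula
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

# Linear portion (0-200) of the light-intensity data in section 6.2
light = [0, 50, 100, 150, 200]
o2_rate = [0, 12, 22, 30, 35]
r_light = pearson_r(light, o2_rate)
print(round(r_light, 3))  # 0.988
```

Writing out each intermediate sum mirrors the layout of a hand calculation, which makes it easier to check a working table line by line.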

4.3 Interpreting r

| r value | Strength & direction |
| --- | --- |
| ±1.0 | Perfect linear relationship |
| ±0.7 – ±0.9 | Strong |
| ±0.4 – ±0.6 | Moderate |
| ±0.1 – ±0.3 | Weak |
| 0 | No linear relationship |

4.4 Statistical Significance

  • Test statistic: t = r\sqrt{\frac{n-2}{1-r^{2}}} with df = n‑2.
  • Compare |t| with the critical value from the t‑distribution (α = 0.05 is standard for exams).
  • If |t| > t_crit, the correlation is statistically significant (p < 0.05).
  • Report the coefficient of determination (r²) as the proportion of the variance in one variable that is explained by the other.
  • For small samples (n ≤ 5) be cautious – significance tests have low power.
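
As a rough sketch of the test above, the snippet below computes t from r and n and compares it with a two‑tailed critical value at α = 0.05. The function names and the example value r = 0.8 are illustrative assumptions; the small lookup table holds standard t‑table values for df = 1 to 10 only.

```python
import math

# Two-tailed critical values of t at alpha = 0.05 (standard table values)
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def t_from_r(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with df = n - 2."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that df >= 1")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r * r))

def is_significant(r, n):
    """True if |t| exceeds the tabulated critical value for df = n - 2."""
    return abs(t_from_r(r, n)) > T_CRIT_05[n - 2]

# Hypothetical example: the same r = 0.8 with two different sample sizes
print(round(t_from_r(0.8, 7), 2), is_significant(0.8, 7))  # 2.98 True
print(round(t_from_r(0.8, 5), 2), is_significant(0.8, 5))  # 2.31 False
print(round(0.8 ** 2, 2))  # r squared = 0.64 of the variance explained
```

The same coefficient is significant with seven pairs but not with five, which illustrates why small samples need caution.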

4.5 Confidence Interval (optional for extended projects)

Approximate 95 % CI for r using Fisher’s Z‑transformation:

Z = \frac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right)

SE_Z = \frac{1}{\sqrt{n-3}}

Z_{lower,upper} = Z \pm 1.96\,SE_Z

r_{lower,upper} = \frac{e^{2Z_{lower,upper}}-1}{e^{2Z_{lower,upper}}+1}
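
The transformation above might be coded as follows. This is a sketch under stated assumptions: the function name `fisher_ci` and the example values (r = 0.5, n = 28) are invented for illustration, and the back‑transformation uses `math.tanh`, which is algebraically identical to (e^{2Z} − 1)/(e^{2Z} + 1).

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for Pearson's r (95 % by default)."""
    if n <= 3:
        raise ValueError("n must exceed 3 so that sqrt(n - 3) is defined")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = 0.5 * math.log((1 + r) / (1 - r))   # Fisher's Z
    se_z = 1 / math.sqrt(n - 3)             # standard error of Z
    z_lower = z - z_crit * se_z
    z_upper = z + z_crit * se_z
    # tanh(Z) equals (e^(2Z) - 1) / (e^(2Z) + 1)
    return math.tanh(z_lower), math.tanh(z_upper)

# Hypothetical example values, chosen only to show the calculation
lower, upper = fisher_ci(0.5, 28)
print(round(lower, 2), round(upper, 2))  # 0.16 0.74
```

The interval is not symmetric about r, because the transformation stretches values close to ±1.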

5. Spearman’s Rank Correlation

5.1 Procedure

  1. Rank each variable separately (smallest = 1, largest = n). Tied values receive the average rank.
  2. For each pair calculate the difference in ranks: d_i = R(x_i) − R(y_i).
  3. Compute Σd_i².
  4. Apply the formula:

r_s = 1 - \frac{6\displaystyle\sum_{i=1}^{n}d_i^{2}}{n\,(n^{2}-1)}
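
The ranking procedure above can be sketched in Python as shown below. The helper names `average_ranks` and `spearman_rs` are assumptions made for this illustration, and the data are the pesticide and allele‑frequency values from section 6.3. The Σd² shortcut is exact only when there are no tied ranks; with many ties, calculating Pearson's r on the ranks is the safer route.

```python
def average_ranks(values):
    """Rank from smallest (1) to largest (n); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman's r_s from the sum of squared rank differences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

print(average_ranks([3, 1, 3, 2]))  # [3.5, 1.0, 3.5, 2.0]

# Data from section 6.3
pesticide = [0.1, 0.3, 0.5, 0.7, 0.9]
allele_a = [0.12, 0.18, 0.25, 0.33, 0.42]
print(spearman_rs(pesticide, allele_a))  # 1.0
```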

5.2 Significance Test

  • For n ≥ 10, approximate t‑test: t = r_s\sqrt{\frac{n-2}{1-r_s^{2}}} with df = n‑2 (same form as for Pearson).
  • For n < 10 use critical values from Spearman tables (e.g., for n = 5 at α = 0.05 the one‑tailed critical value is 0.900 and the two‑tailed value is 1.000).
  • State whether the monotonic trend is statistically significant.
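
For small samples, the tabulated critical values come from counting rank arrangements. The sketch below is an illustration of that idea rather than an exam technique: it enumerates every possible pairing of n untied ranks and reports the one‑tailed probability of a Σd² at least as small as the observed one (that is, a positive correlation at least as strong). The function name is an assumption, and the approach is only practical for n < 10 because it visits all n! arrangements.

```python
from itertools import permutations
from math import factorial

def exact_one_tailed_p(n, sum_d2_observed):
    """P(sum of d^2 <= observed) if the n ranks were paired at random (no ties)."""
    ranks = range(1, n + 1)
    count = 0
    for perm in permutations(ranks):
        sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks, perm))
        if sum_d2 <= sum_d2_observed:
            count += 1
    return count / factorial(n)

# n = 5: sum d^2 = 0, 2 and 4 correspond to r_s = 1.0, 0.9 and 0.8
print(round(exact_one_tailed_p(5, 0), 4))  # 0.0083
print(round(exact_one_tailed_p(5, 2), 4))  # 0.0417
print(round(exact_one_tailed_p(5, 4), 4))  # 0.0667
```

For n = 5, r_s = 0.9 is the smallest coefficient whose one‑tailed probability falls below 0.05; doubling the probabilities for a two‑tailed test leaves only r_s = 1.0 below 0.05.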

6. Practical Examples Aligned with the Syllabus

6.1 Species‑Area Relationship (Unit 18)

| Island | Area (km²) | Species (S) |
| --- | --- | --- |
| 1 | 0.5 | 12 |
| 2 | 1.0 | 18 |
| 3 | 2.0 | 25 |
| 4 | 4.0 | 34 |
| 5 | 8.0 | 45 |

  1. Log‑transform both columns (log A, log S).
  2. Calculate Pearson’s r for the transformed data (r ≈ 0.997).
  3. t‑test: t ≈ 24.1, df = 3 → p < 0.001 (highly significant).
  4. Interpretation: a very strong linear relationship between log A and log S is consistent with the power law S = cA^z; the slope of the log‑log line estimates z.
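
The worked example can be checked with a short script. This is a sketch using base‑10 logarithms (an assumption; natural logs give the same r and slope); the variable names are illustrative, and the estimates of z and c come from an ordinary least‑squares line through the log‑transformed points.

```python
import math

area = [0.5, 1.0, 2.0, 4.0, 8.0]     # km^2, from the table above
species = [12, 18, 25, 34, 45]

# Step 1: log-transform both columns
log_a = [math.log10(a) for a in area]
log_s = [math.log10(s) for s in species]

n = len(area)
mean_x = sum(log_a) / n
mean_y = sum(log_s) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_a, log_s))
s_xx = sum((x - mean_x) ** 2 for x in log_a)
s_yy = sum((y - mean_y) ** 2 for y in log_s)

# Step 2: Pearson's r on the transformed data
r = s_xy / math.sqrt(s_xx * s_yy)

# Step 3: t statistic with df = n - 2
t = r * math.sqrt((n - 2) / (1 - r * r))

# Step 4: slope and intercept of log S = log c + z log A
z = s_xy / s_xx
c = 10 ** (mean_y - z * mean_x)

print(round(r, 3), round(t, 1))  # 0.997 24.1
print(round(z, 2), round(c, 1))  # 0.47 17.4
```

With only five islands the fitted z and c are rough estimates; the point of the sketch is to show how the correlation and the power‑law parameters come from the same sums.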

6.2 Light Intensity vs. Photosynthetic Rate (Units 12 & 13)

| Light (µmol m⁻² s⁻¹) | O₂ evolution (µmol g⁻¹ h⁻¹) |
| --- | --- |
| 0 | 0 |
| 50 | 12 |
| 100 | 22 |
| 150 | 30 |
| 200 | 35 |
| 250 | 38 |
| 300 | 40 |

  • Plot the data. The segment 0–200 µmol m⁻² s⁻¹ is approximately linear.
  • Pearson’s r for the linear portion (the five points from 0 to 200) ≈ 0.99 (t ≈ 11.2, df = 3, p < 0.01).
  • Beyond 200 µmol m⁻² s⁻¹ the curve plateaus. Across all seven points the relationship is non‑linear but strictly increasing, so Spearman’s ρ = 1.0, which is significant at α = 0.05 and describes the overall monotonic trend.
  • Discussion point: why does the relationship saturate? (At high light intensity, light is no longer limiting; another factor such as CO₂ concentration or temperature limits the rate.)
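
The contrast between the two coefficients in this example can be reproduced with the sketch below. The helper names are illustrative assumptions, and `simple_ranks` deliberately ignores ties because this data set has none.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def simple_ranks(values):
    """Ranks 1..n; assumes no tied values (true for this data set)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    n = len(xs)
    sum_d2 = sum((a - b) ** 2
                 for a, b in zip(simple_ranks(xs), simple_ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

light = [0, 50, 100, 150, 200, 250, 300]
o2 = [0, 12, 22, 30, 35, 38, 40]

r_linear = pearson_r(light[:5], o2[:5])  # 0-200 only (linear portion)
r_all = pearson_r(light, o2)             # all seven points, plateau included
rs_all = spearman_rs(light, o2)          # all seven points, ranks only

print(round(r_linear, 3))  # 0.988
print(round(r_all, 3))     # 0.962
print(rs_all)              # 1.0
```

Including the plateau lowers Pearson's r because the points bend away from a straight line, whereas Spearman's coefficient stays at 1.0 because the ranks of the two variables still match exactly.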

6.3 Allele Frequency vs. Pesticide Concentration (Unit 19)

| Site | Pesticide (mg L⁻¹) | Allele A frequency |
| --- | --- | --- |
| 1 | 0.1 | 0.12 |
| 2 | 0.3 | 0.18 |
| 3 | 0.5 | 0.25 |
| 4 | 0.7 | 0.33 |
| 5 | 0.9 | 0.42 |

  • Data are monotonic but not perfectly linear; use Spearman’s ρ.
  • Ranks are identical for both variables → Σd² = 0 → ρ = 1.0.
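  • With n = 5, the critical ρ at α = 0.05 is 0.900 (one‑tailed) or 1.000 (two‑tailed) → ρ = 1.0 is significant on either test.
  • Interpretation: allele A frequency rises consistently with pesticide concentration, which is consistent with selection for allele A; the correlation alone does not prove causation.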

6.4 Additional A‑Level Applications (Units 14‑17)

  • Homeostasis (Unit 14) – Pearson’s r between blood glucose (mmol L⁻¹) and insulin concentration (µU mL⁻¹) after a glucose tolerance test.
  • Control & Coordination (Unit 15) – Spearman’s ρ for stimulus intensity rank vs. nerve‑impulse frequency rank.
  • Inheritance (Unit 16) – Correlate genotype frequency (continuous) with phenotypic score (ordinal) using Spearman.
  • Selection & Evolution (Unit 17) – Pearson’s r between beak depth (mm) and seed hardness (N) in a finch population.

7. Planning an Investigation (AO3 Checklist)

  1. Define a clear hypothesis – e.g., “Increasing temperature will increase the rate of cellular respiration in yeast.”
  2. Select variables

    • Independent (temperature, °C).
    • Dependent (CO₂ production, mL min⁻¹).

  3. Design the sampling strategy

    • Choose at least five temperature levels spanning the expected linear range.
    • Replicate each level three times (random order) to minimise systematic error.
    • Record ancillary abiotic data (pH, substrate concentration) for possible covariates.

  4. Data collection – use a gas syringe or CO₂ probe; keep all other conditions constant.
  5. Statistical analysis

    • Check normality of both variables (skewness, Q‑Q plot).
    • If normal → calculate Pearson’s r, t‑test, r² and 95 % CI.
    • If non‑normal or ordinal → rank data and calculate Spearman’s ρ.
    • Report p‑value and state whether the correlation is significant at the 5 % level.

  6. Evaluation

    • Identify sources of random error (instrument precision, biological variability).
    • Identify systematic error (temperature gradients, incomplete mixing).
    • Discuss the limitation of correlation (does not prove causation) and suggest a follow‑up controlled experiment.

8. Limitations & Precautions (Exam‑style Points)

  • Independence – Spatial or temporal autocorrelation violates the independence assumption and can make a correlation appear more significant than it is; increase spacing between sampling sites or use time‑averaged data.
  • Normality & homoscedasticity – Verify with histograms or Levene’s test; transform (log, square‑root) if needed.
  • Outliers – Inspect scatter plots; a single outlier can dramatically change Pearson’s r but has little effect on Spearman’s ρ.
  • Sample size – Small n (< 5) yields unreliable coefficients and low statistical power; aim for n ≥ 10 where possible.
  • Multiple testing – If several correlations are examined, note the increased chance of Type I error (consider Bonferroni correction in extended projects).
  • Causation vs. correlation – Always discuss alternative explanations (confounding variables, reverse causality).
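
The point about outliers can be illustrated with a small sketch. It reuses the light‑intensity data from section 6.2 and then introduces a hypothetical recording error (400 typed instead of 40 for the final reading); that altered value is invented purely for illustration. Helper names are assumptions, and the rank helper assumes no ties.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_rs(xs, ys):
    """Spearman's r_s; assumes no tied values in either list."""
    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

light = [0, 50, 100, 150, 200, 250, 300]
o2_recorded = [0, 12, 22, 30, 35, 38, 40]    # data from section 6.2
o2_with_typo = [0, 12, 22, 30, 35, 38, 400]  # hypothetical error in last value

pearson_clean = pearson_r(light, o2_recorded)
pearson_typo = pearson_r(light, o2_with_typo)
spearman_clean = spearman_rs(light, o2_recorded)
spearman_typo = spearman_rs(light, o2_with_typo)

print(round(pearson_clean, 2), round(pearson_typo, 2))  # 0.96 0.68
print(spearman_clean, spearman_typo)                    # 1.0 1.0
```

Here the outlier does not change the rank order, so Spearman's coefficient is unaffected; an outlier that did change the ranks would move it, though usually by less than it moves Pearson's r.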

9. Suggested Revision Diagrams

  • Figure 1 – Scatter plot of log Area vs. log Species richness with regression line and Pearson’s r (including 95 % CI).
  • Figure 2 – Light intensity vs. photosynthetic rate showing the linear region (Pearson) and the saturation plateau (where Spearman’s ρ is indicated).
  • Figure 3 – Plot of residuals against fitted values from a linear regression to illustrate homoscedasticity.

10. Summary Checklist for Students (AO1‑AO3)

  1. Identify the two variables (biotic ↔ abiotic or two abiotic factors).
  2. Predict the shape of the relationship (linear vs. monotonic non‑linear).
  3. Check data suitability:

    • Normal distribution → Pearson.
    • Ordinal or non‑normal → Spearman.

  4. Perform the appropriate calculation:

    • Pearson – means, deviations, Σ, r, t‑test, r², CI.
    • Spearman – rank, d², r_s, significance test.

  5. Interpret the sign and magnitude in ecological or physiological terms (e.g., “higher temperature → higher respiration rate”).
  6. Link the statistical outcome back to the relevant syllabus topic (species‑area, enzyme activity, genotype frequency, etc.).
  7. Evaluate:

    • Possible sources of error and how they affect the correlation.
    • Whether the correlation can be taken as evidence of causation.
    • Improvements for future work (larger n, control of confounders, alternative statistical tests).