Use Spearman’s rank correlation and Pearson’s linear correlation to analyse the relationships between two variables, including how biotic and abiotic factors affect the distribution and abundance of species (the formulae for these correlations will be provided).

Biodiversity & Correlation – Using Spearman’s ρ and Pearson’s r in Cambridge International AS & A Level Biology (9700)

1. Syllabus mapping – what the Cambridge syllabus expects and how the notes meet it

| Syllabus requirement | How the notes address it | Suggested additions (now included) |
| --- | --- | --- |
| Topic 18 – Classification, biodiversity and conservation (AO2) | Correlation is linked to species richness, population density and habitat‑quality data. | A subsection on biodiversity indices (species richness, Shannon–Wiener) and how they can be correlated with abiotic/biotic gradients; a brief link to IUCN categories and conservation threats. |
| Paper 5 – Planning, analysis and evaluation (AO2 & AO3) | Step‑by‑step statistical procedure, data presentation and an evaluation checklist. | Explicit guidance on experimental design, replication, randomisation, handling outliers and data transformation, plus a reminder of other statistical tools (χ² test, t‑test for means, ANOVA) that may appear alongside correlation. |
| Statistical techniques – correlation, significance testing, assumptions, outliers (Section 6) | Formulae for ρ and r, assumptions, t‑test for Pearson, critical‑value table for Spearman. | Expanded coverage of degrees of freedom, p‑value interpretation, sample‑size justification, the effect of outliers, data‑transformation strategies and Cambridge grading bands for correlation strength. |

2. Core biological concepts

  • Distribution – geographic area where a species occurs.
  • Abundance – number of individuals per unit area or volume.
  • Biotic factors – interactions with other organisms (competition, predation, parasitism, and mutualism or other symbioses).
  • Abiotic factors – non‑living environmental variables (temperature, pH, moisture, light intensity, salinity).
  • Biodiversity indices – quantitative measures such as species‑richness (S) and the Shannon–Wiener index (H′) that can be correlated with environmental gradients.
  • Conservation status – IUCN categories (LC, NT, VU, EN, CR, EW, EX) and major threats; these can be examined statistically (e.g., threat level vs. habitat fragmentation).
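Both indices can be computed directly from species counts. The Python sketch below is illustrative only: the function names are hypothetical, it assumes natural logarithms for H′ (some textbooks use log₂ or log₁₀, which rescales H′ but does not change which site is more diverse), and the quadrat counts are invented.

```python
import math

# Illustrative sketch; species_richness and shannon_wiener are hypothetical helper names.


def species_richness(counts):
    """S: the number of species recorded with at least one individual."""
    return sum(1 for c in counts if c > 0)


def shannon_wiener(counts):
    """H' = -sum(p_i * ln p_i), where p_i is the proportion of individuals in species i."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # absent species are skipped (0 * ln 0 is taken as 0)
            p = c / total
            h -= p * math.log(p)  # math.log with one argument is the natural log
    return h


# Invented counts: two quadrats with the same richness but different evenness
even_quadrat = [10, 10, 10, 10]
uneven_quadrat = [37, 1, 1, 1]
print(species_richness(even_quadrat), species_richness(uneven_quadrat))  # 4 4
print(shannon_wiener(even_quadrat) > shannon_wiener(uneven_quadrat))     # True
```

The two quadrats have the same S, but H′ is higher for the even one, which is one reason H′ may be preferred over S alone when evenness matters.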

3. Choosing the appropriate correlation test

| Test | Data type required | Relationship shape | Typical biological use |
| --- | --- | --- | --- |
| Spearman’s rank correlation (ρ) | Ordinal or ranked data; can also be used for continuous data that are not normally distributed. | Monotonic (consistently increasing or decreasing) but not necessarily linear. | Ranked habitat quality vs. species richness; predator‑prey density ranks; presence/absence scores. |
| Pearson’s linear correlation (r) | Interval/ratio (continuous) data that are approximately normally distributed. | Linear relationship. | Temperature vs. metabolic rate; nutrient concentration vs. plant biomass; pH vs. enzyme activity (over a range on one side of the optimum only). |

4. Mathematical formulas (with brief derivation notes)

Spearman’s rank correlation coefficient (ρ)

\[
\rho = 1 - \frac{6\displaystyle\sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)}
\]

  • \(d_i\) = difference between the rank of the \(x\)‑value and the rank of the \(y\)‑value for observation \(i\).
  • \(n\) = number of paired observations.
  • When there are tied ranks, give each tied value the mean of the ranks they would otherwise have occupied. The \(d_i^{2}\) formula is then only approximate; the exact value is obtained by calculating Pearson’s \(r\) on the ranks.
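The ranking rule and the two ways of obtaining ρ can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; it ranks from smallest (rank 1) to largest, which gives the same ρ as ranking from largest as long as both variables are ranked in the same direction. The habitat scores are invented purely to create a tie.

```python
# Illustrative sketch; average_ranks, pearson, spearman_d2 and
# spearman_tie_corrected are hypothetical names, not library functions.


def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' over every value tied with the one at 'start'
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's r from the definitional (sum of cross-products) formula."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_d2(x, y):
    """rho = 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only when there are no ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def spearman_tie_corrected(x, y):
    """General form: Pearson's r calculated on the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


habitat_score = [1, 2, 2, 3, 4]        # invented ordinal scores with one tie
species_count = [10, 20, 30, 40, 50]   # invented counts
print(average_ranks(habitat_score))    # [1.0, 2.5, 2.5, 4.0, 5.0]
print(round(spearman_d2(habitat_score, species_count), 4))             # 0.975
print(round(spearman_tie_corrected(habitat_score, species_count), 4))  # 0.9747
```

With a single tie the two values differ only in the third decimal place; the difference generally becomes larger when many values are tied.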

Pearson’s linear correlation coefficient (r)

\[
r = \frac{\displaystyle\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\displaystyle\sum_{i=1}^{n}(x_i-\bar{x})^{2}\;\displaystyle\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}
\]

  • \(\bar{x}\) and \(\bar{y}\) are the arithmetic means of the \(x\)‑ and \(y\)‑variables.
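  • The numerator is the sum of cross‑products of the deviations (proportional to the covariance); the denominator is proportional to the product of the standard deviations, so dividing standardises r to lie between −1 and +1.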
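A direct translation of the formula into Python, as a minimal sketch (the function name `pearson_r` and the five data pairs are hypothetical). Dividing the sum of cross‑products by √(Sxx·Syy) gives the same value as dividing the covariance by the product of the standard deviations, because the factors of n (or n − 1) cancel.

```python
import math

# Illustrative sketch; pearson_r is a hypothetical helper, not a library function.


def pearson_r(x, y):
    """r = sum((x - x_bar)(y - y_bar)) / sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired observations of equal length")
    if len(x) < 3:
        raise ValueError("with fewer than three pairs r is trivially +1 or -1 (or undefined)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # numerator
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 6 / sqrt(10 * 6) = 0.775
```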

5. Assumptions, limitations and practical tips

Spearman’s ρ

  • Data can be ranked; no requirement for normality.
  • Observations must be independent.
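  • Relationship must be monotonic – if the trend reverses (e.g. a hump‑shaped response with an optimum), ρ can be close to zero even though a strong relationship exists.
  • Less sensitive to outliers than Pearson’s r, but many tied ranks reduce the power of the test and call for the tie‑corrected calculation.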

Pearson’s r

  • Both variables should be approximately normally distributed (or transformed to approximate normality).
  • Relationship must be linear; curvature reduces r even if a strong association exists.
  • Homogeneity of variance (equal spread of \(y\) across the range of \(x\)).
  • Independence of observations.
  • Highly sensitive to outliers – identify them with residual plots; consider removal or transformation.

General points

  • Correlation does not imply causation – always discuss possible confounding variables.
  • Sample size: Spearman critical‑value tables usually start at n = 5, and with very small samples only near‑perfect correlations reach significance; a larger \(n\) (≥ 10–15) gives a more reliable estimate and increases statistical power.
  • When assumptions are violated, a data transformation (log or square‑root for right‑skewed data and counts, arcsine for proportions) can often restore normality and linearity.
  • Outlier handling: (i) check measurement error, (ii) assess influence using Cook’s distance, (iii) decide whether to retain, transform, or exclude (justify in evaluation).
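Cook’s distance combines the size of a point’s residual with its leverage (how far its x‑value lies from the mean). The sketch below assumes a simple straight‑line least‑squares fit of y on x (two fitted parameters) and applies it to the temperature and insect data from Example 1; the function name is hypothetical, and the cut‑off D > 1 used to flag points is only one common rule of thumb (4/n is another).

```python
# Illustrative sketch; cooks_distance is a hypothetical helper for a one-predictor regression.


def cooks_distance(x, y):
    """D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2 for each point of a y-on-x least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / s_xx  # h_i for simple linear regression
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


temps = [15, 18, 21, 24, 27, 30]            # Example 1, sites 1-6
insects = [120, 200, 340, 410, 480, 950]
for site, d in enumerate(cooks_distance(temps, insects), start=1):
    flag = "  <- influential (D > 1)" if d > 1 else ""
    print(f"site {site}: D = {d:.2f}{flag}")
```

Only site 6 exceeds 1 (its D is a little over 2), which agrees with the visual suspicion in Example 1. The statistic flags influence; deciding whether the point is an error or a real event remains a biological judgement.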

6. Grading the strength of a correlation (Cambridge bands)

| r or ρ value | Interpretation (strength) |
| --- | --- |
| 0.0 to 0.3 (or −0.3 to 0.0) | Very weak / negligible |
| 0.3 to 0.6 (or −0.6 to −0.3) | Weak to moderate |
| 0.6 to 0.9 (or −0.9 to −0.6) | Strong |
| 0.9 to 1.0 (or −1.0 to −0.9) | Very strong (approaching perfect) |

Always pair the numerical grade with a biological judgement – a “moderate” r of 0.45 may still be ecologically important if the factor studied is a known limiting resource.
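When many coefficients need classifying, the bands in the table above can be encoded as a small helper. This is a hypothetical function, and it makes one assumption the table leaves open: a value exactly on a boundary (0.3, 0.6 or 0.9) is placed in the higher band.

```python
# Illustrative sketch; strength_band is a hypothetical helper encoding the bands tabulated above.


def strength_band(coefficient):
    """Return (direction, strength) for a value of r or rho.

    Assumption: boundary values (0.3, 0.6, 0.9) go into the higher band.
    """
    if not -1.0 <= coefficient <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(coefficient)
    if size < 0.3:
        strength = "very weak / negligible"
    elif size < 0.6:
        strength = "weak to moderate"
    elif size < 0.9:
        strength = "strong"
    else:
        strength = "very strong"
    if coefficient > 0:
        direction = "positive"
    elif coefficient < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, strength


print(strength_band(0.45))   # ('positive', 'weak to moderate')
print(strength_band(-0.95))  # ('negative', 'very strong')
```

The label is only half of the interpretation; the biological judgement described above still has to be added.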

7. Testing statistical significance

7.1 Pearson’s r – t‑test

\[

t = r\sqrt{\frac{n-2}{1-r^{2}}}

\]

  • Degrees of freedom (df) = \(n-2\).
  • Compare the calculated \(|t|\) with the critical value from a t‑distribution table (or use a calculator) at the chosen significance level (usually \(p=0.05\), two‑tailed).
  • If \(|t| > t_{\text{crit}}\) the correlation is statistically significant; otherwise it is not.
  • Report the p‑value (or state “p < 0.05”) in the answer.
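The t‑test can be scripted as below. This is a sketch with a hypothetical function name; the critical values are the standard two‑tailed p = 0.05 values from a t‑table for df = 1 to 10 only, so for larger samples use a full table (or, if SciPy is available, `scipy.stats.t.ppf(0.975, df)`). The input r = 0.923 with n = 6 is illustrative.

```python
import math

# Illustrative sketch; pearson_t_test is a hypothetical helper.
# Two-tailed critical t at p = 0.05 for df = 1..10 (standard t-table values).
T_CRIT_TWO_TAILED_005 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def pearson_t_test(r, n):
    """Return (t, df, critical t, significant?) for H0: no correlation in the population."""
    df = n - 2
    if df not in T_CRIT_TWO_TAILED_005:
        raise ValueError("this sketch only tabulates df = 1 to 10")
    if abs(r) == 1:
        t = math.copysign(math.inf, r)  # a perfect correlation gives an infinite t
    else:
        t = r * math.sqrt(df / (1 - r ** 2))
    t_crit = T_CRIT_TWO_TAILED_005[df]
    return t, df, t_crit, abs(t) >= t_crit


t, df, t_crit, significant = pearson_t_test(0.923, 6)
print(f"t = {t:.2f}, df = {df}, critical t = {t_crit}, significant: {significant}")
# t = 4.80, df = 4, critical t = 2.776, significant: True
```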

7.2 Spearman’s ρ – critical‑value table

  • Locate the row for the sample size \(n\) and the column for the desired significance level (two‑tailed, \(p=0.05\)).
  • If \(|\rho| \ge \rho_{\text{crit}}\) the correlation is significant (critical‑value tables are normally read as “equal to or greater than”).
  • For larger samples (\(n > 30\)) the approximation \(t = \rho\sqrt{\frac{n-2}{1-\rho^{2}}}\) can be used, giving the same df as Pearson.
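Published critical values for ρ come from the exact distribution of ρ when the null hypothesis is true. For small samples that distribution can be generated by listing every possible ordering of the ranks, which is what this sketch does (an exhaustive permutation approach, shown for illustration only; the function names are hypothetical and tied ranks are not handled). Printed tables may round differently, so in an exam always use the table provided.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Illustrative sketch; spearman_null_counts and critical_rho are hypothetical helpers.


def spearman_null_counts(n):
    """Exact null distribution for n untied ranks: rho -> number of the n! orderings giving it."""
    ranks = tuple(range(1, n + 1))
    d2_counts = Counter(
        sum((a - b) ** 2 for a, b in zip(ranks, perm)) for perm in permutations(ranks)
    )
    # Fractions keep rho exact, so equal values are never split by rounding error
    return {1 - Fraction(6 * d2, n * (n * n - 1)): c for d2, c in d2_counts.items()}


def critical_rho(n, alpha=0.05):
    """Smallest |rho| whose exact two-tailed probability P(|rho| >= value) is <= alpha.

    Returns None when even a perfect rank correlation cannot reach significance.
    """
    if not 3 <= n <= 9:
        raise ValueError("exhaustive enumeration is only practical here for 3 <= n <= 9")
    counts = spearman_null_counts(n)
    total = sum(counts.values())
    abs_counts = Counter()
    for rho, c in counts.items():
        abs_counts[abs(rho)] += c
    critical, tail = None, 0
    for value in sorted(abs_counts, reverse=True):  # work inwards from |rho| = 1
        tail += abs_counts[value]
        if tail / total > alpha:
            break
        critical = value
    return critical


for n in range(4, 8):
    value = critical_rho(n)
    print(n, "none" if value is None else round(float(value), 3))
```

For n = 4 no value qualifies, and for n = 5 only |ρ| = 1 does (two‑tailed, p = 0.05), which is why very small samples are of little use for this test.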

8. Full step‑by‑step procedure (integrating experimental design, analysis and evaluation)

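  1. State a clear, testable hypothesis that links a biotic or abiotic factor to a species‑level response, together with the corresponding null hypothesis (H₀: there is no correlation between the two variables).

    Example: “Increasing soil nitrogen concentration will increase the abundance of Arabidopsis thaliana seedlings”; H₀: there is no correlation between soil nitrogen concentration and seedling abundance.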

  2. Design the investigation (AO3)

    • Identify the independent (factor) and dependent (response) variables.
    • Choose an appropriate sample size (ideally ≥ 10) and ensure randomisation of sampling units to meet the independence assumption.
    • Plan replication (minimum 5 replicates per treatment) and control for confounding variables (e.g., light, moisture).
    • Decide how data will be recorded (units, significant figures, table format).

  3. Collect paired data for each replicate or sampling site.
  4. Present raw data in a tidy table; construct a scatter plot (continuous data) or a rank plot (ordinal data). Include a best‑fit line for Pearson analyses.
  5. Inspect the plot to decide:

    • Is the relationship linear? → Pearson.
    • Is it monotonic but curved or are the data ordinal? → Spearman.
    • Are there obvious outliers? → note them for later evaluation.

  6. Check assumptions

    • Normality – use a histogram or Shapiro‑Wilk test (optional for exam).
    • Linearity (Pearson) – visual check.
    • Homogeneity of variance – look for a funnel shape in residuals.
    • If assumptions fail, apply a suitable transformation (log, √) and re‑plot.

  7. Calculate the correlation coefficient using the formulae (or a calculator/spreadsheet).
  8. Test significance with the appropriate t‑test or critical‑value table; record the calculated statistic, df, critical value and conclusion (significant/not significant).
  9. Interpret the result (AO2)

    • Direction (positive/negative) and strength (using the Cambridge grading bands).
    • Biological meaning – does the factor plausibly influence the response? Relate to known mechanisms (e.g., temperature affecting metabolic rate).
    • Discuss possible confounding variables or third‑party effects.
    • State whether the result supports or refutes the hypothesis.

  10. Evaluate the investigation (AO3)

    • Sample size – was it sufficient to detect the observed effect?
    • Measurement precision and systematic error.
    • Impact of outliers – were they justified or removed?
    • Whether all assumptions were met; if not, how could the design be improved (more replicates, controlled environment, transformation)?
    • Suggest a follow‑up experiment to test causation (e.g., manipulate temperature in growth chambers).
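Steps 5 to 7 can be illustrated with a hypothetical, deliberately curved data set in which y doubles with each unit increase in x. The sketch (standard‑library Python; the data and the `pearson_r` helper are invented for illustration) shows that curvature pulls Pearson’s r below 1 even though the relationship is perfect, and that a log transformation restores linearity.

```python
import math

# Illustrative sketch; pearson_r is a hypothetical helper and the data are invented.


def pearson_r(x, y):
    """Pearson's r from the definitional formula."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5, 6]
y = [2 ** xi for xi in x]          # 2, 4, 8, 16, 32, 64: monotonic but curved

r_raw = pearson_r(x, y)            # curvature lowers r although the trend is perfect
log_y = [math.log(v) for v in y]   # ln y = x * ln 2, a straight line
r_log = pearson_r(x, log_y)        # 1 to within floating-point rounding

print(round(r_raw, 3), round(r_log, 3))  # 0.906 1.0
```

Because y rises at every step, Spearman’s ρ for the untransformed data is already exactly 1; switching to Spearman and transforming before using Pearson are two alternative responses to the same problem.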

9. Worked examples (including outlier handling and data transformation)

Example 1 – Abiotic factor: Temperature vs. Insect abundance (continuous data)

| Site | Mean temperature (°C) | Insect abundance (individuals) |
| --- | --- | --- |
| 1 | 15 | 120 |
| 2 | 18 | 200 |
| 3 | 21 | 340 |
| 4 | 24 | 410 |
| 5 | 27 | 480 |
| 6 | 30 | 950 (possible outlier) |

  1. Scatter plot shows a linear upward trend but point 6 lies far above the line – suspect an outlier.
  2. Calculate Pearson’s r with and without the outlier:

    • Including all 6 points: \(r \approx 0.923\).
    • Excluding point 6: \(r \approx 0.991\).

  3. Significance (n = 6, df = 4):

    \(t = 0.923\sqrt{\frac{4}{1-0.923^{2}}} \approx 4.8\); critical t (df = 4, p = 0.05, two‑tailed) ≈ 2.78 → significant.

  4. Evaluation: The outlier may represent a local bloom or a counting error. Discuss its influence and justify either retaining it (if biologically real) or removing it (if a measurement error).

Biological interpretation: A strong, statistically significant positive correlation is consistent with temperature being a key driver of insect abundance, but causation must be tested experimentally because food availability and habitat structure may also increase with temperature.
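The Example 1 calculations can be checked with a short script. It is a sketch using hypothetical helper names; it computes Pearson’s r with and without site 6, the t statistic for all six sites, and, for comparison, Spearman’s ρ (as Pearson’s r on the ranks, assuming no ties, which holds for these data).

```python
import math

# Illustrative sketch; pearson_r and simple_ranks are hypothetical helpers.


def pearson_r(x, y):
    """Pearson's r from the definitional formula."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def simple_ranks(values):
    """Rank 1 = smallest value; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


temps = [15, 18, 21, 24, 27, 30]            # mean temperature / °C, sites 1-6
insects = [120, 200, 340, 410, 480, 950]    # individuals; site 6 is the suspected outlier

r_all = pearson_r(temps, insects)
r_without_6 = pearson_r(temps[:5], insects[:5])
n = len(temps)
t = r_all * math.sqrt((n - 2) / (1 - r_all ** 2))
rho_all = pearson_r(simple_ranks(temps), simple_ranks(insects))

print(round(r_all, 3), round(r_without_6, 3), round(t, 1), round(rho_all, 3))
# 0.923 0.991 4.8 1.0
```

Removing site 6 raises Pearson’s r from about 0.92 to 0.99, while Spearman’s ρ is 1 with or without it, because site 6 still has both the highest temperature and the highest count.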

Example 2 – Biotic factor: Predator density vs. Prey abundance (rank data)

| Habitat | Predator density (individuals ha⁻¹) – rank | Prey abundance (individuals ha⁻¹) – rank |
| --- | --- | --- |
| A | 1 (high) | 5 (low) |
| B | 2 | 4 |
| C | 3 | 3 |
| D | 4 | 2 |
| E | 5 (low) | 1 (high) |

  • Differences in ranks: \(-4, -2, 0, 2, 4\); \(\sum d_i^{2}=40\).
  • \[

    \rho = 1 - \frac{6\times40}{5(5^{2}-1)} = 1 - \frac{240}{120}= -1

    \]

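  • Critical ρ for \(n=5\) at \(p=0.05\) is 1.00 for a two‑tailed test (0.90 is the one‑tailed value). \(|\rho|=1\) reaches the two‑tailed critical value (tables are read as “equal to or greater than”), so the correlation is significant; with only five pairs, nothing short of a perfect rank correlation is significant in a two‑tailed test.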

Interpretation: A perfect negative monotonic relationship supports the hypothesis of top‑down control, but field data alone cannot exclude other influences such as habitat complexity.
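The Example 2 arithmetic, and an exact p‑value for it, can be checked as follows. The sketch uses exact fractions and a hypothetical helper name; the p‑value is found by permutation, counting how many of the 5! = 120 possible orderings of the prey ranks give a |ρ| at least as large as the observed one.

```python
from fractions import Fraction
from itertools import permutations

# Illustrative sketch; rho_from_ranks is a hypothetical helper (no tied ranks assumed).


def rho_from_ranks(rank_x, rank_y):
    """Return (sum of d^2, rho) using rho = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return d2, 1 - Fraction(6 * d2, n * (n * n - 1))


predator_rank = [1, 2, 3, 4, 5]   # habitats A-E, from the table
prey_rank = [5, 4, 3, 2, 1]

d2, rho = rho_from_ranks(predator_rank, prey_rank)
print(d2, rho)  # 40 -1

orderings = list(permutations(predator_rank))
as_extreme = sum(
    1 for perm in orderings if abs(rho_from_ranks(predator_rank, perm)[1]) >= abs(rho)
)
p_two_tailed = Fraction(as_extreme, len(orderings))
print(p_two_tailed)  # 1/60, i.e. about 0.017
```

Only the perfect positive and perfect negative orderings are this extreme, so the exact two‑tailed p is 2/120 ≈ 0.017, below 0.05.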

10. Data presentation & visualisation (Paper 5 requirements)

  • Tables – clear headings, units, consistent significant figures, and a concise caption.
  • Scatter plots (continuous data) – x‑axis = independent variable, y‑axis = dependent variable; include a best‑fit line (regression) and, where relevant, confidence bands.
  • Rank plots (ordinal data) – plot rank of \(x\) against rank of \(y\); a straight line with a negative slope visualises a strong inverse monotonic trend.
  • All figures must have labelled axes, a legend (if multiple series), and a brief caption linking the visual to the biological question.

11. Linking correlation to biodiversity assessment and conservation

  • Correlate species‑richness (S) or Shannon index (H′) with gradients such as altitude, soil pH, or disturbance level to identify key drivers of biodiversity.
  • Use Spearman’s ρ when biodiversity indices are compared with ordinal habitat‑quality scores, or when the indices are not approximately normally distributed.
  • Combine correlation results with IUCN threat categories (e.g., correlate habitat fragmentation index with the proportion of threatened species in a region) to inform conservation priorities.
  • Remember that conservation decisions require more than statistical association – they need mechanistic understanding and, where possible, experimental validation.

12. Causation vs. correlation – essential for AO2 evaluation

When a significant correlation is found, always ask:

  1. Is there a plausible biological mechanism linking the two variables?
  2. Could a third variable be causing both (confounding factor)?
  3. Does the direction of the relationship match known theory?
  4. Would a controlled experiment (manipulating the independent variable) be feasible to test causality?

Explicitly state that “correlation does not prove causation” and outline a possible experimental follow‑up.

13. Summary checklist for exam answers

  • Identify the type of data → choose Pearson or Spearman.
  • State all assumptions and note any violations (outliers, non‑linearity, non‑normality).
  • Show calculations clearly (r or ρ, t‑value or critical ρ, df, p‑value).
  • Interpret the sign, magnitude (Cambridge bands) and statistical significance.
  • Provide a biological explanation, discuss alternative hypotheses, and evaluate the investigation (sample size, reliability, sources of error).
  • Link the result to broader concepts: biodiversity indices, ecosystem function, or conservation status.

14. Quick reference box (for revision)

Pearson’s r – continuous, linear, normality required.

Spearman’s ρ – ordinal or non‑linear monotonic, no normality needed.

t‑test for r: \(t = r\sqrt{(n-2)/(1-r^{2})}\), df = n‑2.

Critical ρ – use table; for n > 30 approximate with the same t‑test.

Strength bands – 0‑0.3 (very weak), 0.3‑0.6 (weak‑moderate), 0.6‑0.9 (strong), > 0.9 (very strong).

Remember: correlation ≠ causation; always evaluate assumptions, outliers, and possible confounders.