use Spearman’s rank correlation and Pearson’s linear correlation to analyse the relationships between two variables, including how biotic and abiotic factors affect the distribution and abundance of species (the formulae for these correlations will b

Published by Patrick Mutisya · 8 days ago

Cambridge A-Level Biology 9700 – Biodiversity: Correlation Analysis

Biodiversity – Analysing Relationships with Correlation

This set of notes explains how to use Spearman’s rank correlation and Pearson’s linear correlation to investigate how biotic and abiotic factors influence the distribution and abundance of species.

1. Key Concepts

  • Biodiversity: Variety of life at genetic, species and ecosystem levels.
  • Biotic factors: Living components that affect a species (e.g., competition, predation, symbiosis).
  • Abiotic factors: Non‑living components (e.g., temperature, pH, moisture, light).
  • Distribution: Spatial pattern of a species across a habitat.
  • Abundance: Number of individuals of a species in a given area.

2. Why Use Correlation?

Correlation quantifies the strength and direction of a relationship between two quantitative variables. In ecology it helps to:

  1. Identify which environmental variables are most closely linked to species abundance.
  2. Distinguish linear from monotonic (but not necessarily linear) relationships.
  3. Provide a statistical basis for further modelling (e.g., regression).

3. Pearson’s Linear Correlation (r)

Used when both variables are measured on an interval or ratio scale and the relationship is expected to be linear.

Formula:

\$\$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}\$\$

where:

  • \$x_i\$, \$y_i\$ are the individual paired observations.
  • \$\bar{x}\$, \$\bar{y}\$ are the means of the \$x\$ and \$y\$ data sets.
  • \$n\$ is the number of paired observations.

Interpretation of \$r\$:

  • \$r = +1\$ – perfect positive linear relationship.
  • \$r = -1\$ – perfect negative linear relationship.
  • \$r = 0\$ – no linear relationship.
  • Intermediate values: the closer \$|r|\$ is to 1, the stronger the linear trend; the sign gives its direction.
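
The Pearson formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `pearson_r` and the toy values are assumptions made for this example, not part of the syllabus.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed directly from the deviation formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator terms: sums of squared deviations.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical toy values, chosen only to show the two extremes.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative: -1.0
```

Writing the numerator and the two denominator sums as separate variables mirrors the way the calculation is usually laid out in a results table.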

4. Spearman’s Rank Correlation (ρ or \$r_s\$)

Used when data are ordinal, not normally distributed, or when the relationship is monotonic but not linear.

Steps:

  1. Rank each set of observations separately (smallest = 1, largest = \$n\$); give tied values the mean of the ranks they would otherwise occupy.
  2. Calculate the difference \$d_i\$ between the paired ranks for each observation.
  3. Apply the formula:

\$r_s = 1 - \frac{6\sum_{i=1}^{n}d_i^{2}}{n(n^{2}-1)}\$

Interpretation of \$r_s\$ follows the same guidelines as Pearson’s \$r\$ but refers to monotonic trends.
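
The three steps above could be sketched as follows. The helper names `ranks` and `spearman_rs` are assumptions made for this illustration, and the sketch deliberately rejects tied values, which need mean ranks. The toy data are hypothetical and chosen to be monotonic but clearly not linear.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need mean ranks; not handled in this sketch")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman's r_s from the sum of squared rank differences."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical values: y = x cubed rises with x but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rs(x, y))                 # 1.0 (ranks agree exactly)
print(spearman_rs(x, [5, 4, 3, 2, 1]))   # -1.0 (ranks exactly reversed)
```

Because only the ranks enter the formula, any relationship that always increases gives \$r_s = 1\$, whether or not the points lie on a straight line.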

5. Example Data Set

Suppose we survey a freshwater pond and record the abundance of a particular macroinvertebrate species (individuals per m²) against water temperature (°C) at ten sites.

Site   Temperature (°C)   Abundance (ind/m²)
1      12                 5
2      14                 8
3      15                 12
4      16                 15
5      18                 20
6      19                 22
7      20                 25
8      21                 27
9      22                 30
10     23                 33

6. Calculating Pearson’s r for the Example

Using the data above:

  1. Compute \$\bar{x}\$ (mean temperature) and \$\bar{y}\$ (mean abundance).
  2. Calculate each \$(x_i-\bar{x})(y_i-\bar{y})\$, \$(x_i-\bar{x})^2\$, and \$(y_i-\bar{y})^2\$.
  3. Insert the sums into the Pearson formula.

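For these data \$\bar{x} = 18\$ and \$\bar{y} = 19.7\$, which gives \$\sum(x_i-\bar{x})(y_i-\bar{y}) = 310\$, \$\sum(x_i-\bar{x})^2 = 120\$ and \$\sum(y_i-\bar{y})^2 = 804.1\$. Hence \$r = 310/\sqrt{120 \times 804.1} \approx 0.998\$, indicating a very strong positive linear relationship between temperature and abundance.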
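
As a check on the hand calculation, the three steps can be reproduced with a short script. This is only a sketch using the ten sites from the example; the variable names are assumptions made for illustration.

```python
import math

# Data from the pond example (sites 1 to 10).
temperature = [12, 14, 15, 16, 18, 19, 20, 21, 22, 23]  # °C
abundance = [5, 8, 12, 15, 20, 22, 25, 27, 30, 33]      # ind/m²

n = len(temperature)

# Step 1: means of each variable.
x_bar = sum(temperature) / n
y_bar = sum(abundance) / n

# Step 2: sums of products and of squared deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temperature, abundance))
s_xx = sum((x - x_bar) ** 2 for x in temperature)
s_yy = sum((y - y_bar) ** 2 for y in abundance)

# Step 3: insert the sums into the Pearson formula.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"means: {x_bar:.1f} °C, {y_bar:.1f} ind/m²")
print(f"sums:  {s_xy:.1f}, {s_xx:.1f}, {s_yy:.1f}")
print(f"r = {r:.3f}")  # r = 0.998
```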

7. Calculating Spearman’s \$r_s\$ for the Same Data

Because abundance increases at every site as temperature increases, each site receives the same rank for temperature and for abundance (1–10). Thus \$d_i = 0\$ for all \$i\$ and:

\$r_s = 1 - \frac{6\cdot 0}{10(10^{2}-1)} = 1\$

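This confirms a perfect monotonic increase: abundance rises with every increase in temperature, even though the points do not lie exactly on a straight line (\$r < 1\$).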
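
The ranking argument can be verified with a short sketch. The `ranks` helper is an assumption made for this illustration and is valid here only because neither variable contains tied values.

```python
# Data from the pond example (sites 1 to 10).
temperature = [12, 14, 15, 16, 18, 19, 20, 21, 22, 23]  # °C
abundance = [5, 8, 12, 15, 20, 22, 25, 27, 30, 33]      # ind/m²

def ranks(values):
    """Rank from 1 (smallest) to n (largest); valid only without ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

rank_t = ranks(temperature)
rank_a = ranks(abundance)

# Difference between the paired ranks at each site.
d = [t - a for t, a in zip(rank_t, rank_a)]

n = len(d)
r_s = 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))

print(rank_t)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(d)       # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(r_s)     # 1.0
```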

8. Interpreting Results in an Ecological Context

  • Positive correlation suggests that higher temperatures favour the species, perhaps by increasing metabolic rates or food availability.
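  • If a negative correlation were observed with, for example, dissolved oxygen, it would indicate that abundance tends to be higher where oxygen levels are lower, perhaps because the species tolerates low‑oxygen water that excludes its competitors or predators.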
  • Strong correlations do not prove causation; experimental or mechanistic studies are required to confirm the underlying processes.

9. Applying Correlation to Biotic vs. Abiotic Factors

When analysing field data, separate the variables into:

  1. Abiotic variables – temperature, pH, light intensity, moisture, nutrient concentration.
  2. Biotic variables – predator density, competitor abundance, symbiotic partner presence.

Calculate a correlation coefficient for each variable against species abundance, using the same coefficient throughout so that the values are comparable. Compare the magnitude of \$|r|\$ or \$|r_s|\$ to identify which factor has the strongest association.
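
A comparison of this kind might be sketched as below. The temperature and abundance values come from the pond example; the pH and predator density values are HYPOTHETICAL, invented purely to illustrate the ranking step, and `pearson_r` is an assumed helper name.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the deviation formula."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

abundance = [5, 8, 12, 15, 20, 22, 25, 27, 30, 33]  # from the pond example

factors = {
    # From the pond example.
    "temperature (abiotic)": [12, 14, 15, 16, 18, 19, 20, 21, 22, 23],
    # HYPOTHETICAL values, for illustration only.
    "pH (abiotic)": [7.1, 6.8, 7.3, 7.0, 6.9, 7.2, 7.0, 7.1, 6.9, 7.2],
    # HYPOTHETICAL values, for illustration only.
    "predator density (biotic)": [9, 8, 8, 6, 7, 5, 4, 4, 3, 2],
}

# One coefficient per factor, each against the same abundance data.
results = {name: pearson_r(values, abundance) for name, values in factors.items()}

# Order by strength of association, ignoring the sign.
ranked = sorted(results.items(), key=lambda item: abs(item[1]), reverse=True)

for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

Sorting on the absolute value keeps a strong negative association (here the invented predator data) from being ranked below a weak positive one.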

10. Limitations and Precautions

  • Both coefficients assume independent observations; spatial autocorrelation can inflate significance.
  • Pearson’s \$r\$ assumes a linear relationship, and significance tests on it assume approximately normally distributed variables; otherwise, transform the data or use Spearman’s \$r_s\$.
  • Outliers can heavily influence Pearson’s \$r\$ but have less effect on Spearman’s \$r_s\$.
  • Correlation does not imply causation; consider confounding variables.
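
The point about outliers can be illustrated with a small sketch. It reuses the pond data but replaces the abundance at site 10 with a HYPOTHETICAL extreme count of 150 ind/m²; the helper names are assumptions, and the ranking helper assumes no tied values.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the deviation formula."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_rs(x, y):
    """Spearman's r_s from rank differences; assumes no tied values."""
    def ranks(values):
        order = sorted(values)
        return [order.index(v) + 1 for v in values]
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

temperature = [12, 14, 15, 16, 18, 19, 20, 21, 22, 23]
abundance = [5, 8, 12, 15, 20, 22, 25, 27, 30, 33]

# HYPOTHETICAL outlier: site 10 recorded as 150 instead of 33.
with_outlier = abundance[:-1] + [150]

r_clean = pearson_r(temperature, abundance)
r_outlier = pearson_r(temperature, with_outlier)
rs_outlier = spearman_rs(temperature, with_outlier)

print(f"Pearson, original data: {r_clean:.2f}")
print(f"Pearson, with outlier:  {r_outlier:.2f}")
print(f"Spearman, with outlier: {rs_outlier:.2f}")
```

The extreme value is still the largest observation, so its rank is unchanged and \$r_s\$ stays at 1, while Pearson’s \$r\$ falls noticeably.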

11. Suggested Diagram

Scatter plot of temperature (x‑axis) against species abundance (y‑axis) with a fitted linear regression line, illustrating the strong positive Pearson correlation.

12. Summary Checklist for Students

  1. Identify the two variables to be compared.
  2. Decide whether the relationship is expected to be linear (use Pearson) or monotonic but not necessarily linear, or the data are ordinal (use Spearman).
  3. Rank the data if using Spearman; otherwise calculate means and deviations for Pearson.
  4. Apply the appropriate formula and compute the coefficient.
  5. Interpret the sign and magnitude of the coefficient in ecological terms.
  6. Consider the role of biotic and abiotic factors and discuss possible mechanisms.