Statistical Analysis of Biological Variation (Optional Supplement)
Learning Objectives (with Assessment Objectives)
- AO1 – Knowledge: understand the biological reasons for measuring variation and the statistical concepts behind the t‑test.
- AO2 – Data handling: select, organise, analyse and evaluate data using the independent‑samples t‑test (or Welch’s t‑test) and report results correctly.
- AO3 – Experimentation: plan and carry out a simple experiment, check assumptions, and evaluate the reliability and limitations of the statistical conclusions.
Why Study Variation in Biology?
- Variation is the raw material for natural selection (Topics 17 & 18).
- Differences in genotype, phenotype, disease susceptibility or treatment response are quantified using statistical tools.
- Statistical tests help us judge whether an observed difference is larger than would be expected from random sampling variation alone.
Link to the Cambridge 9700 Syllabus
| Syllabus Area | Requirement | How the Supplement Meets It |
|---|---|---|
| Mathematical requirements (Statistical tools) | Use appropriate statistical tests; interpret p‑values; recognise limitations. | Independent‑samples t‑test, Welch’s t‑test, effect‑size (Cohen’s d), checklist of assumptions, non‑parametric alternative (Mann‑Whitney U), brief overview of chi‑square, ANOVA and regression. |
| Topic 10 – Infectious diseases | Explain how variation in pathogen load influences disease outcome. | Example: compare mean bacterial counts in two treatment groups using a t‑test. |
| Topic 12 – Energy & respiration | Analyse experimental data on enzyme activity or metabolic rate. | Worked example on enzyme activity (see below) links directly to this topic. |
| Topics 17 & 18 – Selection & evolution | Use statistical evidence to support arguments about evolutionary change. | Classroom activity on seed‑germination time demonstrates how a t‑test can test hypotheses about adaptive traits. |
| Data handling (AO2) | Select, organise, present, evaluate and interpret data. | Step‑by‑step procedure, data‑visualisation templates (box‑plot, dot‑plot), and a summary table of what students should know, do and evaluate. |
Statistical Foundations (AO1)
p‑value: the probability of obtaining a test statistic at least as extreme as the observed value, assuming the null hypothesis is true. A small p‑value (≤ α, usually 0.05) leads to rejection of H₀.
Type I & Type II errors:
- Type I – false positive (rejecting a true H₀). Risk = α.
- Type II – false negative (failing to reject a false H₀). Risk = β; power = 1 – β.
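The meaning of "Risk = α" can be checked by simulation. The sketch below is illustrative only: the population mean (12.0), standard deviation (0.3) and group size (6) are invented values, and the critical t of 2.228 is the tabulated two‑tailed value for α = 0.05 and df = 10. Both groups are drawn from the same population, so every rejection is a false positive.

```python
import random
import statistics

def pooled_t(a, b):
    """Student's pooled t statistic for two independent samples."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * statistics.variance(a)
           + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

random.seed(1)     # fixed seed so the run is repeatable
T_CRIT = 2.228     # two-tailed critical t for alpha = 0.05, df = 10
TRIALS = 10000

false_positives = 0
for _ in range(TRIALS):
    # Both groups come from the SAME population, so H0 is true by construction.
    a = [random.gauss(12.0, 0.3) for _ in range(6)]
    b = [random.gauss(12.0, 0.3) for _ in range(6)]
    if abs(pooled_t(a, b)) > T_CRIT:
        false_positives += 1

type_i_rate = false_positives / TRIALS
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

The observed rate should lie close to 0.05, which is what choosing α = 0.05 means in practice.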
Confidence interval (CI): a range of values that is likely to contain the true population mean difference with a given confidence (usually 95 %). Reporting a CI alongside a p‑value gives information about the magnitude and precision of the effect.
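Effect size (Cohen’s d): expresses the difference between two means in units of the pooled standard deviation, which helps in judging its biological importance.
\[
d = \frac{\bar{x}_1-\bar{x}_2}{s_{\text{pooled}}}
\]
where \(s_{\text{pooled}}=\sqrt{s_p^{2}}\) and \(s_p^{2}\) is the pooled variance. Interpretation: small (≈0.2), medium (≈0.5), large (≥0.8).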
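As a minimal sketch of this calculation, the function below computes Cohen's d from two lists of measurements. The function name `cohens_d` and the leaf‑length data are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import statistics

def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_var ** 0.5

# Hypothetical leaf lengths (cm), invented purely for illustration.
shade = [5.0, 6.0, 7.0, 8.0, 9.0]
sun = [4.0, 5.0, 6.0, 7.0, 8.0]

d = cohens_d(shade, sun)
print(f"d = {d:.2f}")   # the means differ by 1 cm, the pooled SD is about 1.58 cm
```

Here d is about 0.63, which falls between the "medium" and "large" benchmarks quoted above.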
Multiple‑testing correction: when several hypotheses are tested on the same data set, the chance of at least one Type I error rises. A simple remedy is the Bonferroni correction: divide α by the number of tests and compare each p‑value with this stricter threshold.
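A short sketch of the Bonferroni idea follows. The four p‑values are hypothetical and the helper name `bonferroni_alpha` is introduced here for illustration.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Hypothetical p-values from four separate comparisons on one data set.
p_values = [0.003, 0.020, 0.040, 0.300]

adjusted_alpha = bonferroni_alpha(0.05, len(p_values))
significant = [p <= adjusted_alpha for p in p_values]

print(f"Adjusted alpha = {adjusted_alpha}")
print(significant)
```

Three of these p‑values are below 0.05, but only the first is below the corrected threshold of 0.0125.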
Non‑parametric alternative: if normality or homogeneity of variance cannot be satisfied, the Mann‑Whitney U test (also called Wilcoxon rank‑sum) may be used.
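To show what the Mann‑Whitney U statistic measures, the sketch below counts, for every pair of observations (one from each group), how often the value from the first group is larger. The data are hypothetical germination times and the function name is an assumption; in practice the smaller U is compared with a tabulated critical value or converted to a p‑value by software.

```python
def mann_whitney_u(group1, group2):
    """Return (U1, U2). U1 counts pairs where the group1 value exceeds the
    group2 value; tied pairs count as half."""
    u1 = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u1 += 1.0
            elif x == y:
                u1 += 0.5
    u2 = len(group1) * len(group2) - u1
    return u1, u2

# Hypothetical germination times (hours), invented for illustration.
genotype_a = [30, 34, 36, 41, 45]
genotype_b = [38, 44, 47, 50, 52]

u1, u2 = mann_whitney_u(genotype_a, genotype_b)
print(f"U1 = {u1}, U2 = {u2}, test statistic U = {min(u1, u2)}")
```

Because the method uses only the ordering of the values, it does not require the data to be normally distributed.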
Data visualisation: box‑plots or dot‑plots give a quick visual check of symmetry (a rough guide to normality), spread and outliers before running a t‑test.
When to Use the Independent‑Samples t‑Test (AO2)
- Two independent groups (e.g., two genotypes, two treatments, two populations).
- The response variable is continuous and approximately normally distributed in each group.
- Variances are equal (homoscedastic). If not, use Welch’s t‑test.
- Sample sizes are at least moderate (≥ 5 per group); the test tolerates mild departures from normality, especially when the two groups are of similar size.
Statistical Checklist (Before Running the Test) – AO2
- Independence: Ensure observations in one group do not influence the other.
- Normality: Examine histograms, Q‑Q plots, or perform Shapiro‑Wilk/Kolmogorov‑Smirnov tests.
- Equality of variances: Conduct Levene’s test or an F‑test.
- Choose the test:
  - Equal variances → Student’s (pooled) t‑test.
  - Unequal variances → Welch’s t‑test.
  - Severe non‑normality → Mann‑Whitney U.
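The final choice in the checklist can be expressed as a small decision function. This is a simplification for illustration: the two boolean inputs stand for the outcomes of the normality and equal‑variance checks, and the function name `choose_test` is hypothetical.

```python
def choose_test(approximately_normal: bool, equal_variances: bool) -> str:
    """Pick a two-sample test from the outcome of the assumption checks."""
    if not approximately_normal:
        # Severe non-normality: fall back to the rank-based test.
        return "Mann-Whitney U"
    if equal_variances:
        return "Student's (pooled) t-test"
    return "Welch's t-test"

print(choose_test(approximately_normal=True, equal_variances=True))
print(choose_test(approximately_normal=True, equal_variances=False))
print(choose_test(approximately_normal=False, equal_variances=True))
```

Normality is checked first because, when it fails badly, the variance check no longer decides between the two t‑tests.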
Statistical Toolbox Overview (AO1)
- Student’s (pooled) independent‑samples t‑test – equal variances.
- Welch’s t‑test – unequal variances.
- Chi‑square test – association between categorical variables.
- One‑way ANOVA – compare >2 group means.
- Simple linear regression – relationship between two continuous variables.
- Mann‑Whitney U test – non‑parametric alternative for two independent samples.
Formulae (AO1)
Student’s (pooled) independent‑samples t‑test
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^{2}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}
\]
\[
s_p^{2}= \frac{(n_1-1)s_1^{2}+(n_2-1)s_2^{2}}{n_1+n_2-2}
\]
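A minimal sketch of the pooled calculation from summary statistics is given below. The function name `student_t` and the input values (means 20.0 and 17.0, variances 4.0 and 6.0, eight observations per group) are hypothetical.

```python
import math

def student_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df, pooled variance) for Student's pooled t-test."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, sp2

# Hypothetical summary statistics, invented for illustration.
t, df, sp2 = student_t(20.0, 4.0, 8, 17.0, 6.0, 8)
print(f"pooled variance = {sp2:.2f}, t({df}) = {t:.2f}")
```

With equal group sizes the pooled variance is simply the average of the two sample variances, here 5.0.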
Welch’s t‑test (unequal variances)
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^{2}}{n_1}+\frac{s_2^{2}}{n_2}}}
\]
\[
df = \frac{\left(\frac{s_1^{2}}{n_1}+\frac{s_2^{2}}{n_2}\right)^{2}}
{\frac{(s_1^{2}/n_1)^{2}}{n_1-1}+\frac{(s_2^{2}/n_2)^{2}}{n_2-1}}
\]
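The Welch formulae can be sketched in the same way. The function name `welch_t` and the summary statistics are hypothetical; they were chosen so that the variances and group sizes are clearly unequal.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) for Welch's t-test, with the approximate df given above."""
    a = var1 / n1   # squared standard error contributed by group 1
    b = var2 / n2   # squared standard error contributed by group 2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics with unequal variances and group sizes.
t, df = welch_t(20.0, 2.0, 10, 17.0, 12.0, 6)
print(f"t = {t:.2f}, df = {df:.2f}")
```

The resulting df (about 6) is not a whole number and is well below n₁ + n₂ − 2 = 14, which makes the test more conservative when one group is much more variable than the other.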
Effect size (Cohen’s d)
\[
d = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{s_p^{2}}}
\]
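Confidence interval for the mean difference (pooled case)
As a sketch under the equal‑variance assumption, the 95 % CI mentioned under Statistical Foundations can be written with the same symbols as the pooled t formula. Here \(t_{\text{crit}}\) is notation introduced for the two‑tailed critical value at the chosen α with \(df = n_1+n_2-2\).
\[
(\bar{x}_1-\bar{x}_2)\;\pm\; t_{\text{crit}}\sqrt{s_p^{2}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}
\]
If this interval excludes 0, the two‑tailed test at the same α rejects H₀.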
Step‑by‑Step Procedure (Student’s t‑test) – AO2 & AO3
1. State hypotheses:
   \(H_{0}:\mu_{1}=\mu_{2}\) (no difference)
   \(H_{A}:\mu_{1}\neq\mu_{2}\) (two‑tailed) or \(\mu_{1}>\mu_{2}\) / \(\mu_{1}<\mu_{2}\) (one‑tailed).
2. Collect data and enter them into a spreadsheet.
3. Calculate descriptive statistics: means \(\bar{x}_1,\bar{x}_2\) and variances \(s_1^{2},s_2^{2}\).
4. Check assumptions using the checklist. If the variances differ, carry out steps 5 to 9 with Welch’s formulae instead.
5. Compute the pooled variance \(s_p^{2}\) (skip this step for Welch’s test).
6. Calculate the t‑value using the appropriate formula.
7. Determine degrees of freedom:
   Student’s: \(df = n_1+n_2-2\)
   Welch’s: use the approximation formula above.
8. Obtain the critical t (or p‑value) from a t‑distribution table or software for the chosen \(\alpha\) (usually 0.05) and the appropriate tail(s).
9. Decision:
   • If \(|t| > t_{\text{crit}}\) (or \(p \le \alpha\)) → reject \(H_{0}\).
   • Otherwise, fail to reject \(H_{0}\).
10. Report the result in the standard format, e.g. for two groups of six:
    t(10) = -3.42, p = 0.007, d = -1.97.
11. Interpretation (AO3): translate the statistical conclusion into a biological statement, discuss effect size, confidence interval and any limitations.
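The step that obtains a p‑value normally relies on a table or software. As a rough sketch of what software does, the code below integrates the t‑distribution density numerically with Simpson's rule. The function names and the number of integration steps are assumptions made for illustration; a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area under the density between -|t| and |t|.
    The area from 0 to |t| is found with Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

# The tabulated two-tailed critical value for alpha = 0.05, df = 10 is 2.228,
# so the p-value at that t should come out very close to 0.05.
print(f"p at t = 2.228, df = 10: {two_tailed_p(2.228, 10):.4f}")
print(f"p at t = -3.42, df = 10: {two_tailed_p(-3.42, 10):.4f}")
```

Checking the function against a tabulated critical value, as in the first call, is a useful habit before trusting any hand‑written statistical routine.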
Worked Biological Example (AO1–AO3)
Biological context (Topic 12 – Enzymes): A mutant allele is suspected to increase the activity of a key metabolic enzyme.
| Wild‑type activity / U mg⁻¹ (n = 6) | Mutant activity / U mg⁻¹ (n = 6) |
|---|---|
| 12.1 | 15.4 |
| 11.8 | 14.9 |
| 12.5 | 15.1 |
| 12.0 | 14.7 |
| 11.9 | 15.3 |
| 12.3 | 15.0 |
- \(n_1=n_2=6\)
- \(\bar{x}_1 = 12.10\) U mg⁻¹, \(\bar{x}_2 = 15.07\) U mg⁻¹
- \(s_1^{2}=0.0680\), \(s_2^{2}=0.0667\)
- Levene’s test: p = 0.62 → equal variances → use Student’s t‑test.
- Pooled variance: \(s_p^{2}= \frac{5(0.0680)+5(0.0667)}{10}\approx 0.0673\)
- \[
t = \frac{12.10-15.07}{\sqrt{0.0673\left(\frac{1}{6}+\frac{1}{6}\right)}}=
\frac{-2.97}{\sqrt{0.0673\times0.333}}=
\frac{-2.97}{0.150}\approx -19.8
\]
- Degrees of freedom: \(df = 6+6-2 = 10\)
- Critical t (two‑tailed, α = 0.05, df = 10) = 2.228.
- \(|t| = 19.8 > 2.228\) → reject \(H_{0}\).
- Effect size: \(d = \dfrac{-2.97}{\sqrt{0.0673}} \approx -11.4\) (very large).
- 95 % CI for the mean difference: \(-2.97 \pm 2.228\times0.150\), i.e. ≈ -3.3 to -2.6 U mg⁻¹.
Result reporting: t(10) = -19.8, p < 0.001, d = -11.4, 95 % CI = [-3.3, -2.6] U mg⁻¹
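The hand calculation can be cross‑checked from the raw table values. The sketch below uses only Python's standard library and takes the critical t of 2.228 from the text. Because it works from unrounded means and variances, its output may differ slightly from figures obtained with rounded intermediate values. Variable names are illustrative.

```python
import statistics

# Enzyme activities (U mg^-1) copied from the data table.
wild_type = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
mutant = [15.4, 14.9, 15.1, 14.7, 15.3, 15.0]
T_CRIT = 2.228   # two-tailed, alpha = 0.05, df = 10 (value quoted in the text)

n1, n2 = len(wild_type), len(mutant)
mean1, mean2 = statistics.mean(wild_type), statistics.mean(mutant)
var1, var2 = statistics.variance(wild_type), statistics.variance(mutant)

df = n1 + n2 - 2
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df   # pooled variance
se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5            # standard error of the difference
diff = mean1 - mean2
t = diff / se
d = diff / sp2 ** 0.5                            # Cohen's d
ci_low, ci_high = diff - T_CRIT * se, diff + T_CRIT * se

print(f"means: {mean1:.2f}, {mean2:.2f}; variances: {var1:.4f}, {var2:.4f}")
print(f"t({df}) = {t:.1f}, d = {d:.1f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
print("reject H0" if abs(t) > T_CRIT else "fail to reject H0")
```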
Interpretation & Evaluation (AO3)
- Biological meaning: The mutant allele produces a markedly higher enzyme activity; the effect is both statistically significant and biologically large.
- Assumption check: Shapiro‑Wilk p > 0.2 for both groups (normality); Levene’s test p = 0.62 (equal variances). Sample size meets the minimum requirement for a t‑test.
- Limitations:
  - Only six replicates per group – limited power to detect smaller effects.
  - Experiment performed under a single set of growth conditions; results may not generalise to other environments.
  - Potential hidden confounders (e.g., slight differences in protein extraction efficiency).
- Further work: increase n, test additional mutant alleles, perform a dose‑response assay, or use ANOVA to compare more than two genotypes.
Suggested Classroom Activity (AO3)
Objective: Test whether two seed genotypes differ in mean germination time.
- Formulate a hypothesis (e.g., “Genotype A germinates faster than Genotype B”).
- Plant at least 8 seeds of each genotype under identical conditions.
- Record the time (hours) each seed takes to germinate.
- Enter data into a spreadsheet; produce a box‑plot to visualise spread and check normality.
- Run the appropriate t‑test (Student’s or Welch’s) or Mann‑Whitney U if assumptions fail.
- Report the result in the format
t(df) = value, p = …, d = … and discuss the biological implication.
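Before drawing the box‑plot by hand or in a spreadsheet, students can compute the five values it displays. The sketch below is one way to do this. The germination times are hypothetical and the function name is an assumption. Note that `statistics.quantiles` uses one of several quartile conventions, so a spreadsheet may give slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a box-plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# Hypothetical germination times (hours) for eight seeds per genotype.
genotype_a = [40, 42, 44, 45, 47, 48, 50, 52]
genotype_b = [46, 49, 51, 53, 54, 56, 58, 61]

summary_a = five_number_summary(genotype_a)
summary_b = five_number_summary(genotype_b)
print("Genotype A:", summary_a)
print("Genotype B:", summary_b)
```

A median that sits roughly midway between the quartiles, with whiskers of similar length, is consistent with an approximately symmetric distribution.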
Key Take‑aways (AO1 & AO2)
- The independent‑samples t‑test (or Welch’s version) is the standard method for comparing two group means in biological research.
- Checking assumptions (independence, normality, equal variances) is essential before interpreting the test.
- Report the test statistic, degrees of freedom, p‑value, effect size and, where possible, a confidence interval.
- Always link the statistical conclusion back to the underlying biological question and evaluate the reliability of the result.
Summary Table – What Students Should Be Able to Do
| AO | What to Know (AO1) | What to Do (AO2) | What to Evaluate (AO3) |
|---|---|---|---|
| AO1 | Concept of variation, normal distribution, p‑value, Type I/II errors, effect size, when to use parametric vs non‑parametric tests. | – | – |
| AO2 | – | Select the correct test, calculate means, variances, t‑value, df, p‑value and effect size; produce a box‑plot. | – |
| AO3 | – | – | Assess whether assumptions are met, discuss biological relevance of the effect size, comment on sample size, possible biases and suggestions for further investigation. |
Suggested Flowchart (to be drawn on board or slide)
Hypothesis → Data collection → Visual check (box‑plot) → Test assumptions (normality, equal variance) → Choose Student’s, Welch’s or Mann‑Whitney U → Calculate test statistic & df → Obtain p‑value (or CI) → Decision (reject/fail to reject H₀) → Biological interpretation & evaluation.