Use the chi-squared test to test the significance of differences between observed and expected results (the formula for the chi-squared test will be provided, as shown in the Mathematical requirements).

Using the χ² Test to Evaluate Phenotypic Ratios and Detect Selection

Link to the Cambridge International AS & A Level Biology (9700) syllabus

Relevant syllabus points

Topic 16 Inheritance – predict genotype‑phenotype ratios from Mendelian segregation.

Topic 17 Selection & Evolution – test hypotheses about genetic change in populations.

AO2 – “handle, apply and evaluate information” – use the χ² test to compare observed and expected ratios, interpret p‑values and discuss biological implications.

Why a statistical test is required

  • Biological data are subject to random variation (segregation of alleles, sampling error, environmental noise).
  • Observed counts rarely match theoretical ratios exactly.
  • The χ² (chi‑squared) test estimates the probability of obtaining a discrepancy at least as large as the one observed if chance alone were acting, so we can decide whether to reject a genetic hypothesis or fail to reject it.

Mathematical requirement

The χ² statistic is calculated with

$\chi^{2} = \sum_{i=1}^{k}\frac{(O_{i}-E_{i})^{2}}{E_{i}}$

  • Oᵢ – observed number in class i
  • Eᵢ – expected number in class i (based on the theoretical proportion)
  • k – total number of phenotypic (or genotypic) classes
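The formula translates directly into a few lines of Python. The sketch below is illustrative only: the function name `chi_squared` and the example counts are assumptions made for this example, not part of the syllabus material.

```python
def chi_squared(observed, expected):
    """Sum (O - E)^2 / E over all k classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for two classes (illustration only)
print(chi_squared([18, 22], [20, 20]))  # 0.4
```

Each class contributes its own (O − E)² / E term, so a single class with a large deviation can dominate the total.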

Assumptions & limits of the χ² test

  • Observations are independent.
  • Expected frequency in each class should be ≥ 5. If any E < 5, combine adjacent classes or use an exact test (e.g., an exact binomial test for two classes, or Fisher’s exact test for a 2 × 2 contingency table).
  • The test is an approximation that improves with larger sample sizes (the larger N, the more reliable the χ² distribution).

Checking the assumptions – a quick checklist

  1. Calculate all expected numbers Eᵢ. Do any fall below 5? If yes, combine classes or choose an alternative test.
  2. Confirm that each observation belongs to only one class (mutual exclusivity) and that the total sample size N is the sum of the observed counts.
  3. Ensure that the data were collected independently (e.g., each seed counted once, each individual sampled only once).
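The numerical parts of this checklist can be automated. The following is a minimal sketch, assuming a hypothetical helper called `check_assumptions` and made-up counts; independence of observations cannot be checked from the counts and still has to come from the experimental design.

```python
from fractions import Fraction


def check_assumptions(observed, proportions, minimum_expected=5):
    """Return a list of problems; an empty list means the checks passed.

    Independence of observations cannot be tested from the counts alone,
    so it is not covered here.
    """
    problems = []
    if len(observed) != len(proportions):
        problems.append("each class needs one count and one proportion")
        return problems
    if sum(proportions) != 1:
        problems.append("theoretical proportions do not sum to 1")
    n = sum(observed)  # total sample size N
    for index, proportion in enumerate(proportions, start=1):
        expected = n * proportion
        if expected < minimum_expected:
            problems.append(
                f"class {index}: expected {float(expected):.2f} "
                f"is below {minimum_expected}"
            )
    return problems


# Hypothetical small sample (N = 20) tested against 9:3:3:1
ratio = [Fraction(9, 16), Fraction(3, 16), Fraction(3, 16), Fraction(1, 16)]
for problem in check_assumptions([14, 3, 2, 1], ratio):
    print(problem)
```

With only 20 individuals, three of the four expected numbers fall below 5, so classes would need to be combined or a larger sample collected.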

Step‑by‑step procedure (goodness‑of‑fit)

  1. State the null hypothesis (H0): the observed distribution fits the expected genetic ratio.
  2. Calculate expected numbers (Eᵢ):

    • Determine the total sample size, N.
    • Multiply N by the theoretical proportion for each class (use exact fractions; do not round until the final χ² value).

  3. Compute the χ² value using the formula above.
  4. Determine the degrees of freedom (df):

    $df = k - 1 - c$

    • k = number of classes.
    • c = number of parameters estimated from the data (e.g., c = 1 if an allele frequency has to be estimated from the sample before the expected numbers can be calculated). For a simple Mendelian test, where the ratio comes from theory, c = 0, so df = k − 1.

  5. Find the critical χ² value (or p‑value) from a χ² distribution table or calculator at the chosen significance level (α = 0.05 is standard).
  6. Make a decision:

    • If χ² ≤ χ²critical (or p > 0.05), fail to reject H0 – the data are consistent with the expected ratio.
    • If χ² > χ²critical (or p ≤ 0.05), reject H0 – the observed distribution differs significantly from the expectation.

  7. Interpret biologically (see examples below).
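Steps 2 to 6 can be sketched as a single function. This is an illustration under stated assumptions: `goodness_of_fit` and `CRITICAL_0_05` are hypothetical names, the critical values are the standard table entries for α = 0.05 (df 1 to 4 only), and the counts are invented.

```python
from fractions import Fraction

# Standard critical values of chi-squared at alpha = 0.05, keyed by df
CRITICAL_0_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def goodness_of_fit(observed, proportions, estimated_parameters=0):
    """Follow steps 2 to 6 for one set of counts and one theoretical ratio."""
    n = sum(observed)                               # total sample size N
    expected = [n * p for p in proportions]         # step 2, kept exact
    chi2 = float(sum((o - e) ** 2 / e
                     for o, e in zip(observed, expected)))  # step 3
    df = len(observed) - 1 - estimated_parameters   # step 4
    critical = CRITICAL_0_05[df]                    # step 5
    reject = chi2 > critical                        # step 6
    return {"chi2": round(chi2, 2), "df": df,
            "critical": critical, "reject_H0": reject}


# Hypothetical counts tested against a 1:2:1 ratio
ratio = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
result = goodness_of_fit([28, 45, 27], ratio)
print(result)
```

Passing the ratio as exact fractions keeps the expected numbers unrounded until the final χ² value, as step 2 recommends. Step 7, the biological interpretation, is left to the reader.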

Understanding the p‑value

  • The p‑value is the probability of obtaining a χ² value as large as (or larger than) the one calculated, assuming H0 is true.
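  • When you read a χ² table, locate the row for your df and find where your calculated χ² falls among the values in that row; the column headings on either side give the range that contains the p‑value (for example, a χ² lying between the 0.05 and 0.01 columns means 0.01 < p < 0.05).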
  • Choosing α = 0.05 means you accept a 5 % chance of a Type I error (incorrectly rejecting a true hypothesis).
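A table only brackets the p‑value, whereas software can return it directly. The sketch below uses the closed-form upper-tail probability of the χ² distribution for whole-number df, built from the standard library only. The function name `chi2_p_value` is an assumption for this example; in an exam the table is used instead.

```python
import math


def chi2_p_value(chi2, df):
    """Upper-tail probability P(X >= chi2) for a whole-number df."""
    if df < 1 or chi2 < 0:
        raise ValueError("df must be >= 1 and chi2 must be >= 0")
    half = chi2 / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^j / j! for j = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus exp(-x/2) * sum of (x/2)^(j - 1/2) / Gamma(j + 1/2)
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# The tabulated critical values at alpha = 0.05 should each give p close to 0.05
for df, critical in [(1, 3.841), (2, 5.991), (3, 7.815)]:
    print(df, round(chi2_p_value(critical, df), 3))
```

Checking the function against known critical values, as in the loop above, is a quick way to confirm that it agrees with the printed table.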

Worked example 1 – Monohybrid cross (goodness‑of‑fit)

Cross: Rr × Rr (round seeds = R, wrinkled seeds = r).

Expected F₂ ratio: 3 round : 1 wrinkled (3/4 : 1/4).

| Phenotype | Observed (O) | Theoretical proportion | Expected (E) | (O‑E)² / E |
|---|---|---|---|---|
| Round | 215 | 3/4 | 300 × 3/4 = 225 | (215‑225)² / 225 = 0.44 |
| Wrinkled | 85 | 1/4 | 300 × 1/4 = 75 | (85‑75)² / 75 = 1.33 |
| χ² | | | | 1.78 |

Calculations

  1. Total seeds counted, N = 215 + 85 = 300.
  2. Expected numbers (exact, not rounded):

    • Round = 300 × 3/4 = 225
    • Wrinkled = 300 × 1/4 = 75

  3. χ² = 0.444 + 1.333 = 1.78 (rounded to 2 d.p. only at the end).
  4. Number of classes, k = 2 → df = k − 1 = 1.
  5. Critical χ² for df = 1 at α = 0.05 is 3.84; the calculated χ² of 1.78 corresponds to p ≈ 0.18.
  6. Since 1.78 < 3.84 (p > 0.05), we fail to reject H₀. The data are consistent with the 3 : 1 Mendelian expectation.
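The calculation above can be checked with exact fractions. This is a sketch rather than a required method; the variable names are illustrative, and the p‑value line uses the closed form that applies to df = 1 only.

```python
from fractions import Fraction
import math

observed = {"round": 215, "wrinkled": 85}
ratio = {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)}  # 3 : 1

n = sum(observed.values())                                   # N = 300
expected = {name: n * ratio[name] for name in observed}      # 225 and 75
chi2 = sum((observed[name] - expected[name]) ** 2 / expected[name]
           for name in observed)                             # exact: 16/9

# For df = 1 only, the upper-tail probability is erfc(sqrt(chi2 / 2))
p = math.erfc(math.sqrt(float(chi2) / 2))

print(f"chi-squared = {float(chi2):.2f}, p = {p:.2f}")  # chi-squared = 1.78, p = 0.18
```

Working in fractions shows why the rounded terms 0.44 and 1.33 appear to sum to 1.77: the exact total is 16/9, which rounds to 1.78.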

Biological interpretation

  • Non‑significant result (p > 0.05) – consistent with a single‑gene, two‑allele model.
  • If the result had been significant, possible explanations would include incomplete dominance, reduced viability of one genotype, environmental influence on the phenotype, or experimental error.

Worked example 2 – Dihybrid cross (goodness‑of‑fit)

Cross: RrYy × RrYy (seed shape = R/r, colour = Y/y).

Expected F₂ phenotypic ratio: 9 round‑yellow : 3 round‑green : 3 wrinkled‑yellow : 1 wrinkled‑green (9/16 : 3/16 : 3/16 : 1/16).

| Phenotype (shape‑colour) | Observed (O) | Theoretical proportion | Expected (E) | (O‑E)² / E |
|---|---|---|---|---|
| Round‑Yellow | 210 | 9/16 | 400 × 9/16 = 225 | (210‑225)² / 225 = 1.00 |
| Round‑Green | 70 | 3/16 | 400 × 3/16 = 75 | (70‑75)² / 75 = 0.33 |
| Wrinkled‑Yellow | 65 | 3/16 | 400 × 3/16 = 75 | (65‑75)² / 75 = 1.33 |
| Wrinkled‑Green | 55 | 1/16 | 400 × 1/16 = 25 | (55‑25)² / 25 = 36.00 |
| χ² | | | | 38.67 |

Calculations

  1. Total plants counted, N = 210 + 70 + 65 + 55 = 400.
  2. Expected numbers (exact):

    • Round‑Yellow = 400 × 9/16 = 225
    • Round‑Green = 400 × 3/16 = 75
    • Wrinkled‑Yellow = 400 × 3/16 = 75
    • Wrinkled‑Green = 400 × 1/16 = 25

  3. χ² = 1.000 + 0.333 + 1.333 + 36.000 = 38.67 (rounded to 2 d.p. only at the end).
  4. k = 4 → df = k − 1 = 3.
  5. Critical χ² for df = 3 at α = 0.05 is 7.81; the calculated χ² of 38.67 corresponds to p ≈ 2 × 10⁻⁸.
  6. Since 38.67 > 7.81 (p < 0.001), we reject H₀. The observed distribution does not fit the expected 9:3:3:1 ratio.
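Listing each class's contribution separately shows where the deviation comes from. The sketch below recomputes the example with exact fractions; the variable names are illustrative, and the p‑value expression is the closed form that holds for df = 3 only.

```python
from fractions import Fraction
import math

observed = {"round-yellow": 210, "round-green": 70,
            "wrinkled-yellow": 65, "wrinkled-green": 55}
parts = {"round-yellow": 9, "round-green": 3,
         "wrinkled-yellow": 3, "wrinkled-green": 1}   # 9:3:3:1

n = sum(observed.values())            # N = 400
total_parts = sum(parts.values())     # 16
contributions = {}
for phenotype, count in observed.items():
    expected = Fraction(n * parts[phenotype], total_parts)
    contributions[phenotype] = (count - expected) ** 2 / expected

chi2 = sum(contributions.values())    # exact: 116/3
largest = max(contributions, key=contributions.get)

# Closed-form upper-tail probability, valid for df = 3 only
x = float(chi2)
p = math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

for phenotype, value in contributions.items():
    print(f"{phenotype}: {float(value):.2f}")
print(f"chi-squared = {x:.2f}, largest contribution from {largest}")
print(f"p = {p:.1e}")
```

The exact total is 116/3, which rounds to 38.67, and almost all of it comes from the wrinkled‑green class.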

Biological interpretation

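  • A significant χ² suggests that the simple dihybrid model is inadequate. Most of the χ² value comes from the wrinkled‑green class (36.00), where 55 plants were observed against 25 expected. Possible reasons: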

    • Linkage between the two genes (reducing recombination).
    • Gene interaction (epistasis) not accounted for in the 9:3:3:1 expectation.
    • Selection against one phenotype during growth.
    • Sampling error or violation of independence.

  • Further work could involve a test‑cross, larger sample size, or molecular markers to check for linkage.

χ² test for independence (contingency tables) – relevance to Topic 17

Beyond goodness‑of‑fit, the χ² test can examine whether two categorical variables are independent. This is useful for evolutionary questions such as “Does genotype frequency differ between two habitats?”

| Habitat | Genotype AA | Genotype Aa | Genotype aa | Row total |
|---|---|---|---|---|
| Low altitude | 30 | 50 | 20 | 100 |
| High altitude | 10 | 40 | 50 | 100 |
| Column total | 40 | 90 | 70 | 200 |

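Expected counts are calculated as (row total × column total) / grand total; for example, the expected number of AA individuals at low altitude is (100 × 40) / 200 = 20. The χ² statistic is then summed over the six cells and compared with the critical value for df = (rows − 1) × (columns − 1) = (2 − 1) × (3 − 1) = 2. A significant result (p ≤ 0.05) would indicate that genotype frequencies differ between altitudes, which is consistent with selection or local adaptation.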
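The contingency-table calculation can be sketched as follows, using the counts from the table above. The data layout and variable names are assumptions made for this example.

```python
from fractions import Fraction

# Observed genotype counts from the table (columns: AA, Aa, aa)
table = {"low altitude": [30, 50, 20],
         "high altitude": [10, 40, 50]}

row_totals = {habitat: sum(counts) for habitat, counts in table.items()}
column_totals = [sum(column) for column in zip(*table.values())]
grand_total = sum(row_totals.values())

chi2 = Fraction(0)
for habitat, counts in table.items():
    for observed, column_total in zip(counts, column_totals):
        # Expected count = (row total x column total) / grand total
        expected = Fraction(row_totals[habitat] * column_total, grand_total)
        chi2 += (observed - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) x (columns - 1)
df = (len(table) - 1) * (len(column_totals) - 1)

print(f"chi-squared = {float(chi2):.2f}, df = {df}")  # chi-squared = 23.97, df = 2
```

For df = 2 the critical value at α = 0.05 is 5.99, so a χ² of 23.97 leads to rejection of the hypothesis that genotype and habitat are independent for these counts.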

Effect of sample size and power

  • Small N (e.g., N < 20) often yields low χ² values even when the true ratio differs from the expected one, leading to a Type II error (failing to detect a real effect).
  • Increasing the sample size raises the test’s power, making it easier to detect modest deviations from expectation.
  • When planning experiments, aim for at least 20–30 individuals per phenotypic class, and ensure all expected counts are ≥ 5.
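The effect of sample size can be illustrated by testing the same observed proportions at two values of N. The counts below are hypothetical, chosen so that both samples show a 65 % : 35 % split, and `chi_squared_3_to_1` is a name invented for this sketch.

```python
from fractions import Fraction

CRITICAL_DF1 = 3.84  # critical chi-squared for df = 1 at alpha = 0.05


def chi_squared_3_to_1(dominant, recessive):
    """Chi-squared for two classes tested against a 3:1 ratio."""
    n = dominant + recessive
    expected = [Fraction(3 * n, 4), Fraction(n, 4)]
    return float(sum((o - e) ** 2 / e
                     for o, e in zip([dominant, recessive], expected)))


# Hypothetical samples with the same 65% : 35% split at two sample sizes
for dominant, recessive in [(13, 7), (65, 35)]:
    chi2 = chi_squared_3_to_1(dominant, recessive)
    verdict = "reject H0" if chi2 > CRITICAL_DF1 else "fail to reject H0"
    print(f"N = {dominant + recessive}: chi-squared = {chi2:.2f}, {verdict}")
```

With N = 20 the deviation is not significant (χ² = 1.07), but the same proportions at N = 100 give χ² = 5.33, which exceeds 3.84. For fixed proportions, χ² grows in direct proportion to N.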

Key points to remember

  • Always use the exact expected numbers; round only the final χ² (or p‑value).
  • Check the assumptions before applying the test – combine classes or switch to an exact test if any E < 5.
  • Degrees of freedom = (number of classes) − 1 − (number of parameters estimated from the data).
  • Compare χ² with the critical value (or read the p‑value) at the chosen significance level (α = 0.05 is standard).
  • Interpret the statistical outcome in the biological context: Mendelian segregation, gene linkage, epistasis, or selection in natural populations.
  • Link every activity back to the syllabus AO2 wording – students handle data, calculate χ², evaluate p‑values and discuss the genetic implications.

Suggested diagram: (a) Punnett square for the Rr × Rr monohybrid cross showing the 3 : 1 expectation; (b) 2 × 3 contingency table (two habitats, three genotypes) illustrating a test of independence between genotype and habitat.