Discuss how DNA sequence data can show evolutionary relationships between species

Evolution – Using DNA Sequence Data to Infer Relationships

1. Why Molecular Phylogenetics Matters (Link to the Cambridge Syllabus)

Modern biology requires students to understand how evolution is recorded at the molecular level, how this information is used to build phylogenies, and how it connects to other core topics such as inheritance, natural selection, classification, biodiversity and genetic technology. DNA‑sequence analysis therefore provides a practical bridge between the abstract concepts of evolution (Topic 18) and the hands‑on techniques of genetic technology (Topic 20).

2. Learning Outcomes and AO Mapping

| Learning Outcome | Assessment Objective (AO) |
|---|---|
| Explain why DNA sequences are a reliable molecular record of evolutionary history and describe homology, analogy, genetic distance and the molecular clock. | AO1 |
| Apply a step‑by‑step method to obtain, align and analyse DNA (or protein) sequences, calculate genetic distances and construct a phylogenetic tree. | AO2 |
| Evaluate the strengths and limitations of molecular phylogenetics, design a practical investigation (including controls and error analysis) and compare molecular evidence with morphological/fossil data. | AO3 |

3. Key Concepts (AO1)

  • DNA as a molecular record of evolutionary change. (AO1)
  • Homologous vs. analogous sequences. (AO1)
  • Sequence similarity, divergence and the molecular clock. (AO1)
  • Phylogenetic trees: topology, branch length, rooting and node support. (AO1)
  • Link to other A‑Level topics: inheritance (Topic 17), natural selection (Topic 18), classification & biodiversity (Topic 19), genetic technology (Topic 20). (AO1)

4. DNA as a Molecular Record (AO1)

DNA is built from four nucleotides, distinguished by their bases (A, T, C, G). Over evolutionary time, mutations – substitutions, insertions and deletions – accumulate. In many genes neutral substitutions accumulate at an approximately constant rate, giving a “molecular clock” that can be calibrated with fossil dates or known biogeographic events. By comparing the order of nucleotides (or the derived amino‑acid sequences) between species we can:

  • Estimate the time since two taxa shared a common ancestor.
  • Reveal relationships that are not obvious from morphology alone.
  • Quantify biodiversity and inform conservation decisions.
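
The clock calculation behind the first bullet can be sketched in a few lines. This is a minimal illustration that assumes a strict clock (one constant rate on both lineages); the function names and every number are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def calibrate_rate(distance, known_split_myr):
    """Per-lineage substitution rate from a dated split (e.g. a fossil).

    distance: substitutions per site between the two calibration species.
    known_split_myr: age of their common ancestor in million years (Myr).
    The factor 2 is there because change accumulates on BOTH lineages.
    """
    if known_split_myr <= 0:
        raise ValueError("calibration age must be positive")
    return distance / (2 * known_split_myr)


def divergence_time(distance, rate_per_site_per_myr):
    """Time since two taxa shared a common ancestor, in Myr (strict clock)."""
    if rate_per_site_per_myr <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate_per_site_per_myr)


# Hypothetical calibration pair: distance 0.125, known to have split 4 Myr ago.
rate = calibrate_rate(distance=0.125, known_split_myr=4.0)

# Hypothetical study pair: distance 0.25 at the same rate.
print(divergence_time(distance=0.25, rate_per_site_per_myr=rate))  # 8.0
```

Doubling the distance doubles the estimated age only because the rate is assumed constant, which is exactly the assumption questioned in Section 10.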

5. Choosing an Appropriate Genetic Marker (AO1)

Markers must be orthologous (descended from a single ancestral gene through speciation rather than gene duplication) and meet three practical criteria:

  • Present in all taxa under study.
  • Conserved enough for reliable alignment, yet variable enough to show informative differences.
  • Relevant to the genome compartment of interest (nuclear, mitochondrial or chloroplast).

| Marker | Typical Use | Genome Location |
|---|---|---|
| Cytochrome c oxidase I (COI) | DNA barcoding of animals | Mitochondrial |
| 16S/12S rRNA genes | Phylogeny of bacteria & vertebrates | Bacterial chromosome (16S); mitochondrial (12S and 16S in vertebrates) |
| 18S rRNA gene | Broad eukaryotic relationships | Nuclear |
| β‑globin, Hox genes | Vertebrate evolution, developmental studies | Nuclear |

6. Step‑by‑Step Practical (AO2)

6.1 Sample Collection & DNA Extraction

  • Obtain fresh tissue (e.g., leaf, muscle) and store at –20 °C.
  • Use a commercial kit or CTAB method; include a negative control (no tissue) to detect contamination.
  • Measure DNA concentration with a spectrophotometer (absorbance at 260 nm) and check purity (A260/A280 ratio ≈ 1.8).
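
The spectrophotometer readings can be turned into a concentration and a purity check as sketched below. The conversion uses the standard approximation that an A260 of 1.0 corresponds to about 50 ng/µL of double‑stranded DNA in a 1 cm cuvette. The acceptance window of 1.7–2.0 and the function names are assumptions for illustration, not part of any kit protocol.

```python
def dsdna_concentration_ng_per_ul(a260, dilution_factor=1.0):
    """Approximate dsDNA concentration from absorbance at 260 nm.

    Assumes a 1 cm path length, where A260 = 1.0 is roughly 50 ng/uL dsDNA.
    """
    return a260 * 50.0 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 suggest little protein contamination."""
    return a260 / a280


def is_acceptably_pure(a260, a280, low=1.7, high=2.0):
    """True if the ratio falls in an assumed acceptance window (rule of thumb)."""
    return low <= purity_ratio(a260, a280) <= high


# Hypothetical reading from a sample diluted 1 in 10 before measurement.
print(dsdna_concentration_ng_per_ul(0.5, dilution_factor=10))  # 250.0
print(is_acceptably_pure(a260=0.45, a280=0.25))                # ratio 1.8 -> True
```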

6.2 Polymerase‑Chain‑Reaction (PCR)

  • Design primers that flank the chosen marker; check for specificity with BLAST.
  • Run a gradient PCR to optimise annealing temperature (typically 55–62 °C).
  • Include a positive control (known template) and a no‑template control.
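
A quick way to choose the centre of the gradient is to estimate each primer's melting temperature. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough guide for short primers. Starting about 5 °C below the lower Tm is a common rule of thumb, treated here as an assumption. Both primer sequences are made up for the example.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def suggested_annealing(forward, reverse, offset=5):
    """Starting annealing temperature: lower Tm minus an assumed offset."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


# Hypothetical 20-mer primers, for illustration only.
forward = "ATGCCGTAGCATGCCGTAGC"
reverse = "TTGACCGATAGCTTGACCGA"

print(wallace_tm(forward), wallace_tm(reverse))   # 64 60
print(suggested_annealing(forward, reverse))      # 55
```

The estimate only sets a starting point; the gradient PCR remains the experimental test of which temperature gives a single clean band.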

6.3 Sequencing

  • Sanger sequencing for a single gene (high accuracy, ≤1 kb).
  • Next‑generation sequencing (Illumina, Oxford Nanopore) for multi‑gene or whole‑genome projects.

6.4 Quality Check

  • Inspect chromatograms; trim low‑quality ends (Phred score < 20).
  • Confirm expected amplicon length by gel electrophoresis.
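
End trimming can be sketched as follows. A Phred score Q corresponds to an error probability of 10^(−Q/10), so Q20 means a 1 % chance that the base call is wrong. The function below only removes low‑quality bases from the two ends and leaves internal bases alone; real trimming tools often use sliding windows instead, so treat this as a simplified illustration with made‑up data.

```python
def phred_to_error(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def trim_low_quality_ends(sequence, phred_scores, threshold=20):
    """Strip bases with score < threshold from both ends of a read."""
    if len(sequence) != len(phred_scores):
        raise ValueError("sequence and scores must have the same length")
    start, end = 0, len(sequence)
    while start < end and phred_scores[start] < threshold:
        start += 1                      # walk in from the 5' end
    while end > start and phred_scores[end - 1] < threshold:
        end -= 1                        # walk in from the 3' end
    return sequence[start:end], phred_scores[start:end]


# Hypothetical read: poor calls at both ends, one weak call in the middle.
read = "GGATGCCGTAGCAA"
scores = [5, 12, 30, 35, 40, 40, 38, 19, 40, 36, 33, 25, 14, 7]

trimmed, kept_scores = trim_low_quality_ends(read, scores)
print(trimmed)              # ATGCCGTAGC
print(phred_to_error(20))   # 0.01
```

The internal base with score 19 is kept, which is why chromatograms should still be inspected by eye after trimming.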

6.5 Multiple Sequence Alignment (MSA)

  • Use Clustal Ω, MUSCLE or MAFFT for automated alignment.
  • Manually inspect indels and adjust where necessary – mis‑aligned gaps inflate distance estimates.
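
Manual inspection is easier if the gap‑containing columns are listed first. The sketch below assumes the alignment has already been produced by one of the tools above and loaded into a dictionary of equal‑length strings; the species labels and sequences are invented.

```python
def gap_columns(alignment, gap="-"):
    """Return 0-based indices of alignment columns that contain a gap."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [i for i in range(length) if any(s[i] == gap for s in seqs)]


def gap_runs(columns):
    """Group consecutive column indices into (start, end) runs (candidate indels)."""
    runs = []
    for c in columns:
        if runs and c == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], c)   # extend the current run
        else:
            runs.append((c, c))           # start a new run
    return runs


# Hypothetical three-sequence alignment.
alignment = {
    "sp_A": "ATGCC-GTAGC",
    "sp_B": "ATGCCAGTAGC",
    "sp_C": "ATG---GTAGC",
}

cols = gap_columns(alignment)
print(cols)            # [3, 4, 5]
print(gap_runs(cols))  # [(3, 5)]
```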

6.6 Calculating Genetic Distances (AO2)

Two common approaches:

  1. Percentage identity – simple count of identical bases ÷ total aligned bases × 100 %.
  2. Model‑based distances – corrects for multiple substitutions.

    Example: Jukes‑Cantor (JC69) assumes equal base frequencies and equal substitution rates:

    \[
    d_{\text{JC}} = -\frac{3}{4}\ln\!\left(1-\frac{4}{3}p\right)
    \]

    where \(p\) = observed proportion of differences.
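
Both approaches can be sketched in code. The functions below compute \(p\), the percentage identity and the JC69 distance for two already‑aligned sequences; columns containing a gap or an ambiguous base are skipped, which is one common convention and an assumption here. The two ten‑base sequences are made up.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue                    # skip gaps and ambiguous bases
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def percent_identity(seq_a, seq_b):
    """Identical bases / compared bases x 100."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


def jc69_distance(p):
    """Jukes-Cantor corrected distance: d = -(3/4) ln(1 - (4/3) p)."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned fragments differing at one site out of ten.
p = p_distance("ATGCCGTAGC", "ATGACGTAGC")
print(p, percent_identity("ATGCCGTAGC", "ATGACGTAGC"))
print(round(jc69_distance(p), 4))   # slightly larger than p
```

The corrected distance always exceeds \(p\) for \(p > 0\), because the model adds back substitutions hidden by repeated changes at the same site.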

Why choose a model? Real DNA often violates JC69 assumptions (e.g., transitions usually occur more often than transversions). The Kimura 2‑parameter (K2P) model estimates transitions and transversions separately, giving more realistic distances. For larger or more divergent data sets, the General Time Reversible model with gamma‑distributed rate heterogeneity (GTR+Γ) is often preferred because it accommodates unequal base frequencies and variable substitution rates across sites.
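
A minimal sketch of the K2P distance is given below. With P the proportion of sites showing a transition and Q the proportion showing a transversion, the distance is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The helper name and the test sequences are illustrative assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue                              # skip gaps / ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1                      # A<->G or C<->T
        else:
            transversions += 1                    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    P = transitions / compared
    Q = transversions / compared
    w1, w2 = 1 - 2 * P - Q, 1 - 2 * Q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for K2P")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# Same p (1 difference in 10), but one is a transition and one a transversion.
print(round(k2p_distance("ATGCCGTAGC", "ATGCCGTAGT"), 4))  # C->T transition
print(round(k2p_distance("ATGCCGTAGC", "ATGACGTAGC"), 4))  # C->A transversion
```

The two pairs have the same percentage identity but different K2P distances, which is the practical consequence of treating the two kinds of substitution separately.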

6.7 Tree Construction (AO2)

  • Neighbour‑Joining (NJ) – distance‑based, fast, good for introductory investigations.
  • Maximum Likelihood (ML) – uses an explicit evolutionary model; provides likelihood scores.
  • Bayesian Inference (BI) – generates posterior probabilities for each node.
  • Perform bootstrap resampling (≥1 000 replicates) or calculate posterior probabilities to assess node support.
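
To show what a distance‑based method does with the matrix, here is a compact sketch of the neighbour‑joining procedure. It returns an unrooted tree in Newick format and does not perform outgroup rooting or bootstrapping. The distance values are hypothetical and were chosen to fit a tree exactly; real data will not, and can occasionally give small negative branch lengths.

```python
from itertools import combinations


def neighbour_joining(names, matrix):
    """Return a Newick string for the NJ tree of a symmetric distance matrix."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    dist = {a: {b: matrix[i][j] for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to all others.
        totals = {a: sum(dist[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda pr: (n - 2) * dist[pr[0]][pr[1]]
                   - totals[pr[0]] - totals[pr[1]])
        d_ij = dist[i][j]
        len_i = 0.5 * d_ij + (totals[i] - totals[j]) / (2 * (n - 2))
        len_j = d_ij - len_i
        new = f"({i}:{len_i:.4f},{j}:{len_j:.4f})"
        dist[new] = {}
        for k in nodes:
            if k in (i, j):
                continue
            d_new = 0.5 * (dist[i][k] + dist[j][k] - d_ij)
            dist[new][k] = d_new
            dist[k][new] = d_new
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Three nodes left: attach them to a single central node.
    a, b, c = nodes
    la = 0.5 * (dist[a][b] + dist[a][c] - dist[b][c])
    lb = 0.5 * (dist[a][b] + dist[b][c] - dist[a][c])
    lc = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Hypothetical distances (substitutions per site), for illustration only.
taxa = ["Human", "Chimp", "Mouse", "Frog"]
d = [[0.00, 0.03, 0.16, 0.36],
     [0.03, 0.00, 0.17, 0.37],
     [0.16, 0.17, 0.00, 0.40],
     [0.36, 0.37, 0.40, 0.00]]
print(neighbour_joining(taxa, d))
```

The printed string can be pasted into any Newick tree viewer. Because the tree is unrooted, the outgroup still has to be chosen to place the root.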

6.8 Interpretation (AO3)

  • Examine topology: sister taxa, basal lineages, outgroup placement.
  • Branch lengths reflect genetic distance (or time, if a calibrated clock is used).
  • Compare the molecular tree with morphological/fossil evidence; discuss any discrepancies.

7. Connecting Molecular Phylogenetics to the Wider A‑Level Syllabus (AO3)

| Syllabus Unit (Topic) | Relevance to DNA‑Sequence Phylogeny |
|---|---|
| 17 – Inheritance | Orthologous genes retain a common ancestry; concepts of alleles, linkage and recombination explain why gene copies are comparable across species. |
| 18 – Selection & Evolution | The molecular clock relies on neutral mutations; natural selection can accelerate or decelerate rates in functional regions, linking directly to the clock’s limitations. |
| 19 – Classification, Biodiversity & Conservation | Phylogenetic trees provide the framework for modern cladistic classification and help identify cryptic species for conservation planning. |
| 20 – Genetic Technology | PCR, Sanger sequencing, NGS and DNA barcoding are the practical tools that generate the data used in molecular phylogenetics. Emerging techniques such as CRISPR can be used to edit marker genes for functional studies. |
| Practical Skills (AO3) | Designing a DNA‑based investigation, selecting appropriate controls, analysing distance matrices, performing bootstrapping and evaluating methodological limitations all satisfy AO3 requirements. |

8. Potential Pitfalls & Sources of Error (AO3)

  • Horizontal gene transfer (HGT) – common in prokaryotes; can produce gene trees that conflict with species histories.
  • Paralogy vs. orthology – comparing duplicated genes (paralogs) may give false sister‑taxa relationships.
  • Rate heterogeneity – some lineages evolve faster; use models with a gamma distribution or partitioned analyses.
  • Incomplete lineage sorting (ILS) – ancestral polymorphisms persist across speciation events, leading to discordance between gene trees and the true species tree.
  • Alignment errors – incorrect placement of indels inflates distance estimates; always verify alignments manually.
  • Sampling bias – too few taxa or an inappropriate outgroup can mis‑root the tree.

9. Gene Trees vs. Species Trees (AO3)

A *gene tree* reflects the evolutionary history of a single locus, whereas a *species tree* represents the history of the organisms themselves. Discordance can arise from HGT, paralogy, ILS or hybridisation. Modern studies therefore often combine several unlinked markers (multi‑locus or genome‑wide data) and employ coalescent‑based methods (e.g., *ASTRAL*, *StarBEAST*) to infer a robust species tree.

10. Limits of the Molecular Clock (AO3)

  • Assumes a constant substitution rate – in reality rates vary among genes, lineages and over time.
  • Calibration is required (fossil dates, biogeographic events); inaccurate calibrations produce erroneous divergence times.
  • Selection can either speed up (positive selection) or slow down (purifying selection) apparent rates.

11. Case Study – Molecular Re‑classification of Whales (AO3)

Traditional morphology placed whales in their own order (Cetacea), separate from the even‑toed ungulates (Artiodactyla). Analyses of multiple mitochondrial and nuclear genes, later supported by shared retroposon (SINE) insertions, consistently grouped whales with hippopotamids within Artiodactyla, leading to the recognised clade *Whippomorpha*. This example demonstrates how DNA data can overturn long‑standing taxonomic views and highlights the importance of integrating molecular and morphological evidence.

12. Example Data Set and Phylogenetic Tree (AO2)

| Species | COI (5'‑3') | Length (bp) |
|---|---|---|
| Human (Homo sapiens) | ATGCCGTAGC… | 658 |
| Chimpanzee (Pan troglodytes) | ATGCCGTAGT… | 658 |
| Mouse (Mus musculus) | ATGTCGTAGC… | 658 |
| Frog (Rana temporaria) | ATGACGTAGC… | 658 |

After alignment the % identities (illustrative values for this exercise; real COI sequences from these species differ far more) are:

  • Human‑Chimpanzee: 99.7 % (2 differences)
  • Human‑Mouse: 98.2 % (12 differences)
  • Human‑Frog: 95.4 % (30 differences)
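
The percentages above can be checked directly from the difference counts and the 658 bp alignment length. The sketch also applies the JC69 correction as an example of a model‑based distance. K2P is not used here because it needs the differences split into transitions and transversions, which the counts alone do not provide.

```python
import math

ALIGNED_LENGTH = 658  # bp, from the example table
DIFFERENCES = {"Human-Chimpanzee": 2, "Human-Mouse": 12, "Human-Frog": 30}


def percent_identity(n_diff, length):
    """Identical bases / total aligned bases x 100."""
    return 100.0 * (length - n_diff) / length


def jc69(n_diff, length):
    """JC69 distance from a raw difference count."""
    p = n_diff / length
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for pair, n in DIFFERENCES.items():
    identity = percent_identity(n, ALIGNED_LENGTH)
    print(f"{pair}: {identity:.1f} % identity, JC69 d = {jc69(n, ALIGNED_LENGTH):.4f}")
```

At such small distances the correction barely changes the values; it matters more as sequences diverge and repeated substitutions at the same site become likely.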

Using the K2P model, these differences are converted into a pairwise distance matrix, which a neighbour‑joining algorithm (1 000 bootstrap replicates) uses to produce the tree described below.

Figure: Rooted neighbour‑joining tree based on COI K2P distances. Human and Chimpanzee are sister taxa, Mouse is sister to the Human + Chimpanzee clade, and Frog is the outgroup. Bootstrap values (>70 %) are indicated at each node.

13. Summary Checklist (AO2)

  • Select an orthologous marker that balances conservation and variability.
  • Extract high‑quality DNA; include negative and positive controls.
  • Amplify the target region with well‑designed primers; optimise PCR conditions.
  • Obtain accurate sequences (Sanger or NGS) and trim low‑quality ends.
  • Perform a reliable multiple sequence alignment; manually check indels.
  • Choose an appropriate evolutionary model (e.g., K2P, GTR+Γ) and justify the choice.
  • Construct phylogenetic trees using at least two complementary methods (NJ + ML or BI).
  • Assess node support with bootstrap replicates or posterior probabilities.
  • Interpret the topology, branch lengths and support values; compare with morphological/fossil evidence.
  • Discuss limitations (rate heterogeneity, HGT, paralogy) and suggest further work.

14. Exam‑Style Question (AO1‑AO3)

Q: Explain how the percentage identity of a mitochondrial gene can be used to infer the evolutionary relationship between two species. Include a brief description of how a phylogenetic tree would be derived from the data, and evaluate one limitation of this approach.

Answer Outline

  1. Higher % identity indicates fewer accumulated mutations, implying a more recent common ancestor. (AO1)
  2. Align the two sequences, count identical bases, calculate % identity = (identical ÷ total) × 100. (AO2)
  3. Convert the proportion of differences into a genetic distance using a substitution model (e.g., JC69, or K2P if transitions and transversions are counted separately) to correct for multiple substitutions. (AO2)
  4. Repeat for all taxa, create a distance matrix, and input the matrix into a tree‑building algorithm such as neighbour‑joining. (AO2)
  5. Bootstrap the analysis (≥1 000 replicates) to obtain support values for each node. (AO3)
  6. Interpret the resulting topology: sister taxa, basal lineages, relative divergence times. (AO3)
  7. Limitation – the molecular clock may not be constant; rate heterogeneity can cause over‑ or under‑estimation of divergence times. Complementary morphological or fossil data should be consulted. (AO3)
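
Step 5 of the outline can be illustrated with a short sketch of bootstrap resampling: alignment columns are drawn with replacement to build each replicate, the analysis is repeated, and the support value is the percentage of replicates that recover the grouping. The alignment is made up, and the `human_chimp_closest` check is a crude stand‑in for rebuilding a full tree and testing for the clade.

```python
import random


def resample_columns(alignment, rng):
    """One bootstrap replicate: columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


def hamming(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def human_chimp_closest(aln):
    """Stand-in clade test: is Chimp strictly the closest sequence to Human?"""
    d_hc = hamming(aln["Human"], aln["Chimp"])
    return (d_hc < hamming(aln["Human"], aln["Mouse"])
            and d_hc < hamming(aln["Human"], aln["Frog"]))


def bootstrap_support(alignment, clade_test, replicates=1000, seed=1):
    """Percentage of replicates in which clade_test returns True."""
    rng = random.Random(seed)
    hits = sum(clade_test(resample_columns(alignment, rng))
               for _ in range(replicates))
    return 100.0 * hits / replicates


# Hypothetical 20-column alignment, for illustration only.
alignment = {
    "Human": "ATGCCGTAGCATGCCGTAGC",
    "Chimp": "ATGCCGTAGTATGCCGTAGC",
    "Mouse": "ATGTCGTAGCATGTCGAAGC",
    "Frog":  "ATGACGTTGCATGACGTAGA",
}
print(bootstrap_support(alignment, human_chimp_closest))
```

With so few variable columns the support is well below 100 %, which shows why short alignments often give poorly supported nodes.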

15. Quick‑Scan of Notes vs. Cambridge 9700 Syllabus (AO1‑AO3)

| Syllabus Requirement | Coverage in the Notes | Gap / Improvement Needed |
|---|---|---|
| Full topic list (AS 1‑11 + A 12‑20) | Only the Evolution – DNA‑sequence sub‑topic is detailed. | Insert “link‑in” boxes that connect the phylogeny material to DNA replication, transcription, translation (Topic 6), natural selection, speciation, classification and biodiversity (Topic 18‑19), and to Genetic Technology (Topic 20). |
| AO mapping | Learning outcomes listed, but no explicit AO tags. | Added AO‑mapping table and AO tags on key points. |
| Depth & Accuracy | Detailed methodology present. | Added justification for model choice, limits of the molecular clock, and a note on gene vs species trees. |
| Missing key concepts | DNA replication, transcription, translation, natural selection, speciation, classification, biodiversity, CRISPR not mentioned. | Provided concise “Why molecular phylogenetics matters” paragraph and a syllabus‑link table; referenced CRISPR as a modern genetic‑technology example. |
| Clarity & Structure | Good headings but occasional flow issues. | Introduced a clear introductory paragraph, consistent sub‑heading numbering (6.1‑6.8), and transitional sentences. |

16. Further Reading (Suggested)

  • Nei, M. & Kumar, S. Molecular Evolution and Phylogenetics – chapters on distance methods and tree construction.
  • Cambridge International AS & A Level Biology (9700) – Sections 18 (Selection & Evolution) and 20 (Genetic Technology).
  • Hebert, P. D. N. et al. “Biological Identifications through DNA Barcodes.” Proc. Royal Soc. B, 2003.
  • Felsenstein, J. “Confidence Limits on Phylogenies: An Approach Using the Bootstrap.” Evolution, 1985.
  • Hillis, D. M., & Bull, J. J. “An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis.” Syst. Biol., 1993.