Sequence Databases and Genetic Technology (Cambridge A‑Level)
Learning Objectives
- Explain why nucleotide‑ and protein‑sequence databases are essential tools in modern biology.
- Describe how sequence data are generated (sequencing, PCR, cloning, gene‑editing).
- Identify the main types of vectors, selection markers and delivery methods used in recombinant DNA work.
- Discuss the use of DNA fingerprinting in forensics and paternity testing.
- Evaluate the advantages and limitations of the major genetic‑engineering techniques, including RNAi and antisense.
- Apply online tools (BLAST, Primer3, UniProt, PDB) to retrieve, align and interpret sequences – AO2 skills.
- Consider the ethical, social and biosafety implications of genetic technology.
Why Use Sequence Databases?
Digital repositories store curated nucleotide and protein information that can be accessed instantly, compared across species and linked to functional, phenotypic and clinical data. This accelerates research, underpins diagnostics, and provides authentic data for classroom investigations.
Benefits of Nucleotide‑Sequence Databases
- Rapid gene identification – motif or similarity searches (e.g., BLAST) locate genes of interest.
- Comparative genomics – whole‑genome data reveal orthologous regions, evolutionary relationships and conserved regulatory elements.
- Primer, probe and guide‑RNA design – exact sequences enable specific PCR primers, qPCR probes, CRISPR gRNAs and diagnostic assays.
- Mutation and variant detection – SNP, indel and disease‑associated variant catalogues support diagnostic testing and personalised medicine.
- Integration with expression & phenotype data – links to RNA‑seq, microarray, clinical and population‑frequency databases.
- Quality control – curated entries include validation scores, version history and literature references.
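The idea behind a similarity search can be illustrated with a deliberately naive sketch: slide a short query along a longer sequence and report the window with the highest percent identity. This is not how BLAST works internally (BLAST uses seeded, gapped local alignment with statistical scoring); the function names and the example sequences are hypothetical and chosen only for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_ungapped_match(query: str, subject: str) -> tuple[int, float]:
    """Slide the query along the subject (no gaps allowed).

    Returns (offset, identity) of the best window, or (-1, -1.0)
    if the query is longer than the subject.
    """
    best = (-1, -1.0)
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        identity = percent_identity(query, window)
        if identity > best[1]:
            best = (offset, identity)
    return best


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any database record.
    print(best_ungapped_match("GATTACA", "CCGATTACATT"))
```

Real database searches also have to handle gaps, both strands and statistical significance (E‑values), which is why dedicated tools are used in practice.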
Benefits of Amino‑Acid Sequence & Protein‑Structure Databases
- Functional inference – conserved residues, domains and motifs predict enzyme activity, binding sites or sub‑cellular localisation.
- Structure‑based drug design – 3‑D coordinates allow docking simulations and rational inhibitor design.
- Protein engineering – knowledge of secondary‑structure elements guides mutagenesis to improve stability or activity.
- Evolutionary studies – protein family alignments trace functional divergence.
- Pathway reconstruction – interaction and structural data help map metabolic and signalling networks.
- Educational value – visualising folding, active sites and domain architecture reinforces biochemistry concepts.
Key Public Databases (Cambridge‑approved)
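- GenBank / ENA / DDBJ – primary (archival) nucleotide sequences submitted by researchers, with basic annotation; the three archives exchange data so that each holds the same records.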
- RefSeq – curated reference genomes, transcripts and proteins.
- Ensembl – integrated genome browser with comparative tools.
- UniProtKB – comprehensive protein sequences, functional annotation and cross‑references.
- PDB (Protein Data Bank) – experimentally determined 3‑D structures.
How Sequence Data Are Produced
1. DNA Sequencing Technologies
- Sanger (chain‑termination) sequencing – gold‑standard for short fragments; used for confirming cloned inserts.
- Next‑generation sequencing (NGS) – Illumina, Ion Torrent, PacBio, Oxford Nanopore; generate whole‑genome, transcriptome or targeted‑panel data rapidly and cheaply.
- Quality control – Phred scores, coverage depth, read trimming and assembly pipelines feed validated sequences into public databases.
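A Phred score Q is related to the probability P that a base call is wrong by Q = −10 log10(P), so Q20 means a 1 in 100 chance of error and Q30 a 1 in 1000 chance. The sketch below decodes a FASTQ quality string, assuming the common Phred+33 ASCII encoding; the function names and the example string are illustrative only.

```python
def phred_to_error_prob(q: int) -> float:
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)


def decode_fastq_quality(quality_line: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to integer Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33).
    """
    return [ord(ch) - offset for ch in quality_line]


def mean_error_prob(quality_line: str) -> float:
    """Average per-base error probability across one read."""
    scores = decode_fastq_quality(quality_line)
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)


if __name__ == "__main__":
    example = "I5+"  # hypothetical quality string: Q40, Q20, Q10
    print(decode_fastq_quality(example))
    print(mean_error_prob(example))
```

Read‑trimming tools apply the same conversion when they remove low‑quality bases from the ends of reads.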
2. PCR, Cloning and Vector Diversity
- Polymerase Chain Reaction (PCR) – amplifies specific DNA fragments for sequencing, cloning or diagnostics.
- Vector types – see table below.
- Selection markers & reporter genes – antibiotic resistance (ampicillin, kanamycin), herbicide resistance (BAR), fluorescent proteins (GFP, mCherry), enzymatic reporters (lacZ, luciferase).
| Vector Class | Typical Size Capacity | Common Uses | Key Features |
|---|---|---|---|
| Plasmid (e.g., pUC19, pBR322) | ≤15 kb | Cloning of single genes, expression in bacteria. | High copy number, multiple cloning site, selectable marker. |
| Bacterial Artificial Chromosome (BAC) | 100–300 kb | Large‑fragment cloning, genome walking, transgenic animal production. | Low copy number, stable maintenance, compatible with E. coli. |
| Yeast Artificial Chromosome (YAC) | ≈1 Mb | Cloning very large genomic regions, functional studies in yeast. | Contains yeast centromere, telomeres, selectable markers. |
| Viral vectors (adenovirus, lentivirus, AAV) | 5–30 kb (depends on virus) | Gene therapy, stable transduction of mammalian cells. | High transduction efficiency, can be replication‑deficient, tissue‑specific promoters. |
3. Delivery (Transformation / Transfection) Methods
- Heat‑shock – calcium‑chloride‑treated competent bacteria; rapid, inexpensive.
- Electroporation – high‑voltage pulse creates pores in bacterial, yeast or mammalian cell membranes; works for large plasmids and low‑efficiency cells.
- Lipofection (lipid‑based reagents) – forms lipoplexes that fuse with mammalian cell membranes; widely used for cultured cells.
- Microinjection – direct injection of DNA or CRISPR components into embryos or oocytes; essential for generating transgenic animals.
- Agrobacterium‑mediated transformation – transfers T‑DNA into plant cells; basis of most genetically modified crops.
4. Gene‑Editing (CRISPR/Cas9)
- Guide RNA (gRNA) design uses target genome sequences from databases; off‑target analysis performed with BLAST or specialised tools.
- Cas9 creates a double‑strand break; repair by non‑homologous end joining (NHEJ) or homology‑directed repair (HDR) introduces mutations or inserts.
- Delivery of the CRISPR components follows the methods listed above (plasmid, ribonucleoprotein, viral vector).
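As a rough illustration of the first step in guide design, the sketch below lists candidate target sites on one strand. It assumes the widely used Streptococcus pyogenes Cas9, which requires an NGG PAM immediately 3′ of a 20‑nt protospacer; that PAM rule, the function name and the test sequence are assumptions added here, and a real design would also scan the reverse complement and score off‑targets with a dedicated tool.

```python
import re


def find_spcas9_targets(seq: str) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each 20-nt site followed by NGG.

    Only the strand given is searched. A lookahead is used so that
    overlapping candidate sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real genomic target.
    toy = "CCATGACGTTAGCATCGATCGTAGCTAGGATCGG"
    for start, protospacer, pam in find_spcas9_targets(toy):
        print(start, protospacer, pam)
```

Each candidate would then be checked against the whole genome, because near‑identical sites elsewhere are the source of off‑target cleavage.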
Recombinant DNA – Step‑by‑Step Workflow
- Isolation of DNA – extract genomic or plasmid DNA from the source organism.
- Restriction digestion – use endonucleases (e.g., EcoRI, HindIII) to generate compatible sticky or blunt ends.
- Vector preparation – linearise the plasmid with the same enzymes; de‑phosphorylate to prevent self‑ligation.
- Ligation – DNA ligase joins insert and vector, creating the recombinant plasmid.
- Delivery into host cells – apply an appropriate transformation or transfection method (heat‑shock, electroporation, lipofection, Agrobacterium, microinjection).
- Selection – plate on medium containing the appropriate antibiotic or apply a reporter assay to retain only cells that have taken up the vector.
- Screening – colony PCR, restriction analysis or sequencing to confirm insert orientation and integrity.
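The restriction‑digestion and screening steps can be checked on paper or with a few lines of code by predicting fragment sizes. The sketch below assumes a linear DNA molecule and models EcoRI, which recognises GAATTC and cuts after the first G on the strand shown; the function name and the test sequence are hypothetical.

```python
def digest_linear(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Predict fragment lengths after digesting a linear DNA molecule.

    site       -- recognition sequence (default: EcoRI, GAATTC)
    cut_offset -- bases into the site at which the strand is cut
                  (1 for EcoRI, which cuts G^AATTC)
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Fragment length = distance between consecutive cut positions.
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical toy insert containing two EcoRI sites.
    toy = "AAAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
    print(digest_linear(toy))
```

Comparing predicted fragment sizes with the bands seen on a gel is the basis of the restriction analysis used in screening.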
Gene‑Therapy & Regulatory Landscape
- Approved therapies (as of 2025) – e.g., Luxturna (RPE65), Zolgensma (SMN1), Kymriah (CAR‑T), and several antisense oligonucleotides (Spinraza, Vyondys 53).
- Regulatory bodies – US FDA (Center for Biologics Evaluation and Research), EMA (European Medicines Agency), MHRA (UK). They assess safety, efficacy, vector design, manufacturing consistency and post‑marketing surveillance.
- Key guidelines:
  - GMP (Good Manufacturing Practice) for vector production.
  - Risk assessment of insertional mutagenesis and immune responses.
  - Ethical review for germ‑line editing (currently prohibited in most jurisdictions).
DNA Fingerprinting – A Practical Example (Forensics)
Case study: STR analysis in a paternity test
- Extract genomic DNA from the child, mother and alleged father.
- Amplify a standard panel of short‑tandem‑repeat (STR) loci using multiplex PCR (primers derived from forensic databases); the original CODIS core set contained 13 loci and was expanded to 20 in 2017.
- Separate PCR products by capillary electrophoresis; generate an electropherogram showing allele sizes.
- Compare the child’s alleles with the mother’s to identify the paternal contribution.
- Calculate the paternity index using population allele frequencies obtained from forensic databases (e.g., NIST STRBase).
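- Interpretation: a combined probability of paternity > 99.9 % strongly supports paternity; mismatches between the child’s paternal alleles and the alleged father’s profile at several loci exclude the man.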
This example illustrates why reliable, population‑wide sequence databases are essential for accurate forensic interpretation.
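The calculation in the case study can be sketched as follows. This is a simplified model: it assumes that at each locus the paternal allele in the child is unambiguous and that the alleged father carries it, so the paternity index is 1/p if he is homozygous and 0.5/p if he is heterozygous, where p is the population frequency of that allele. The allele frequencies used below are invented for illustration, and the prior probability of 0.5 is a conventional assumption.

```python
from math import prod


def locus_pi(allele_freq: float, father_homozygous: bool) -> float:
    """Paternity index for one locus (simplified case, see assumptions above)."""
    transmission = 1.0 if father_homozygous else 0.5
    return transmission / allele_freq


def combined_pi(indices: list[float]) -> float:
    """Combined paternity index: product over independent loci."""
    return prod(indices)


def probability_of_paternity(cpi: float, prior: float = 0.5) -> float:
    """Posterior probability of paternity from the combined index and a prior."""
    return cpi * prior / (cpi * prior + (1 - prior))


if __name__ == "__main__":
    # Hypothetical loci: (frequency of paternal allele, father homozygous?)
    loci = [(0.10, False), (0.20, True), (0.125, False)]
    indices = [locus_pi(p, hom) for p, hom in loci]
    cpi = combined_pi(indices)
    print(indices, cpi, probability_of_paternity(cpi))
```

With only three loci the probability stays below 99.9 %, which shows why a full panel of loci is needed before a result is reported.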
Advantages & Limitations of Major Genetic‑Engineering Techniques
| Technique | Key Advantages | Major Limitations / Challenges |
|---|---|---|
| PCR | Rapid amplification, high sensitivity, quantitative (qPCR) and multiplexing possible. | Requires precise primer design; susceptible to contamination; limited to known target sequences. |
| Recombinant DNA (cloning) | Enables production of recombinant proteins, functional gene studies, creation of transgenic organisms. | Time‑consuming cloning steps; vector‑host compatibility; possible insertional mutagenesis. |
| CRISPR/Cas9 | Simple guide‑RNA design, multiplex editing, high efficiency in many species. | Off‑target cleavage, delivery challenges in some cell types, ethical concerns for germ‑line editing. |
| RNA interference (RNAi) | Transient knock‑down without altering DNA; useful for functional screens. | Incomplete silencing, off‑target effects, variability between cell lines. |
| Antisense technology | Specific inhibition of mRNA translation; can be delivered as oligonucleotides. | Stability of oligos, delivery efficiency, potential immune activation. |
AO2 Skill – Exam‑Style Example (Primer Design)
Task: Design primers to amplify a 500 bp fragment of the human β‑globin gene (HBB).
- Retrieve the sequence – go to GenBank, search “HBB Homo sapiens”, download the FASTA file.
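- Identify the coding region – locate the ATG start and TAA stop codons and the three exons. The whole gene, including its two introns, spans about 1.6 kb of genomic DNA, whereas the coding sequence alone is only 444 bp, so a 500 bp amplicon must be taken from genomic sequence and will include intron.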
- Run a BLAST search – confirm gene identity and locate conserved regions suitable for primer binding.
- Open Primer3 (or Primer‑BLAST) – paste the target region, set product size 400–600 bp, primer length 18–22 nt, Tm 58–62 °C, GC % 40–60 %.
- Evaluate the suggested primers:
  - Melting temperature (Tm) – both primers within 2 °C of each other.
  - GC‑clamp at the 3′ end for stable binding.
  - No hairpins, dimers or runs of a single base.
  - Specificity – run each primer through BLAST to ensure no off‑target matches.
- Record the final primer sequences and calculate the expected amplicon size for the answer sheet.
This workflow demonstrates the AO2 requirements: retrieving data, analysing with bioinformatic tools, designing an experiment and evaluating the outcome.
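The primer checks listed above can be approximated by hand or with a short script. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides; primer‑design software uses more accurate thermodynamic models, so the values will differ. The function names and the example primer are hypothetical.

```python
from itertools import groupby


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


def has_gc_clamp(primer: str) -> bool:
    """True if the 3' terminal base is G or C."""
    return primer.upper()[-1] in "GC"


def longest_run(primer: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max(len(list(group)) for _, group in groupby(primer.upper()))


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-nt primer, not for HBB
    print(gc_percent(primer), wallace_tm(primer),
          has_gc_clamp(primer), longest_run(primer))
```

Hairpin and primer‑dimer checks need alignment of the primer against itself and its partner, which is why they are normally left to the design software.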
Ethical, Social & Biosafety Considerations
- GM crops – examples: Bt cotton (insect resistance) and Golden Rice (β‑carotene enrichment). Benefits: higher yield, reduced pesticide use; concerns: gene flow, biodiversity, farmer dependence.
- Gene therapy – life‑saving potential for monogenic diseases; risks of insertional mutagenesis, immune reactions, and unequal access.
- Human germ‑line editing – scientific possibilities (preventing hereditary disease) versus moral arguments about “designer babies”. Most countries prohibit clinical use.
- Data privacy – storage of personal genomic information raises confidentiality and consent issues.
- Biosafety levels (BSL) – laboratory work with recombinant DNA is assigned a containment level from BSL‑1 (lowest) to BSL‑4 (highest) depending on the hazard posed by the organism and vector; containment, waste disposal and personal protective equipment are mandatory.
Practical Classroom Activities (A‑Level)
- Retrieve the lacZ gene from GenBank, perform a multiple‑sequence alignment with orthologues from E. coli, S. typhimurium and K. pneumoniae using Clustal Omega, and infer conserved catalytic residues.
- Search UniProt for human hemoglobin β‑chain (P68871); identify the heme‑binding histidine and discuss the effect of the sickle‑cell (Glu→Val) mutation.
- Download PDB entry 1A3N (human deoxyhemoglobin) and visualise the α‑helical globin fold with PyMOL; label the proximal and distal histidines.
- Design a CRISPR guide RNA targeting exon 2 of the human CCR5 gene; use an online off‑target tool to assess specificity.
- Conduct a mock forensic STR analysis: students receive simulated electropherograms and must use allele‑frequency tables (from a public forensic database) to calculate a paternity index.
- Compare two case studies – (a) Bt cotton in India and (b) Luxturna gene‑therapy for Leber congenital amaurosis – and debate the balance between scientific benefit and societal risk.
Summary
Sequence databases are the backbone of modern genetic technology. They provide rapid, reliable access to nucleotide and protein data, enable powerful analytical tools, and support a wide range of applications, from basic research and drug design to forensic science, biotechnology and gene therapy. Mastery of how these data are generated and manipulated (recombinant DNA, CRISPR), together with an awareness of ethical, social and biosafety issues, equips Cambridge A‑Level students with the knowledge and skills required for both the examination and future scientific study.
Suggested Diagram (for teachers)
Flowchart: DNA extraction → PCR/Sequencing → Nucleotide database → Translation → Protein database → 3‑D structure (PDB) → Functional insight (enzyme activity, drug target, evolutionary relationship).
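The translation step in the flowchart can be demonstrated with the standard genetic code. The sketch below builds the codon table from the conventional T, C, A, G ordering and translates a short coding fragment. The 27‑nt fragment is the start of the HBB coding sequence as commonly quoted and should be verified against the database record before use; the second sequence introduces the sickle‑cell GAG→GTG change by hand.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    normal = "ATGGTGCATCTGACTCCTGAGGAGAAG"  # illustrative; verify against GenBank
    sickle = "ATGGTGCATCTGACTCCTGTGGAGAAG"  # GAG -> GTG at the seventh codon
    print(translate(normal))
    print(translate(sickle))
```

Comparing the two outputs shows a single Glu→Val substitution in the protein, the link between a nucleotide record and the corresponding protein‑database entry.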