Cambridge Syllabus Notes

16.2 Translation Software

Objective

Show understanding of the various stages in the compilation of a program and the key artefacts produced at each stage.

What is Translation Software?

Translation software converts source code written in a high‑level language into a form that a computer can execute. The most common type is a compiler, which performs a series of well‑defined transformations, often preceded by a pre‑processor (e.g. macro expansion in C/C++).

Compilation Pipeline – Overview

The compilation process is a pipeline. Each stage receives an input, applies a specific transformation, and produces an output that becomes the input for the next stage.

Typical flow of compilation stages (source → executable)

Detailed Stages

Pre‑processor (optional)
- Input: Raw source code.
- Typical tasks: macro expansion, file inclusion (#include), conditional compilation (#ifdef).
- Output: Modified source code passed to the lexical analyser.
Lexical Analysis (Scanning)
- Input: Characters of the (pre‑processed) source code.
- Output: Stream of tokens (identifiers, keywords, literals, operators, delimiters).
- Implementation basis: regular expressions & deterministic finite automata (DFA).
- Key data structure: Symbol table – records each identifier with attributes such as type, scope, and later storage location.
Syntax Analysis (Parsing)
- Input: Token stream.
- Output: Parse tree or Abstract Syntax Tree (AST) representing the grammatical structure.
- Technique: Context‑free grammar (usually written in BNF/EBNF).
- Example BNF rule for arithmetic expressions:
```
<expr> ::= <term> { (“+” | “-”) <term> }
```
Semantic Analysis
- Input: AST.
- Output: Annotated AST enriched with type information, scope links and other semantic attributes.
- Checks performed:
  - Type compatibility (e.g. cannot assign a float to an int without conversion).
  - Declaration before use.
  - Scope rules (e.g. variable shadowing).
  - Array‑bounds and function‑argument matching.
Intermediate Code Generation
- Input: Annotated AST.
- Output: Platform‑independent Intermediate Representation (IR), often three‑address code.
- Example three‑address instruction: t1 = a + b
Optimization
- Input: IR.
- Output: Optimised IR.
- Goals: Reduce execution time, memory usage or power consumption.
- Typical techniques:
  - Constant folding – evaluate constant expressions at compile time.
    Before: t1 = 3 + 4
    After: t1 = 7
  - Dead‑code elimination.
  - Loop‑invariant code motion.
  - Peephole optimisation on generated machine instructions.
Code Generation
- Input: Optimised IR.
- Output: Target machine code or assembly language.
- Key tasks: Register allocation, instruction selection, instruction scheduling.
Linking & Loading
- Input: Object modules produced by the code generator.
- Output: Executable program ready to run.
- Linker role: Resolve external references, combine object files.
- Loader role: Place the program in memory and start execution.
- Static vs. dynamic linking:
  - Static linking: Library code is copied into the executable at compile time.
  - Dynamic linking: References to shared libraries are resolved at run‑time, reducing executable size and allowing library updates without recompilation.

Symbol Table – Definition & Example

A symbol table is a data structure maintained by the compiler that records information about each identifier encountered in the source program.

Identifier	Type	Scope	Memory / Register
`total`	int	global	0x1004
`i`	int	for‑loop block	R5
`max`	float	function `calc`	0x200C

Typical Errors Detected at Each Stage

Stage	Typical Error Type	Example
Lexical analysis	Illegal character / malformed token	‘@’ in an identifier
Syntax analysis	Missing semicolon, mismatched parentheses	‘if (x > 0 { … }’
Semantic analysis	Type mismatch, undeclared variable, scope violation	Assigning `float` to `int` without conversion
Optimization	Transformation that changes program meaning (rare – indicates a bug in the optimiser)	Removing a side‑effecting statement
Linking	Unresolved external reference, duplicate symbol	Calling a function that is not defined in any linked library

Summary of Compilation Stages

Stage	Input	Output	Primary Purpose
Pre‑processor (optional)	Raw source code	Modified source code	Macro expansion, file inclusion, conditional compilation
Lexical analysis	Source characters	Token stream	Identify lexical units; build symbol table
Syntax analysis	Token stream	Parse tree / AST	Check grammatical structure using a BNF/EBNF grammar
Semantic analysis	AST	Annotated AST	Enforce language semantics (type, scope, declaration rules)
Intermediate code generation	Annotated AST	IR (e.g., three‑address code)	Provide a machine‑independent representation
Optimization	IR	Optimised IR	Improve execution time, memory usage, or power consumption
Code generation	Optimised IR	Target machine code / assembly	Translate to executable instructions; allocate registers
Linking & loading	Object modules	Executable program	Resolve external references (static/dynamic) and place program in memory

Key Points to Remember

Each stage produces an intermediate form that becomes the input for the next stage.
Early error detection (lexical → syntax) is easier to locate and fix than later errors (semantic, linking).
Optimisation can be applied at source, intermediate, or machine‑code level; the most exam‑focused techniques are constant folding and dead‑code elimination.
The front‑end (lexical, syntax, semantic) is language‑specific; the back‑end (IR, optimisation, code generation) is target‑machine specific – a design that promotes reuse.
Understanding the role of the symbol table, grammar rules and IR examples is essential for answering typical exam questions.

Typical Exam Question

Question: Explain the role of the symbol table during compilation and indicate at which stages it is accessed.

Answer outline:

Created during lexical analysis when an identifier is first encountered; an entry records its name, type, scope and (later) storage location.
Consulted in syntax analysis to verify that identifiers used in expressions have been declared.
Used extensively in semantic analysis for type checking, scope resolution and ensuring correct use of functions/arrays.
May be referenced again in code generation to allocate registers or memory addresses for each identifier.

Action‑Oriented Review Checklist – How Your Lecture Notes Stack Up Against the 2026 Cambridge AS & A‑Level Computer Science (9618) Syllabus

Use this quick‑audit table to verify that your notes cover every required sub‑topic. Tick the “✔” column as you confirm coverage; if a box is empty, add the suggested material.

Syllabus Section	Covered? (✔)	Suggested Action if Gap Exists
1 Information Representation • Binary, BCD, ASCII/Unicode • Two’s‑/one’s‑complement, overflow		Add a “Binary arithmetic & overflow” box with a conversion table (binary ↔ decimal ↔ hexadecimal) and a short example of two’s‑complement subtraction.
1.2 Multimedia • Bitmap vs. vector graphics • Colour depth, resolution		Insert labelled diagrams of a bitmap image and a vector image; include a quick “size‑estimate” calculation (e.g., 800 × 600 px, 24‑bit colour).
1.3 Compression • Lossy vs. lossless • Run‑Length Encoding (RLE)		Provide a one‑page RLE walk‑through (e.g., compressing the string “AAAAABBB”).
2 Communication • LAN/WAN, client‑server vs. peer‑to‑peer • Topologies, cloud, wired vs. wireless • Ethernet/CSMA‑CD, IP addressing, DNS, URLs		Create a “Network cheat‑sheet” with topology icons, IPv4/IPv6 format, subnet mask basics and a step‑by‑step DNS lookup illustration.
3 Hardware • CPU, RAM/ROM, SRAM/DRAM • PROM/EPROM/EEPROM, buffers • Embedded systems, sensors/actuators		Add a comparison table of RAM types (SRAM vs. DRAM) with typical use‑cases; include a short case‑study of an embedded sensor system (e.g., temperature sensor → microcontroller → actuator).
4 Processor Fundamentals • Von Neumann model, registers, ALU, CU, buses • Fetch‑Execute cycle, interrupts • Performance factors (clock speed, CPI, parallelism)		Insert a labelled datapath diagram and a step‑by‑step Fetch‑Execute cycle flowchart; summarise performance factors in a bullet list.
5 System Software • OS purposes, utility software • Program libraries, language translators, IDE features		Provide a side‑by‑side comparison of a compiler vs. an interpreter (pros/cons) and list typical IDE features (syntax highlighting, debugger, build tools).
6 Security & Data Integrity • Security, privacy, integrity • Threats, firewalls, encryption • Validation & verification techniques		Add a “Threat‑mitigation matrix” (e.g., phishing → strong passwords + anti‑phishing filter) and a brief note on hash functions for data integrity.
7 Ethics & Ownership • Professional ethics, copyright • Software licences, AI impact		Insert an “Ethics decision tree” for a scenario such as re‑using open‑source code in a commercial product.
8 Databases • Limitations of file‑based storage • Relational model, ER diagrams, normalisation • DBMS features, DDL/DML, basic SQL		Provide a mini‑ER‑diagram exercise (e.g., Students‑Courses relationship) and a 3‑NF normalisation worksheet with a sample table.
9 Algorithm Design • Abstraction, decomposition • Pseudocode, flowcharts, stepwise refinement • Basic search (linear, binary) and sort (bubble, selection)		Add a pseudocode template with placeholders for input, processing, and output; include a flowchart for binary search and a table comparing search/sort complexities.
Compilation (covered above)	✔	–

When you have a tick in every “✔” column, your notes are fully aligned with the 2026 Cambridge AS & A‑Level Computer Science syllabus.

Show understanding of the various stages in the compilation of a program

16.2 Translation Software