Show understanding of the various stages in the compilation of a program

16.2 Translation Software

Objective

Show understanding of the various stages in the compilation of a program and the key artefacts produced at each stage.

What is Translation Software?

Translation software converts source code written in a high‑level language into a form that a computer can execute. The most common type is a compiler, which performs a series of well‑defined transformations, often preceded by a pre‑processor (e.g. macro expansion in C/C++).

Compilation Pipeline – Overview

The compilation process is a pipeline. Each stage receives an input, applies a specific transformation, and produces an output that becomes the input for the next stage.

Typical flow of compilation stages (source → executable)

Detailed Stages

  1. Pre‑processor (optional)

    • Input: Raw source code.
    • Typical tasks: macro expansion, file inclusion (#include), conditional compilation (#ifdef).
    • Output: Modified source code passed to the lexical analyser.

  2. Lexical Analysis (Scanning)

    • Input: Characters of the (pre‑processed) source code.
    • Output: Stream of tokens (identifiers, keywords, literals, operators, delimiters).
    • Implementation basis: regular expressions & deterministic finite automata (DFA).
    • Key data structure: Symbol table – records each identifier with attributes such as type, scope, and later storage location.

  3. Syntax Analysis (Parsing)

    • Input: Token stream.
    • Output: Parse tree or Abstract Syntax Tree (AST) representing the grammatical structure.
    • Technique: Context‑free grammar (usually written in BNF/EBNF).
    • Example BNF rule for arithmetic expressions:

      <expr> ::= <term> { (“+” | “-”) <term> }

  4. Semantic Analysis

    • Input: AST.
    • Output: Annotated AST enriched with type information, scope links and other semantic attributes.
    • Checks performed:

      • Type compatibility (e.g. cannot assign a float to an int without conversion).
      • Declaration before use.
      • Scope rules (e.g. variable shadowing).
      • Array‑bounds and function‑argument matching.

  5. Intermediate Code Generation

    • Input: Annotated AST.
    • Output: Platform‑independent Intermediate Representation (IR), often three‑address code.
    • Example three‑address instruction: t1 = a + b

  6. Optimization

    • Input: IR.
    • Output: Optimised IR.
    • Goals: Reduce execution time, memory usage or power consumption.
    • Typical techniques:

      • Constant folding – evaluate constant expressions at compile time.

        Before: t1 = 3 + 4

        After: t1 = 7

      • Dead‑code elimination.
      • Loop‑invariant code motion.
      • Peephole optimisation on generated machine instructions.

  7. Code Generation

    • Input: Optimised IR.
    • Output: Target machine code or assembly language.
    • Key tasks: Register allocation, instruction selection, instruction scheduling.

  8. Linking & Loading

    • Input: Object modules produced by the code generator.
    • Output: Executable program ready to run.
    • Linker role: Resolve external references, combine object files.
    • Loader role: Place the program in memory and start execution.
    • Static vs. dynamic linking:

      • Static linking: Library code is copied into the executable at compile time.
      • Dynamic linking: References to shared libraries are resolved at run‑time, reducing executable size and allowing library updates without recompilation.

Symbol Table – Definition & Example

A symbol table is a data structure maintained by the compiler that records information about each identifier encountered in the source program.

IdentifierTypeScopeMemory / Register
totalintglobal0x1004
iintfor‑loop blockR5
maxfloatfunction calc0x200C

Typical Errors Detected at Each Stage

StageTypical Error TypeExample
Lexical analysisIllegal character / malformed token‘@’ in an identifier
Syntax analysisMissing semicolon, mismatched parentheses‘if (x > 0 { … }’
Semantic analysisType mismatch, undeclared variable, scope violationAssigning float to int without conversion
OptimizationTransformation that changes program meaning (rare – indicates a bug in the optimiser)Removing a side‑effecting statement
LinkingUnresolved external reference, duplicate symbolCalling a function that is not defined in any linked library

Summary of Compilation Stages

StageInputOutputPrimary Purpose
Pre‑processor (optional)Raw source codeModified source codeMacro expansion, file inclusion, conditional compilation
Lexical analysisSource charactersToken streamIdentify lexical units; build symbol table
Syntax analysisToken streamParse tree / ASTCheck grammatical structure using a BNF/EBNF grammar
Semantic analysisASTAnnotated ASTEnforce language semantics (type, scope, declaration rules)
Intermediate code generationAnnotated ASTIR (e.g., three‑address code)Provide a machine‑independent representation
OptimizationIROptimised IRImprove execution time, memory usage, or power consumption
Code generationOptimised IRTarget machine code / assemblyTranslate to executable instructions; allocate registers
Linking & loadingObject modulesExecutable programResolve external references (static/dynamic) and place program in memory

Key Points to Remember

  • Each stage produces an intermediate form that becomes the input for the next stage.
  • Early error detection (lexical → syntax) is easier to locate and fix than later errors (semantic, linking).
  • Optimisation can be applied at source, intermediate, or machine‑code level; the most exam‑focused techniques are constant folding and dead‑code elimination.
  • The front‑end (lexical, syntax, semantic) is language‑specific; the back‑end (IR, optimisation, code generation) is target‑machine specific – a design that promotes reuse.
  • Understanding the role of the symbol table, grammar rules and IR examples is essential for answering typical exam questions.

Typical Exam Question

Question: Explain the role of the symbol table during compilation and indicate at which stages it is accessed.

Answer outline:

  1. Created during lexical analysis when an identifier is first encountered; an entry records its name, type, scope and (later) storage location.
  2. Consulted in syntax analysis to verify that identifiers used in expressions have been declared.
  3. Used extensively in semantic analysis for type checking, scope resolution and ensuring correct use of functions/arrays.
  4. May be referenced again in code generation to allocate registers or memory addresses for each identifier.


Action‑Oriented Review Checklist – How Your Lecture Notes Stack Up Against the 2026 Cambridge AS & A‑Level Computer Science (9618) Syllabus

Use this quick‑audit table to verify that your notes cover every required sub‑topic. Tick the “✔” column as you confirm coverage; if a box is empty, add the suggested material.

Syllabus SectionCovered? (✔)Suggested Action if Gap Exists
1 Information Representation

• Binary, BCD, ASCII/Unicode

• Two’s‑/one’s‑complement, overflow

Add a “Binary arithmetic & overflow” box with a conversion table (binary ↔ decimal ↔ hexadecimal) and a short example of two’s‑complement subtraction.
1.2 Multimedia

• Bitmap vs. vector graphics

• Colour depth, resolution

Insert labelled diagrams of a bitmap image and a vector image; include a quick “size‑estimate” calculation (e.g., 800 × 600 px, 24‑bit colour).
1.3 Compression

• Lossy vs. lossless

• Run‑Length Encoding (RLE)

Provide a one‑page RLE walk‑through (e.g., compressing the string “AAAAABBB”).
2 Communication

• LAN/WAN, client‑server vs. peer‑to‑peer

• Topologies, cloud, wired vs. wireless

• Ethernet/CSMA‑CD, IP addressing, DNS, URLs

Create a “Network cheat‑sheet” with topology icons, IPv4/IPv6 format, subnet mask basics and a step‑by‑step DNS lookup illustration.
3 Hardware

• CPU, RAM/ROM, SRAM/DRAM

• PROM/EPROM/EEPROM, buffers

• Embedded systems, sensors/actuators

Add a comparison table of RAM types (SRAM vs. DRAM) with typical use‑cases; include a short case‑study of an embedded sensor system (e.g., temperature sensor → microcontroller → actuator).
4 Processor Fundamentals

• Von Neumann model, registers, ALU, CU, buses

• Fetch‑Execute cycle, interrupts

• Performance factors (clock speed, CPI, parallelism)

Insert a labelled datapath diagram and a step‑by‑step Fetch‑Execute cycle flowchart; summarise performance factors in a bullet list.
5 System Software

• OS purposes, utility software

• Program libraries, language translators, IDE features

Provide a side‑by‑side comparison of a compiler vs. an interpreter (pros/cons) and list typical IDE features (syntax highlighting, debugger, build tools).
6 Security & Data Integrity

• Security, privacy, integrity

• Threats, firewalls, encryption

• Validation & verification techniques

Add a “Threat‑mitigation matrix” (e.g., phishing → strong passwords + anti‑phishing filter) and a brief note on hash functions for data integrity.
7 Ethics & Ownership

• Professional ethics, copyright

• Software licences, AI impact

Insert an “Ethics decision tree” for a scenario such as re‑using open‑source code in a commercial product.
8 Databases

• Limitations of file‑based storage

• Relational model, ER diagrams, normalisation

• DBMS features, DDL/DML, basic SQL

Provide a mini‑ER‑diagram exercise (e.g., Students‑Courses relationship) and a 3‑NF normalisation worksheet with a sample table.
9 Algorithm Design

• Abstraction, decomposition

• Pseudocode, flowcharts, stepwise refinement

• Basic search (linear, binary) and sort (bubble, selection)

Add a pseudocode template with placeholders for input, processing, and output; include a flowchart for binary search and a table comparing search/sort complexities.
Compilation (covered above)

When you have a tick in every “✔” column, your notes are fully aligned with the 2026 Cambridge AS & A‑Level Computer Science syllabus.