Show understanding of the various stages in the compilation of a program

16.2 Translation Software

Objective

Show understanding of the various stages in the compilation of a program and the key artefacts produced at each stage.

What is Translation Software?

Translation software converts source code written in a high‑level language into a form that a computer can execute. The most common type is a compiler, which performs a series of well‑defined transformations, often preceded by a pre‑processor (e.g. macro expansion in C/C++).

Compilation Pipeline – Overview

The compilation process is a pipeline. Each stage receives an input, applies a specific transformation, and produces an output that becomes the input for the next stage.

Typical flow of compilation stages (source → executable)

Detailed Stages

  1. Pre‑processor (optional)
    • Input: Raw source code.
    • Typical tasks: macro expansion, file inclusion (#include), conditional compilation (#ifdef).
    • Output: Modified source code passed to the lexical analyser.
  2. Lexical Analysis (Scanning)
    • Input: Characters of the (pre‑processed) source code.
    • Output: Stream of tokens (identifiers, keywords, literals, operators, delimiters).
    • Implementation basis: regular expressions & deterministic finite automata (DFA).
    • Key data structure: Symbol table – records each identifier with attributes such as type, scope, and later storage location.
  3. Syntax Analysis (Parsing)
    • Input: Token stream.
    • Output: Parse tree or Abstract Syntax Tree (AST) representing the grammatical structure.
    • Technique: Context‑free grammar (usually written in BNF/EBNF).
    • Example BNF rule for arithmetic expressions:
      <expr> ::= <term> { (“+” | “-”) <term> }
  4. Semantic Analysis
    • Input: AST.
    • Output: Annotated AST enriched with type information, scope links and other semantic attributes.
    • Checks performed:
      • Type compatibility (e.g. cannot assign a float to an int without conversion).
      • Declaration before use.
      • Scope rules (e.g. variable shadowing).
      • Array‑bounds and function‑argument matching.
  5. Intermediate Code Generation
    • Input: Annotated AST.
    • Output: Platform‑independent Intermediate Representation (IR), often three‑address code.
    • Example three‑address instruction: t1 = a + b
  6. Optimization
    • Input: IR.
    • Output: Optimised IR.
    • Goals: Reduce execution time, memory usage or power consumption.
    • Typical techniques:
      • Constant folding – evaluate constant expressions at compile time.
        Before: t1 = 3 + 4
        After: t1 = 7
      • Dead‑code elimination.
      • Loop‑invariant code motion.
      • Peephole optimisation on generated machine instructions.
  7. Code Generation
    • Input: Optimised IR.
    • Output: Target machine code or assembly language.
    • Key tasks: Register allocation, instruction selection, instruction scheduling.
  8. Linking & Loading
    • Input: Object modules produced by the code generator.
    • Output: Executable program ready to run.
    • Linker role: Resolve external references, combine object files.
    • Loader role: Place the program in memory and start execution.
    • Static vs. dynamic linking:
      • Static linking: Library code is copied into the executable at compile time.
      • Dynamic linking: References to shared libraries are resolved at run‑time, reducing executable size and allowing library updates without recompilation.

Symbol Table – Definition & Example

A symbol table is a data structure maintained by the compiler that records information about each identifier encountered in the source program.

IdentifierTypeScopeMemory / Register
totalintglobal0x1004
iintfor‑loop blockR5
maxfloatfunction calc0x200C

Typical Errors Detected at Each Stage

StageTypical Error TypeExample
Lexical analysisIllegal character / malformed token‘@’ in an identifier
Syntax analysisMissing semicolon, mismatched parentheses‘if (x > 0 { … }’
Semantic analysisType mismatch, undeclared variable, scope violationAssigning float to int without conversion
OptimizationTransformation that changes program meaning (rare – indicates a bug in the optimiser)Removing a side‑effecting statement
LinkingUnresolved external reference, duplicate symbolCalling a function that is not defined in any linked library

Summary of Compilation Stages

Stage Input Output Primary Purpose
Pre‑processor (optional) Raw source code Modified source code Macro expansion, file inclusion, conditional compilation
Lexical analysis Source characters Token stream Identify lexical units; build symbol table
Syntax analysis Token stream Parse tree / AST Check grammatical structure using a BNF/EBNF grammar
Semantic analysis AST Annotated AST Enforce language semantics (type, scope, declaration rules)
Intermediate code generation Annotated AST IR (e.g., three‑address code) Provide a machine‑independent representation
Optimization IR Optimised IR Improve execution time, memory usage, or power consumption
Code generation Optimised IR Target machine code / assembly Translate to executable instructions; allocate registers
Linking & loading Object modules Executable program Resolve external references (static/dynamic) and place program in memory

Key Points to Remember

  • Each stage produces an intermediate form that becomes the input for the next stage.
  • Early error detection (lexical → syntax) is easier to locate and fix than later errors (semantic, linking).
  • Optimisation can be applied at source, intermediate, or machine‑code level; the most exam‑focused techniques are constant folding and dead‑code elimination.
  • The front‑end (lexical, syntax, semantic) is language‑specific; the back‑end (IR, optimisation, code generation) is target‑machine specific – a design that promotes reuse.
  • Understanding the role of the symbol table, grammar rules and IR examples is essential for answering typical exam questions.

Typical Exam Question

Question: Explain the role of the symbol table during compilation and indicate at which stages it is accessed.

Answer outline:

  1. Created during lexical analysis when an identifier is first encountered; an entry records its name, type, scope and (later) storage location.
  2. Consulted in syntax analysis to verify that identifiers used in expressions have been declared.
  3. Used extensively in semantic analysis for type checking, scope resolution and ensuring correct use of functions/arrays.
  4. May be referenced again in code generation to allocate registers or memory addresses for each identifier.

Action‑Oriented Review Checklist – How Your Lecture Notes Stack Up Against the 2026 Cambridge AS & A‑Level Computer Science (9618) Syllabus

Use this quick‑audit table to verify that your notes cover every required sub‑topic. Tick the “✔” column as you confirm coverage; if a box is empty, add the suggested material.

Syllabus Section Covered? (✔) Suggested Action if Gap Exists
1 Information Representation
• Binary, BCD, ASCII/Unicode
• Two’s‑/one’s‑complement, overflow
Add a “Binary arithmetic & overflow” box with a conversion table (binary ↔ decimal ↔ hexadecimal) and a short example of two’s‑complement subtraction.
1.2 Multimedia
• Bitmap vs. vector graphics
• Colour depth, resolution
Insert labelled diagrams of a bitmap image and a vector image; include a quick “size‑estimate” calculation (e.g., 800 × 600 px, 24‑bit colour).
1.3 Compression
• Lossy vs. lossless
• Run‑Length Encoding (RLE)
Provide a one‑page RLE walk‑through (e.g., compressing the string “AAAAABBB”).
2 Communication
• LAN/WAN, client‑server vs. peer‑to‑peer
• Topologies, cloud, wired vs. wireless
• Ethernet/CSMA‑CD, IP addressing, DNS, URLs
Create a “Network cheat‑sheet” with topology icons, IPv4/IPv6 format, subnet mask basics and a step‑by‑step DNS lookup illustration.
3 Hardware
• CPU, RAM/ROM, SRAM/DRAM
• PROM/EPROM/EEPROM, buffers
• Embedded systems, sensors/actuators
Add a comparison table of RAM types (SRAM vs. DRAM) with typical use‑cases; include a short case‑study of an embedded sensor system (e.g., temperature sensor → microcontroller → actuator).
4 Processor Fundamentals
• Von Neumann model, registers, ALU, CU, buses
• Fetch‑Execute cycle, interrupts
• Performance factors (clock speed, CPI, parallelism)
Insert a labelled datapath diagram and a step‑by‑step Fetch‑Execute cycle flowchart; summarise performance factors in a bullet list.
5 System Software
• OS purposes, utility software
• Program libraries, language translators, IDE features
Provide a side‑by‑side comparison of a compiler vs. an interpreter (pros/cons) and list typical IDE features (syntax highlighting, debugger, build tools).
6 Security & Data Integrity
• Security, privacy, integrity
• Threats, firewalls, encryption
• Validation & verification techniques
Add a “Threat‑mitigation matrix” (e.g., phishing → strong passwords + anti‑phishing filter) and a brief note on hash functions for data integrity.
7 Ethics & Ownership
• Professional ethics, copyright
• Software licences, AI impact
Insert an “Ethics decision tree” for a scenario such as re‑using open‑source code in a commercial product.
8 Databases
• Limitations of file‑based storage
• Relational model, ER diagrams, normalisation
• DBMS features, DDL/DML, basic SQL
Provide a mini‑ER‑diagram exercise (e.g., Students‑Courses relationship) and a 3‑NF normalisation worksheet with a sample table.
9 Algorithm Design
• Abstraction, decomposition
• Pseudocode, flowcharts, stepwise refinement
• Basic search (linear, binary) and sort (bubble, selection)
Add a pseudocode template with placeholders for input, processing, and output; include a flowchart for binary search and a table comparing search/sort complexities.
Compilation (covered above)

When you have a tick in every “✔” column, your notes are fully aligned with the 2026 Cambridge AS & A‑Level Computer Science syllabus.

Create an account or Login to take a Quiz

91 views
0 improvement suggestions

Log in to suggest improvements to this note.