16.2 Translation Software – Interpreters and Compilers
Learning objective
Explain how an interpreter can execute a program without producing a separate translated version, and compare this with the compilation process used by a compiler.
1. Interpreters – Executing Directly from Source
1.1 What an interpreter does
Reads the source program, translates each statement (or a small group of statements) on the fly, and executes it immediately.
The original source text is never turned into an object file or a standalone executable.
Typical environments: scripting languages, REPLs (Read‑Eval‑Print Loops), educational tools, and many web‑based languages.
1.2 Execution model (statement‑by‑statement)
Read next line (or token stream) from the source file.
Lexical analysis – characters → tokens (identifiers, literals, operators …).
Syntax analysis (parsing) – build a tiny abstract syntax tree (AST) for that statement.
Semantic checks – type rules, scope rules, constant folding, etc.
Immediate execution – evaluate the AST node and perform the required action (assignment, I/O, function call …).
Repeat until the end of the file or a run‑time termination condition is met.
1.3 Pseudocode illustration
while not EOF:
    line = read_line()       # raw source
    tokens = lexer(line)     # lexical analysis
    ast = parser(tokens)     # syntax analysis
    execute(ast)             # direct execution
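To make the loop above concrete, here is a minimal runnable sketch for a hypothetical mini-language of `name = expression` assignments and `print expression` statements. Python's built-in `eval` stands in for the lexer/parser/execute phases, so this illustrates only the statement-by-statement control flow, not a real interpreter's internals:

```python
# Statement-by-statement interpreter sketch for a hypothetical mini-language:
# "x = <expr>" assignments and "print <expr>" statements.
# eval() is a stand-in for the lexical/syntax/semantic/execute phases.

def interpret(source, output):
    env = {}                                  # run-time variable store
    for line in source.splitlines():          # "read next line"
        line = line.strip()
        if not line:
            continue
        if line.startswith("print "):         # statement form 1: print
            output.append(eval(line[6:], {}, env))
        else:                                 # statement form 2: assignment
            name, expr = line.split("=", 1)
            env[name.strip()] = eval(expr, {}, env)

results = []
interpret("x = 2 + 3\ny = x * 4\nprint y", results)
```

Note how each statement is analysed and executed before the next line is even read; no translated version of the program is ever stored.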
1.4 Key characteristics
Instant feedback – ideal for rapid prototyping and debugging.
Source line numbers are preserved, so run‑time error messages refer to the original code.
Supports dynamic features such as eval(), run‑time code generation, and interactive input.
Highly portable: the same source runs on any machine that provides the interpreter.
1.5 Typical examples (relevant to A‑Level)
Python, Ruby, PHP, JavaScript (in a browser)
Interactive consoles – e.g. python REPL, sqlite3 shell
Educational tools that allow a single statement to be tested without a compile‑link cycle.
1.6 Diagram – interpreter flow
Typical interpreter processing pipeline.
2. Compilers – Translating the Whole Program Once
2.1 Compilation stages (Cambridge syllabus)
Lexical analysis – source characters → tokens.
Syntactic analysis (parsing) – tokens → parse tree / abstract syntax tree.
Semantic analysis – type checking, scope resolution, constant folding.
Intermediate code generation – usually three‑address code or byte‑code.
Optimisation – dead‑code elimination, loop optimisation, register allocation, etc.
Target code generation – machine code or assembly for the chosen architecture.
Linking (optional) – combine object modules into a single executable.
2.2 One‑time work vs. repeated work
| Phase | Interpreter | Compiler |
|---|---|---|
| Lexical analysis | Repeated for every statement at run‑time | Performed once for the whole program |
| Parsing | Repeated for each statement | Done once for the complete source file |
| Code generation | Direct execution – no separate code file | Produces object code / executable file |
| Optimisation | Usually none (or very limited JIT) | Extensive optimisation possible |
2.3 Performance model
For an interpreter the total run‑time is the sum of the costs of analysing and executing each statement:
\(T_{\text{total}} = \displaystyle\sum_{i=1}^{n}\bigl(T_{\text{lex}}(i) + T_{\text{parse}}(i) + T_{\text{exec}}(i)\bigr)\)
For a compiled program the lexical, parsing and code‑generation costs are incurred once, after which the execution time \(T_{\text{exec}}\) is usually much smaller.
2.4 Diagram – compiler pipeline
Standard compilation flow.
3. Describing Language Syntax – BNF and Railroad Diagrams
3.1 Backus‑Naur Form (BNF)
Formal notation for a context‑free grammar.
Each rule: <non‑terminal> ::= <definition>.
Terminals are written in quotes, non‑terminals in angle brackets.
3.2 Simple BNF example – arithmetic expressions (the braces { } denote zero‑or‑more repetition, an EBNF extension)
<expr> ::= <term> { ("+" | "-") <term> }
<term> ::= <factor> { ("*" | "/") <factor> }
<factor> ::= <number> | "(" <expr> ")"
<number> ::= <digit> { <digit> }
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
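One way to see this grammar "in action" is a recursive-descent evaluator with one function per non-terminal, which is how many hand-written parsers are structured. The sketch below (function names `tokenise`, `evaluate`, etc. are illustrative, not from the source) parses and evaluates an expression in a single pass:

```python
# Recursive-descent evaluator for the grammar above: one function per
# non-terminal. Illustrative sketch only, with no error handling.

def tokenise(text):
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
        elif text[i].isdigit():               # <number> ::= <digit> { <digit> }
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(text[i:j])
            i = j
        else:                                 # operator or parenthesis
            tokens.append(text[i])
            i += 1
    return tokens

def evaluate(text):
    tokens = tokenise(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():                               # <expr> ::= <term> { ("+"|"-") <term> }
        nonlocal pos
        value = term()
        while peek() in ("+", "-"):
            op = tokens[pos]; pos += 1
            value = value + term() if op == "+" else value - term()
        return value

    def term():                               # <term> ::= <factor> { ("*"|"/") <factor> }
        nonlocal pos
        value = factor()
        while peek() in ("*", "/"):
            op = tokens[pos]; pos += 1
            value = value * factor() if op == "*" else value / factor()
        return value

    def factor():                             # <factor> ::= <number> | "(" <expr> ")"
        nonlocal pos
        if peek() == "(":
            pos += 1                          # consume "("
            value = expr()
            pos += 1                          # consume ")"
            return value
        value = int(tokens[pos]); pos += 1
        return value

    return expr()
```

Because `<term>` is defined in terms of `<factor>`, multiplication and division bind more tightly than addition and subtraction, so `evaluate("2+3*4")` respects the usual precedence without any explicit precedence table.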
3.3 Railroad (syntax‑diagram) representation
Railroad diagram for the rule <term>. Start → <factor> → loop ( “*” or “/” → <factor> ) → End.
3.4 Why the grammar matters
Both interpreters and compilers use the same grammar to recognise legal programs.
BNF is the input notation for parser generators such as yacc/bison and ANTLR (with lex/flex generating the matching lexers), and for modern IDE syntax checkers.
Understanding the grammar helps students predict where syntax errors will be reported during interpretation or compilation.
4. Evaluating Expressions – Reverse Polish Notation (RPN)
4.1 What is RPN?
Also called postfix notation; operators appear after their operands.
No parentheses are required – the order of evaluation is encoded by the position of the operators.
Many virtual machines (e.g., the Java Virtual Machine) use a stack‑based model that mirrors RPN.
4.2 Infix → RPN conversion (Shunting‑Yard algorithm – syllabus outline)
Read the infix token stream left‑to‑right.
Output operands (numbers, identifiers) immediately.
Push operators onto a stack, respecting precedence and associativity.
Before pushing an operator, pop to the output any stacked operators of higher precedence (or equal precedence, if the incoming operator is left‑associative).
When a right parenthesis is read, pop operators until the matching left parenthesis is removed.
After the input is exhausted, pop any remaining operators to the output.
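The outline above can be sketched as a short Python function. The precedence table and the choice of which operators are right-associative are assumptions for the five operators used in this section (`^` right-associative, the rest left-associative):

```python
# Shunting-yard sketch: converts a list of infix tokens to RPN.
# Assumes the five operators of this section; no error handling.

PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}

def to_rpn(tokens):
    output, stack = [], []
    for tok in tokens:
        if tok in PREC:                       # operator
            while (stack and stack[-1] in PREC and
                   (PREC[stack[-1]] > PREC[tok] or
                    (PREC[stack[-1]] == PREC[tok] and tok not in RIGHT_ASSOC))):
                output.append(stack.pop())    # pop higher-precedence operators
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":           # pop until the matching "("
                output.append(stack.pop())
            stack.pop()                       # discard the "("
        else:                                 # operand
            output.append(tok)
    while stack:                              # flush remaining operators
        output.append(stack.pop())
    return output
```

Running `to_rpn` on the tokens of the worked example in the next subsection reproduces the RPN string shown there.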
4.3 Worked example
Infix expression:
3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3
Step‑by‑step conversion yields the RPN string:
3 4 2 * 1 5 - 2 3 ^ ^ / +
4.4 RPN evaluation using a stack
stack = []
for token in RPN:
    if token is an operand:
        stack.push(token)
    else:                        # token is an operator
        b = stack.pop()          # second operand popped first –
        a = stack.pop()          # order matters for -, / and ^
        result = apply(token, a, b)
        stack.push(result)
final_result = stack.pop()
Applying the algorithm to the example above produces 3 + 8 ÷ (−4)⁸ = 3 + 8/65536 = 3.0001220703125.
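A runnable version of the pseudocode above, applied to the worked example. The operator table is an assumption for the five operators of this section (with `^` meaning exponentiation):

```python
import operator

# Stack evaluation of the worked RPN example. Each symbol maps to a
# Python function; b is popped before a, so "a op b" is computed.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}

def eval_rpn(tokens):
    stack = []
    for token in tokens:
        if token in OPS:                      # operator
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[token](a, b))
        else:                                 # operand
            stack.append(float(token))
    return stack.pop()

result = eval_rpn("3 4 2 * 1 5 - 2 3 ^ ^ / +".split())
```

Note that a single pass over the tokens, with no tree-building at all, is enough to evaluate the expression; this is exactly the appeal of RPN for stack-based virtual machines.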
4.5 Diagram – stack evaluation of RPN
Stack‑based evaluation of the RPN expression.
4.6 Syllabus relevance
Shows how an interpreter can evaluate expressions without building a full program‑wide AST.
Provides the conceptual link to stack‑based byte‑code execution in virtual machines.
5. Comparative Summary – Interpreter vs. Compiler
| Aspect | Interpreter | Compiler |
|---|---|---|
| Translation strategy | Analyse & execute each statement at run‑time | Analyse the whole program once, then generate executable code |
| Output produced | No separate executable; runs inside the interpreter | Object code / executable (may need linking) |
| Execution speed | Generally slower – lexical and parsing work repeated for each statement | Usually faster – code already in machine language, heavy optimisation possible |
| Development cycle | Immediate feedback; ideal for debugging and exploratory coding | Longer compile‑link cycle; allows optimisation and static error checking |
| Portability | Source runs wherever the interpreter is available | Executable must be rebuilt for each target CPU/OS |
| Typical languages | Python, Ruby, PHP, JavaScript (browser), Bash | C, C++, Pascal, Java (compiled to byte‑code), Fortran |
| Memory usage | Interpreter itself occupies memory; source remains in memory | Generated code occupies memory; source can be discarded after compilation |
6. Suggested Classroom Diagrams (for teacher use)
Interpreter flowchart – source line → lexical analysis → parsing → semantic checks → immediate execution → next line.
Compiler pipeline – whole source → lexical → parsing → semantic → optimisation → code generation → object file → linking → executable.
BNF / Railroad diagram for a simple arithmetic grammar (as shown above).
RPN evaluation stack diagram – illustrate tokens being pushed and popped during evaluation.
7. Key take‑aways
An interpreter translates and executes each statement on the fly; no separate translated version is ever produced.
A compiler performs the same translation steps once for the whole program, producing object code that can be run repeatedly without re‑analysis.
Both rely on the same underlying grammar (BNF or railroad diagrams) to recognise valid statements.
Reverse Polish Notation demonstrates a simple, stack‑based way of evaluating expressions – a technique used by many interpreters and virtual machines.
Trade‑offs:
Interpreters – rapid feedback and high portability, but slower execution.
Compilers – faster run‑time performance and optimisation, but require a separate compile step and are less portable without recompilation.