
5.2 Language Translators – Why a Compiler Is Needed

1. Purpose of Translating a High‑Level Program

Human‑readability vs. machine‑readability – High‑level languages (e.g., Java, C++, Python) are written for people; a computer can only execute machine code, the binary representation of the processor’s instruction set.

Portability – The same source program can be compiled for different hardware architectures (x86, ARM, MIPS, …). Only the compiler needs to change, not the source, provided the program avoids platform‑specific features.

Performance – Compiled code runs directly on the CPU with no translation overhead at run‑time, so it usually executes faster than the same program run by an interpreter.

Static analysis & safety – Type checking, scope rules, and other semantic checks are performed before the program ever runs, catching many errors early.

Distribution & IP protection – End users receive an executable (binary) rather than source code, simplifying deployment and protecting intellectual property.

2. Types of Language Translators (Cambridge terminology)

Translator	What it does	Typical output
Assembler	Converts symbolic assembly language (mnemonics) into machine code.	Object file containing binary instructions.
Compiler	Translates an entire high‑level program into a lower‑level language (usually assembly or machine code) before execution.	Executable binary (or a set of object files that are later linked).
Interpreter	Translates and executes statements one at a time at run‑time.	No permanent binary; the interpreter program performs the translation each time.
Mixed‑mode (e.g., Java, C#)	Source is first compiled to an intermediate “byte‑code”. The byte‑code is then interpreted or JIT‑compiled to native machine code at run‑time.	Byte‑code files (.class, .dll) plus a virtual machine/JIT compiler.

3. Compilation vs Interpretation – A Balanced View

Aspect	Compilation	Interpretation
When translation occurs	Before execution (offline)	During execution (online)
Execution speed	Generally faster – native code runs directly on the CPU.	Generally slower – each statement is translated every time it is executed, including on every pass through a loop.
Error detection	Syntax, type and undeclared‑identifier errors are reported for the whole program before it runs.	An error in a statement is reported only when execution reaches that statement, so errors in rarely executed code can go unnoticed.
Portability of output	Binaries are platform‑specific; source remains portable.	Source is portable if an interpreter exists for the target platform.
Build time	Longer initial build (compilation, linking) after every change to the source.	No separate build step; the program starts immediately, but translation is repeated on every run.
Typical use‑cases	System software, performance‑critical applications, embedded devices.	Scripting, rapid prototyping, teaching environments.

4. Typical Compiler Phases (as required by the syllabus)

Lexical analysis (Scanning) – converts the character stream into tokens (identifiers, literals, operators).

Syntactic analysis (Parsing) – builds a parse tree according to the language grammar.

Semantic analysis – checks type consistency, scope rules, and other language semantics.

Intermediate code generation – produces a platform‑independent representation (e.g., three‑address code).

Optimisation – improves the intermediate code (dead‑code elimination, loop unrolling, constant folding, etc.).

Target‑code generation – translates the optimised intermediate code into assembly or machine code.

Assembly and linking – the assembler converts the generated assembly code into object files; the linker then combines them and resolves external references to produce the final executable binary.

Suggested diagram: flow‑chart of the compiler phases, from lexical analysis through to linking.
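The first four phases can be sketched for a tiny assignment language. This is a minimal illustration, not a real compiler: the token set, the grammar, and the names tokenize and Translator are all assumptions made for the example, and semantic analysis is omitted.

```python
import re

# Token patterns for a tiny expression language (an assumption for this sketch).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Lexical analysis: turn the character stream into (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":          # white space is discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class Translator:
    """Syntax analysis (recursive descent) that emits three-address code as it parses."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.code = []          # generated three-address instructions
        self.temp_count = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def new_temp(self):
        self.temp_count += 1
        return f"t{self.temp_count}"

    def statement(self):
        # statement := IDENT '=' expression
        kind, name = self.advance()
        if kind != "IDENT" or self.advance() != ("OP", "="):
            raise SyntaxError("expected: identifier = expression")
        value = self.expression()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        self.code.append(f"{name} = {value}")
        return self.code

    def binary(self, operand, operators):
        # Shared loop for left-associative binary operators.
        left = operand()
        while self.peek() in [("OP", op) for op in operators]:
            op = self.advance()[1]
            right = operand()
            temp = self.new_temp()
            self.code.append(f"{temp} = {left} {op} {right}")
            left = temp
        return left

    def expression(self):
        # expression := term (('+' | '-') term)*
        return self.binary(self.term, "+-")

    def term(self):
        # term := factor (('*' | '/') factor)*
        return self.binary(self.factor, "*/")

    def factor(self):
        # factor := NUMBER | IDENT | '(' expression ')'
        kind, text = self.advance()
        if kind in ("NUMBER", "IDENT"):
            return text
        if (kind, text) == ("OP", "("):
            value = self.expression()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


for line in Translator(tokenize("x = a + b * 2")).statement():
    print(line)
# t1 = b * 2
# t2 = a + t1
# x = t2
```

Because the grammar places multiplication below addition, b * 2 is evaluated into a temporary before the addition, which is exactly what the parse tree would encode.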

4.1 Example: Machine‑Code Representation

For a hypothetical 32‑bit RISC processor with a 4‑bit opcode, three 4‑bit register fields, and register Rn encoded as the binary value n, the assembly instruction ADD R1, R2, R3 might be encoded as:

0100 0001 0010 0011 0000 0000 0000 0000
│    │    │    │    │
│    │    │    │    └─ Unused bits (padding)
│    │    │    └─ Source register R3 (0011)
│    │    └─ Source register R2 (0010)
│    └─ Destination register R1 (0001)
└─ Opcode for ADD (0100)

This 32‑bit word is what the CPU fetches and executes.
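Packing fields into an instruction word comes down to shifts and masks. The sketch below assumes a made-up layout (4-bit opcode, then a 4-bit destination, then two 4-bit sources, then 16 unused bits, with register Rn stored as the value n); it does not describe any real instruction set, and the function names are illustrative.

```python
# Hypothetical field layout (an assumption for illustration, not a real ISA):
#   bits 31-28 opcode | 27-24 destination | 23-20 source 1 | 19-16 source 2 | 15-0 unused
OPCODES = {"ADD": 0b0100}


def encode(mnemonic, dest, src1, src2):
    """Pack the four fields into one 32-bit word."""
    for reg in (dest, src1, src2):
        if not 0 <= reg <= 15:
            raise ValueError("register number must fit in 4 bits")
    return (OPCODES[mnemonic] << 28) | (dest << 24) | (src1 << 20) | (src2 << 16)


def decode(word):
    """Extract the fields again by shifting right and masking off 4 bits."""
    return {
        "opcode": (word >> 28) & 0xF,
        "dest": (word >> 24) & 0xF,
        "src1": (word >> 20) & 0xF,
        "src2": (word >> 16) & 0xF,
    }


def as_bits(word):
    """Format a 32-bit word as eight groups of four bits."""
    bits = f"{word:032b}"
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))


word = encode("ADD", 1, 2, 3)
print(as_bits(word))   # 0100 0001 0010 0011 0000 0000 0000 0000
print(decode(word))    # {'opcode': 4, 'dest': 1, 'src1': 2, 'src2': 3}
```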

5. Optimisation – What a Compiler Can Do

5.1 Dead‑Code Elimination


// Before optimisation
int a = 5;
int b = 10;          // b is never used
int c = a + 2;
print(c);

After optimisation the assignment to b is removed: its value is never read, so the program's behaviour is unchanged.
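One way a compiler can find such assignments is a backward pass that tracks which variables are still needed ("live"). The sketch below assumes a simplified statement form, (target, operands), and assumes that assignments have no side effects; the function name and data layout are illustrative only.

```python
def eliminate_dead_assignments(statements, live_out):
    """Remove assignments whose target is never read afterwards.

    statements: list of (target, operands); operands are variable names (str)
                or constants (int).
    live_out:   names whose values are needed after the block, e.g. printed.
    """
    live = set(live_out)
    kept = []
    for target, operands in reversed(statements):
        if target in live:
            # The value is used later: keep it, and its operands become live.
            live.discard(target)
            live.update(name for name in operands if isinstance(name, str))
            kept.append((target, operands))
        # Otherwise nothing reads the value, so the assignment is dropped.
    kept.reverse()
    return kept


program = [
    ("a", [5]),         # int a = 5;
    ("b", [10]),        # int b = 10;   (never used)
    ("c", ["a", 2]),    # int c = a + 2;
]
print(eliminate_dead_assignments(program, live_out={"c"}))
# [('a', [5]), ('c', ['a', 2])]
```

Here print(c) is modelled by listing c in live_out, since printing reads the variable.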

5.2 Loop Unrolling


// Before optimisation (n = 4)
for (int i = 0; i < 4; i++) {
    sum += a[i];
}

After unrolling the loop (four iterations are expanded):


sum += a[0];
sum += a[1];
sum += a[2];
sum += a[3];

Unrolling reduces loop‑control overhead at the cost of a larger code size.
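Full unrolling is only possible when the number of iterations is known at compile time. The sketch below generates the unrolled statements from a body template and then checks, with arbitrary sample data, that the rolled and unrolled forms compute the same total. The helper name unroll and the template syntax are assumptions for this example.

```python
def unroll(body_template, start, stop):
    """Return one copy of the loop body per iteration, with {i} replaced by the index."""
    return [body_template.format(i=index) for index in range(start, stop)]


unrolled_source = unroll("sum += a[{i}];", 0, 4)
for line in unrolled_source:
    print(line)

# Check that the rolled and unrolled forms give the same result.
a = [3, 1, 4, 1]                 # arbitrary sample data

rolled_total = 0
for i in range(4):               # loop test and increment run on every pass
    rolled_total += a[i]

unrolled_total = 0
unrolled_total += a[0]           # no loop counter, no comparison, no jump
unrolled_total += a[1]
unrolled_total += a[2]
unrolled_total += a[3]

print(rolled_total == unrolled_total)   # True
```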

6. Error Handling – Compile‑time vs. Run‑time

Compile‑time errors – detected during lexical, syntactic or semantic analysis.
- Syntax error: missing ';' before '}'
- Type error: cannot assign String to int
- Undeclared identifier: 'x' was not declared in this scope

Run‑time errors – become apparent only when the program is executing.
- Division by zero
- Array index out of bounds
- Null‑pointer dereference

Compilers reduce the frequency of run‑time failures, but they cannot eliminate errors that depend on dynamic data.
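The distinction can be observed using Python's built-in compile() and exec() functions: compile() translates the source without running it, so a syntax error is reported before any statement executes. This is only a sketch. Python does not check types at translation time, so in Python a type error would appear at run-time, unlike in a statically typed language such as Java. The function name classify is an assumption for the example.

```python
def classify(source):
    """Report whether a snippet fails at translation time, at run-time, or not at all."""
    try:
        code = compile(source, "<example>", "exec")   # translation only; nothing runs
    except SyntaxError:
        return "compile-time error: SyntaxError"
    try:
        exec(code, {})                                # now the translated code runs
    except Exception as error:
        return f"run-time error: {type(error).__name__}"
    return "ok"


print(classify("x = (1 + 2"))                     # compile-time error: SyntaxError
print(classify("y = 0\nx = 1 / y"))               # run-time error: ZeroDivisionError
print(classify("items = [1, 2, 3]\nitems[5]"))    # run-time error: IndexError
print(classify("x = 1 + 2"))                      # ok
```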

7. Target‑Specific Concerns (Hardware–Software Interface)

Instruction‑set architecture (ISA) – the set of binary opcodes a processor understands (e.g., x86, ARM, MIPS).

Endianness – order in which multi‑byte data is stored (big‑endian vs. little‑endian); the compiler must generate code that respects the target’s convention.

Register allocation – the optimiser decides which variables reside in which CPU registers to minimise memory access.

Calling conventions – rules for passing arguments and returning values; the code generator must follow the platform’s convention.
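Endianness, described above, can be seen by storing the same 32-bit value in both byte orders. The sketch uses Python's standard struct module; the value 0x12345678 is an arbitrary example.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)   # "<" = little-endian, "I" = unsigned 32-bit integer
big = struct.pack(">I", value)      # ">" = big-endian

print(little.hex())   # 78563412  (least significant byte stored first)
print(big.hex())      # 12345678  (most significant byte stored first)

# Reading the bytes back with the wrong convention gives a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))   # 0x78563412
```

This is why a compiler must know the target's byte order when it lays out multi-byte constants and data.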

8. Integrated Development Environments (IDEs)

Although an IDE is not a translator, it forms part of the “language‑toolchain” described in the syllabus. Three core IDE features are:

Real‑time syntax checking – a lightweight lexical and syntactic analyser highlights errors as you type.

Code completion (auto‑suggest) – the IDE proposes identifiers, method signatures and library functions, speeding up coding.

Debugging tools – breakpoints, step‑through execution, watch variables and call‑stack inspection.

Examples: Eclipse (Java), Visual Studio Code (multiple languages), PyCharm (Python).

9. Mixed‑Mode Translation – Java & C# Example

Source → Byte‑code – javac translates .java files into platform‑independent .class files.

Byte‑code → Machine code – The Java Virtual Machine (JVM) either interprets the byte‑code or uses a Just‑In‑Time (JIT) compiler to translate frequently executed sections into native code at run‑time.

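This hybrid approach combines the portability of byte‑code with near‑native speed for frequently executed “hot spots”. C# follows the same pattern: its compiler produces Common Intermediate Language (CIL), which the .NET runtime JIT‑compiles to native code.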
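The source → byte‑code → virtual machine idea can be tried without a Java toolchain, because the standard Python implementation (CPython) also compiles source to byte‑code before running it. The sketch below is an analogy only: it is not the JVM, and it shows interpretation of byte‑code rather than JIT compilation. The variable names are arbitrary.

```python
import dis

source = "total = price * quantity"

# Stage 1: source -> byte-code (a code object; nothing is executed yet).
code = compile(source, "<example>", "exec")
print(type(code.co_code))        # <class 'bytes'>  (the raw byte-code)

# Human-readable listing of the byte-code; the exact instructions
# vary between Python versions.
dis.dis(code)

# Stage 2: the virtual machine executes the byte-code.
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)
print(namespace["total"])        # 12
```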

10. Alignment with Cambridge Assessment Objectives

AO1 – Knowledge and Understanding: definitions of compiler, assembler, interpreter, machine code, optimisation, endianness, etc.

AO2 – Application and Analysis: analyse trade‑offs between compilation and interpretation; evaluate optimisation techniques; interpret compiler error messages.

AO3 – Design and Evaluation: design a simple language translator (e.g., a mini‑compiler) and evaluate the benefits of using an IDE or mixed‑mode translation for a given problem.

11. Checklist – Does the Note Meet the Cambridge Syllabus?

Syllabus Item	Covered?	Notes / Action
Purpose of a compiler (human‑ vs. machine‑readability, portability, performance, safety, distribution)	✔
Types of translators (assembler, compiler, interpreter, mixed‑mode)	✔
Compilation vs. interpretation comparison	✔
Standard compiler phases (lexical → linking)	✔
Optimisation techniques (dead‑code elimination, loop unrolling)	✔
Compile‑time and run‑time error handling	✔
Target‑specific concerns (ISA, endianness, register allocation, calling conventions)	✔
IDE support (syntax checking, code completion, debugging)	✔
Mixed‑mode translation (Java/C# byte‑code + JIT)	✔
Assessment objectives (AO1‑AO3) linkage	✔

12. Summary

A compiler is the essential bridge that converts human‑readable high‑level programs into the binary instructions a computer can execute. It provides speed, static error checking, and the ability to generate platform‑specific binaries while supporting source‑level portability. Understanding the full translation chain—including assemblers, interpreters, IDE support, and mixed‑mode approaches—enables students to meet the full range of requirements in the Cambridge AS/A‑Level Computer Science (9618) syllabus.

Show understanding of the need for: a compiler for the translation of a high-level language program
