
5.2 Language Translators – Assembler Software

Why an Assembler Is Needed

Assembly language is a symbolic, human‑readable form of a computer’s machine instructions.

The CPU can only execute machine code (binary op‑codes), so an assembler must translate the mnemonics into the binary format required by the target architecture.

Human readability: Mnemonics such as MOV, ADD and symbolic labels replace long strings of 0s and 1s.

Maintainability: Labels, comments and macro facilities make programs easier to modify and debug.

Target‑specific source: Assembly language is tied to one instruction set, so each CPU family needs its own assembler; the same source can be reassembled only for processors that share that instruction set.

Error detection: Assemblers spot syntax errors, undefined labels and illegal operand combinations before the program is run.

CPU Fundamentals (Sidebar)

Key Registers (typical for a simple 8‑bit CPU)

PC (Program Counter): Holds the address of the next instruction to fetch.

IR (Instruction Register): Holds the current instruction while it is being decoded.

ACC (Accumulator): Primary arithmetic/logic register.

IX (Index Register): Used for indexed addressing.

MAR (Memory Address Register) & MDR (Memory Data Register): Interface with main memory.

Flags / Status Register: Zero, Carry, Negative, Overflow, etc.

Fetch‑Execute Cycle (simplified)

FETCH:   MAR ← PC            ; address of next instruction
         MDR ← Memory[MAR]
         IR  ← MDR
         PC  ← PC + 1
DECODE:  Decode IR to determine opcode and operands
EXECUTE: Perform the operation (ALU, memory access, branch, etc.)
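
The cycle above can be sketched as a small simulator. This is only an illustration under stated assumptions: each instruction is a one-byte opcode followed by a one-byte operand, ADD uses direct addressing, and the opcode values are borrowed from the instruction-set table later in these notes. The function name `run` is hypothetical.

```python
# Opcode values taken from the instruction-set table in these notes.
LDM, ADD, STA, HLT = 0x01, 0x03, 0x05, 0xFF

def run(memory, pc=0):
    """Fetch, decode and execute until HLT; return the final accumulator."""
    acc = 0
    while True:
        # FETCH: MAR <- PC, MDR <- Memory[MAR], IR <- MDR, PC <- PC + 1
        mar = pc
        mdr = memory[mar]
        ir = mdr
        pc += 1
        # DECODE: HLT needs no operand; every other opcode reads one byte
        if ir == HLT:
            return acc
        operand = memory[pc]
        pc += 1
        # EXECUTE
        if ir == LDM:                      # immediate load
            acc = operand
        elif ir == ADD:                    # assumed direct addressing
            acc = (acc + memory[operand]) & 0xFF
        elif ir == STA:                    # store accumulator
            memory[operand] = acc
        else:
            raise ValueError(f"unknown opcode {ir:#04x}")
```

Calling `run` on a list of bytes mutates that list in place, so a stored result can be inspected afterwards.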

Interrupts (brief note)

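When an interrupt occurs the CPU saves the current PC and status, jumps to a fixed interrupt‑service‑routine (ISR) address, and executes the ISR code; when the ISR finishes, the saved PC and status are restored and the interrupted program resumes. The programmer writes the ISR entry and exit code in assembly language, and the assembler must allow interrupt vectors (the addresses of the ISRs) to be defined.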

Two‑Pass Assembler Process

Most A‑Level assemblers operate in two passes. The flow‑chart (see figure at the end) summarises the stages.

Pass 1 – Symbol‑table construction
- The source file is scanned line‑by‑line.
- Every label is entered into a symbol table together with its *tentative* address (the location counter at that point).
- Only the length of each instruction is calculated; no op‑codes are generated yet.
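
A minimal sketch of Pass 1 follows. It assumes a fixed layout in which every instruction occupies two bytes and each DB one byte, and it starts the location counter at 10h; these values, and the names `SIZES` and `pass_one`, are illustrative assumptions rather than part of any real assembler.

```python
# Assumed sizes in bytes for each mnemonic or directive.
SIZES = {"LDM": 2, "ADD": 2, "STA": 2, "HLT": 2, "DB": 1}

def pass_one(lines, origin=0x10):
    """Scan the source once and return {label: address}."""
    symbols = {}
    location = origin                      # the location counter
    for line in lines:
        code = line.split(";")[0].strip()  # discard comments
        if not code:
            continue
        if ":" in code:                    # a label is defined on this line
            label, code = code.split(":", 1)
            symbols[label.strip()] = location
            code = code.strip()
        if code:                           # only the length matters in Pass 1
            mnemonic = code.split()[0]
            location += SIZES[mnemonic]
    return symbols
```

No opcode is looked up here; the mnemonic is used only to advance the location counter.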

Pass 2 – Opcode generation & address resolution
- The source is read a second time.
- Each mnemonic is looked up in the instruction‑set table and the corresponding binary opcode is emitted.
- Operands are encoded using the addresses stored in the symbol table (or immediate values).
- Relocation records are created for any address that may change when the program is linked with other modules.
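
Pass 2 can be sketched in the same spirit. The sketch assumes the source has already been split into (mnemonic, operand) pairs, that a symbol table from Pass 1 is available, and that instructions without an operand are padded to two bytes. The names `OPCODES` and `pass_two` are hypothetical; the opcode values come from the instruction-set table in these notes.

```python
OPCODES = {"LDM": 0x01, "ADD": 0x03, "STA": 0x05, "HLT": 0xFF}

def pass_two(statements, symbols):
    """Return (machine code bytes, offsets of bytes that hold addresses)."""
    code, relocations = [], []
    for mnemonic, operand in statements:
        if mnemonic == "DB":                   # data directive: emit the byte
            code.append(int(operand, 0))
            continue
        code.append(OPCODES[mnemonic])         # instruction-set table lookup
        if operand is None:
            code.append(0x00)                  # assumed padding byte
        elif operand.startswith("#"):
            code.append(int(operand[1:], 0))   # immediate value
        else:
            relocations.append(len(code))      # this byte may need fixing up
            code.append(symbols[operand])      # address from the symbol table
    return bytes(code), relocations
```

The second return value plays the role of the relocation records: it lists which bytes contain addresses rather than constants.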

Concrete Symbol‑Table Example

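Source with the location counter shown for each line (first pass)

0010: START:  LDM   #5
0012:         ADD   ONE
0014:         STA   RESULT
0016:         HLT
0018: ONE:    DB    0x01
0019: RESULT: DB    0x00

The addresses imply that every instruction occupies two bytes (the second byte of HLT is unused) and each DB one byte. After Pass 1 the assembler has built the following table: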

Label	Address (hex)
START	10
ONE	18
RESULT	19

During Pass 2 the forward‑referenced labels ONE and RESULT are replaced by the addresses 18h and 19h respectively.

One‑Pass vs. Two‑Pass Assemblers (and Macro Facilities)

Feature	One‑Pass Assembler	Two‑Pass Assembler
Symbol handling	Labels must be defined before they are used (no forward references).	Labels may be forward‑referenced; the first pass records them.
Complexity & speed	Simpler and faster, but limited in capability.	More complex, requires two scans, but supports full language features.
Macro support	Rarely included.	Common; directives such as `MACRO … ENDM` expand reusable code blocks before Pass 2.
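
The difference in symbol handling can be illustrated with a toy model. It follows the table's description of a one-pass assembler with no forward references, and assumes each statement is a (label, operand label) pair occupying two bytes; both function names are hypothetical.

```python
def one_pass_addresses(statements):
    """Resolve operand labels in a single scan; fail on forward references."""
    symbols, location, resolved = {}, 0, []
    for label, operand in statements:
        if label:
            symbols[label] = location
        if operand is not None:
            if operand not in symbols:     # label not seen yet
                raise NameError(f"forward reference to {operand}")
            resolved.append(symbols[operand])
        location += 2                      # assumed fixed instruction size
    return resolved

def two_pass_addresses(statements):
    """Pass 1 records every label; Pass 2 resolves every operand."""
    symbols = {label: i * 2
               for i, (label, _) in enumerate(statements) if label}
    return [symbols[op] for _, op in statements if op is not None]
```

A program that jumps forward to a label defined later is rejected by the first function and handled by the second.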

Macro Directives (optional for A‑Level)

Macros allow a programmer to define a short name for a frequently used sequence of instructions.

MACRO INCX
    LDA X
    ADD #1
    STA X
ENDM

; later in the program
INCX        ; expands to the three instructions above

During the macro‑expansion phase the assembler replaces each macro call with its body, then proceeds with the normal two‑pass assembly.
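
The expansion phase might be sketched as below. This is a simplified illustration that assumes parameterless macros written with `MACRO name` and `ENDM` on their own lines, with no nesting; the function name `expand_macros` is hypothetical.

```python
def expand_macros(lines):
    """Return the source with macro definitions removed and calls expanded."""
    macros, output = {}, []
    current = None                          # macro being defined, if any
    for line in lines:
        words = line.split(";")[0].split()  # drop comments, split on spaces
        if not words:
            continue
        if words[0] == "MACRO":             # start of a definition
            current = words[1]
            macros[current] = []
        elif words[0] == "ENDM":            # end of a definition
            current = None
        elif current is not None:           # body line: store, do not emit
            macros[current].append(" ".join(words))
        elif words[0] in macros:            # macro call: emit the stored body
            output.extend(macros[words[0]])
        else:                               # ordinary line: pass through
            output.append(" ".join(words))
    return output
```

The returned list is what the two passes would then see, so neither pass needs to know that macros exist.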

Addressing Modes (Cambridge Syllabus Requirement)

The syllabus expects recognition of the four basic modes. The table uses a hypothetical 8‑bit processor.

Mode	Description	Example (mnemonic operand)
Immediate	Operand is a constant encoded directly in the instruction.	`LDA #5`
Direct	Operand is a memory address that contains the data.	`LDA VALUE`
Indirect	Operand is a memory address that holds the address of the data.	`LDA @PTR`
Indexed	Effective address = base address + contents of an index register (e.g., X).	`LDA ARRAY,X`
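
The four modes differ only in how the operand field is turned into a value. The sketch below models memory as a Python list and the index register as an integer; the function name `fetch_operand` and the mode strings are assumptions made for illustration.

```python
def fetch_operand(mode, operand, memory, ix=0):
    """Return the value an instruction would load for each addressing mode."""
    if mode == "immediate":
        return operand                    # the operand is the data itself
    if mode == "direct":
        return memory[operand]            # the operand is the data's address
    if mode == "indirect":
        return memory[memory[operand]]    # the operand points to the address
    if mode == "indexed":
        return memory[operand + ix]       # base address + index register
    raise ValueError(f"unknown addressing mode: {mode}")
```

With the same operand field, each mode can therefore yield a different value from the same memory contents.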

Typical Instruction‑Set Example (Cambridge Style)

Opcode (hex)	Mnemonic	Operand type	Operation
01	LDM	Immediate	Load accumulator with a constant.
02	LDD	Direct	Load accumulator from a memory address.
03	ADD	Direct/Immediate	Add operand to accumulator.
04	SUB	Direct/Immediate	Subtract operand from accumulator.
05	STA	Direct	Store accumulator to a memory address.
06	JMP	Direct	Unconditional jump to address.
07	CMP	Direct/Immediate	Compare operand with accumulator (sets status flags).
FF	HLT	None	Halt the processor.

Example Translation (Two‑Pass Assembly)


START:  LDM   #5        ; Load accumulator with decimal 5
        ADD   ONE       ; Add constant ONE
        STA   RESULT    ; Store the sum
        HLT             ; Stop execution
ONE:    DB    0x01      ; constant ONE = 1 (hex)
RESULT: DB    0x00      ; reserve a byte for the result

After the two passes the object file (shown as hexadecimal bytes) might look like:


10: 01 05   ; LDM #5          (opcode 01, immediate 05h)
12: 03 18   ; ADD ONE         (opcode 03, address of ONE = 18h)
14: 05 19   ; STA RESULT      (opcode 05, address of RESULT = 19h)
16: FF 00   ; HLT             (opcode FF; second byte is unused padding)
18: 01      ; ONE = 0x01
19: 00      ; RESULT (initialised to 0)

The assembler resolved the symbolic labels ONE and RESULT to actual memory addresses (18h and 19h) during Pass 2.
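
One way to check a translation like this is to run it backwards. The sketch below is a toy disassembler for the instruction bytes of the example; it assumes a one-byte opcode followed by a one-byte operand and stops decoding at HLT. The names `MNEMONICS` and `disassemble` are hypothetical.

```python
# Reverse of the instruction-set table, limited to the opcodes in the example.
MNEMONICS = {0x01: "LDM", 0x03: "ADD", 0x05: "STA", 0xFF: "HLT"}

def disassemble(code):
    """Turn instruction bytes back into readable mnemonics."""
    listing, i = [], 0
    while i < len(code):
        name = MNEMONICS[code[i]]
        if name == "HLT":                  # no operand; end of the program
            listing.append("HLT")
            break
        operand = code[i + 1]
        prefix = "#" if name == "LDM" else ""   # LDM takes an immediate
        listing.append(f"{name} {prefix}{operand:02X}h")
        i += 2
    return listing
```

The labels ONE and RESULT do not reappear in the output: only their addresses survive in the object code, which is why symbolic names are a convenience of the source language.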

Object File, Relocation & Linking

Object file: Binary file produced by the assembler containing machine code and a table of relocation records.

Relocation records: Indicate which addresses may need adjustment when the program is linked with other modules.

Linking: One or more object files are combined to form a single executable. The linker uses the relocation records to fix up addresses and to merge symbol tables.
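
The fix-up step can be sketched as follows. The model is deliberately small: each module is assembled as if loaded at address 0, addresses are 8 bits wide, and a relocation record is simply the offset of a byte that holds an address. Cross-module symbols are not handled, and the names `relocate` and `link` are hypothetical.

```python
def relocate(code, relocations, load_address):
    """Return a copy of code with load_address added at each recorded offset."""
    patched = bytearray(code)
    for offset in relocations:
        patched[offset] = (patched[offset] + load_address) & 0xFF
    return bytes(patched)

def link(modules):
    """Place modules back to back from address 0, relocating each one.

    modules is a list of (code, relocations) pairs.
    """
    image = bytearray()
    for code, relocations in modules:
        base = len(image)                  # where this module will start
        image += relocate(code, relocations, base)
    return bytes(image)
```

Bytes that are not listed in the relocation records, such as opcodes and immediate values, are copied unchanged.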

Error Detection & Typical Diagnostics

Assemblers report errors during both passes. Two common examples are:

Undefined label: `Error: label LOOP not defined.` Occurs when a label is referenced but never declared.

Operand‑size mismatch: `Error: immediate value out of range for opcode LDM.` Occurs when a constant does not fit the operand field.

Most assemblers display the line number and a short description, allowing the programmer to correct the source before execution.
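
Both diagnostics can be produced by a simple check over the parsed source. The sketch assumes statements arrive as (line number, mnemonic, operand) tuples, that immediates are written with a leading `#`, and that the operand field is one byte wide; the function name `check` and the exact message wording are illustrative.

```python
def check(statements, symbols):
    """Return a list of error messages, each tagged with its line number."""
    errors = []
    for line_no, mnemonic, operand in statements:
        if operand is None:
            continue
        if operand.startswith("#"):            # immediate operand
            value = int(operand[1:], 0)
            if not 0 <= value <= 0xFF:         # must fit in one byte
                errors.append(
                    f"line {line_no}: immediate value out of range "
                    f"for opcode {mnemonic}")
        elif operand not in symbols:           # label operand
            errors.append(f"line {line_no}: label {operand} not defined")
    return errors
```

An empty list means the source passed both checks; the undefined-label test can only be made once the symbol table is complete, which is why it belongs after Pass 1.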

Language Translators – Assemblers, Compilers & Interpreters (Comparison)

Translator	Typical Input Language	Output	Key Advantages	Key Disadvantages
Assembler	Assembly language (mnemonics + directives)	Object code (machine code + relocation info)	Very close to hardware → fine‑grained control; fast execution; easy to generate interrupt‑service routines.	Programmer must manage registers, memory layout and low‑level details.
Compiler	High‑level language (e.g., C, Java)	Object code (or byte‑code for languages like Java)	Abstraction → faster development, portability, automatic optimisation.	Less direct control of hardware; compilation can be time‑consuming.
Interpreter	Scripting language (e.g., Python) or byte‑code (Java VM)	Direct execution by an interpreter or virtual machine	Immediate feedback, platform independence, easy debugging.	Generally slower execution; an error is reported only when the faulty statement is reached at run time.

Understanding the role of each translator helps students appreciate why assembly language is still taught at A‑Level despite the prevalence of high‑level languages.

Assessment Objectives (AO) Mapping

AO	What is assessed in this topic
AO1	Knowledge of the purpose of an assembler, the two‑pass process, macro directives, CPU registers, fetch‑execute cycle, and basic addressing modes.
AO2	Analysis of how assembly language is converted into machine code: symbol‑table construction, forward‑reference resolution, opcode generation, relocation, and comparison with compilers/interpreters.

Flowchart of the two‑pass assembler

Pass 1 – Scan source → build Symbol Table → compute instruction lengths

Pass 2 – Translate mnemonics → generate op‑codes → resolve addresses → write Object file (+ relocation records)

Show understanding of the need for: assembler software for the translation of an assembly language program
