Cambridge Notes, Past Papers, Revision Questions

4.2 Assembly Language

Objective

Describe the different stages of the assembly process for a two‑pass assembler and relate assembly language to the machine code that the CPU executes.

1. Why an Assembler Is Needed

The Von Neumann architecture stores both program instructions and data in the same memory. The CPU repeatedly performs the fetch‑decode‑execute cycle, fetching a binary instruction word from memory, decoding its opcode, and executing the indicated operation.

Programmers could in principle write these binary words directly, but doing so is impractical: they are long strings of 0s and 1s that are difficult to read, write, and maintain.

An assembler translates a human‑readable assembly language (mnemonics and symbolic operands) into the exact binary instruction words that the CPU expects during the fetch‑decode‑execute cycle.

Thus the assembler is the bridge between the programmer’s notation and the hardware’s binary representation.

2. Relationship Between Assembly Language and Machine Code

Each assembly instruction is mapped to a fixed‑length instruction word consisting of:

Field	Typical Size (bits)	Content
Opcode	8	Identifies the operation (e.g. LDA, ADD, JMP)
Address / Register	16 (or two 8‑bit bytes)	Operand address, immediate constant, or register identifier, depending on the addressing mode
Unused / Padding	0–8	Ensures all instruction words have the same length (commonly 24 bits = 3 bytes)

The assembler replaces each mnemonic with its 8‑bit opcode and each symbolic operand with the numeric value (address, constant or register code) required by the operand field.
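The packing step can be sketched as follows. This is a minimal illustration, assuming the 8‑bit opcode and 16‑bit operand layout from the table above and the illustrative opcode values that appear in the worked example of these notes (01 = LDA, 02 = STA, 03 = JMP, FF = HLT). The names `OPCODES` and `encode` are hypothetical.

```python
# Hypothetical opcode table; the values follow the worked example in these notes.
OPCODES = {"LDA": 0x01, "STA": 0x02, "JMP": 0x03, "HLT": 0xFF}


def encode(mnemonic: str, operand: int = 0) -> bytes:
    """Pack an 8-bit opcode and a 16-bit operand into a 3-byte instruction word."""
    if not 0 <= operand <= 0xFFFF:
        raise ValueError("operand does not fit in 16 bits")
    word = (OPCODES[mnemonic] << 16) | operand
    return word.to_bytes(3, "big")  # most significant byte (the opcode) comes first


print(encode("LDA", 0x0009).hex(" "))  # 01 00 09
```

The opcode is shifted into the top 8 bits and the operand fills the lower 16, so every instruction has the same 3‑byte length.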

3. Two‑Pass Assembler – Overview

A two‑pass assembler separates the tasks of collecting symbols and generating code. This avoids the need for complex back‑patching when forward references are used.

First Pass – builds the symbol table and determines the final address of every label.

Second Pass – uses the completed symbol table to translate mnemonics into binary opcodes and to resolve all operand addresses.

4. First Pass – Symbol Table Construction & Location Counter Management

A Location Counter (LC) holds the address that will be assigned to the next byte/word that the assembler generates.

Processing steps for each line:

Read the line.

If the line contains a label, store label → LC in the symbol table. Duplicate labels are flagged as errors.

Identify any assembler directive that influences the LC and update the LC accordingly (examples below).

If the line is an instruction, increase the LC by the size of its instruction word (normally 3 bytes).

If the assembler supports macros, expand them now so that the expanded lines are also scanned.
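The processing steps above can be sketched in a few lines. This is a simplified illustration rather than a complete assembler: it assumes a fixed 3‑byte instruction size, handles only `.ORG`, `.BYTE` and `.WORD`, treats `;` as the comment character, and skips macro expansion. The names `first_pass`, `INSTRUCTION_SIZE` and `DIRECTIVE_SIZES` are hypothetical.

```python
INSTRUCTION_SIZE = 3                      # bytes per instruction word (assumed fixed)
DIRECTIVE_SIZES = {".BYTE": 1, ".WORD": 2}


def first_pass(lines):
    """Return the symbol table (label -> address) for a list of source lines."""
    symbols, lc = {}, 0
    for number, line in enumerate(lines, start=1):
        line = line.split(";")[0].strip()          # drop comments and whitespace
        if ":" in line:                            # "LABEL: rest of line"
            label, line = (part.strip() for part in line.split(":", 1))
            if label in symbols:
                raise ValueError(f"line {number}: duplicate label {label}")
            symbols[label] = lc                    # label takes the current LC
        if not line:
            continue                               # blank line or label on its own
        fields = line.split()
        op = fields[0].upper()
        if op == ".ORG":
            lc = int(fields[1], 0)                 # base 0 accepts 0x... and decimal
        elif op in DIRECTIVE_SIZES:
            lc += DIRECTIVE_SIZES[op]
        else:
            lc += INSTRUCTION_SIZE                 # anything else is an instruction
    return symbols


program = [
    ".ORG 0x0000",
    "START: LDA VALUE",
    "STA RESULT",
    "JMP END",
    "VALUE: .WORD 5",
    "RESULT: .WORD 0",
    "END: HLT",
]
print(first_pass(program))
```

Note that the label is recorded before the LC is advanced, so a label always refers to the first byte of the line it is attached to.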

Effect of Common Directives on the LC

Directive	Purpose	LC Effect (example)
`.ORG 0x1000`	Set the starting address of the program.	LC ← 0x1000
`.ALIGN 4`	Round LC up to the next multiple of 4.	LC 0x1003 → 0x1004
`.BYTE 0xFF`	Store one byte of data (here 0xFF).	LC ← LC + 1
`.WORD 0x1234`	Store one two‑byte word of data (here 0x1234).	LC ← LC + 2
`.RESW 5`	Reserve 5 words (10 bytes).	LC ← LC + 10
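The LC arithmetic in the table can be checked with a short sketch. It assumes a 2‑byte word, as in the table, and that each directive takes a single numeric argument. The function names `align_up` and `apply_directive` are hypothetical.

```python
def align_up(lc: int, boundary: int) -> int:
    """Round lc up to the next multiple of boundary (unchanged if already aligned)."""
    return (lc + boundary - 1) // boundary * boundary


def apply_directive(lc: int, name: str, arg: int) -> int:
    """Return the new LC after one directive; a word is assumed to be 2 bytes."""
    if name == ".ORG":
        return arg                 # set the LC directly
    if name == ".ALIGN":
        return align_up(lc, arg)
    if name == ".BYTE":
        return lc + 1
    if name == ".WORD":
        return lc + 2
    if name == ".RESW":
        return lc + 2 * arg        # arg words of 2 bytes each
    raise ValueError(f"unknown directive {name}")


print(hex(apply_directive(0x1003, ".ALIGN", 4)))  # 0x1004
```

An LC that is already on the boundary is left unchanged by `.ALIGN`, which is why the rounding adds `boundary - 1` rather than `boundary`.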

5. Second Pass – Code Generation

Re‑read the source file, now with a complete symbol table.

For each instruction:
- Look up the opcode for the mnemonic.
- Replace every symbolic operand with the numeric address/constant obtained from the symbol table.
- Encode the instruction according to the instruction‑word format (opcode + operand field).

Process directives that generate data (.BYTE, .WORD, etc.) and write the corresponding bytes to the object file.

Produce a listing file (source line ↔ generated machine word) and report any undefined symbols.
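A second pass along these lines might look like the sketch below. It is deliberately simplified: every instruction operand is assumed to be a label (direct addressing), `.WORD` emits two bytes as in the directive table, `.ORG` produces no output, and the opcode values are the illustrative ones from the worked example. The names `second_pass`, `OPCODES` and `symbols` are hypothetical.

```python
OPCODES = {"LDA": 0x01, "STA": 0x02, "JMP": 0x03, "HLT": 0xFF}  # illustrative values


def second_pass(lines, symbols):
    """Return (object_code, undefined_symbols) using a completed symbol table."""
    code, undefined = bytearray(), []
    for line in lines:
        line = line.split(";")[0]                  # drop comments
        if ":" in line:
            line = line.split(":", 1)[1]           # labels were handled in pass one
        fields = line.split()
        if not fields or fields[0].upper() == ".ORG":
            continue
        op, args = fields[0].upper(), fields[1:]
        if op == ".BYTE":
            code += int(args[0], 0).to_bytes(1, "big")
        elif op == ".WORD":
            code += int(args[0], 0).to_bytes(2, "big")
        else:
            operand = 0
            if args:
                if args[0] in symbols:
                    operand = symbols[args[0]]     # resolve label to its address
                else:
                    undefined.append(args[0])      # report instead of guessing
            code += bytes([OPCODES[op]]) + operand.to_bytes(2, "big")
    return bytes(code), undefined


symbols = {"START": 0x0000, "VALUE": 0x0009, "RESULT": 0x000B, "END": 0x000D}
source = ["START: LDA VALUE", "STA RESULT", "JMP END",
          "VALUE: .WORD 5", "RESULT: .WORD 0", "END: HLT"]
object_code, missing = second_pass(source, symbols)
print(object_code.hex(" "))
```

Because the symbol table is already complete, the forward reference in `JMP END` resolves with a plain dictionary lookup and no back‑patching.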

6. Summary of Passes

Pass	Primary Activities	Outputs Produced
First Pass	Read source line‑by‑line; maintain Location Counter; build Symbol Table (label → address); handle directives that affect LC; expand macros (if supported)	Symbol table, final LC value, duplicate‑label diagnostics
Second Pass	Translate mnemonics to op‑codes; resolve operand addresses using Symbol Table; generate instruction words and data bytes; produce object code, listing file, undefined‑symbol diagnostics	Object code, listing file, error report


7. Worked Example – Tracing a Simple Program

Source program (hypothetical 24‑bit instruction format, 3 bytes per instruction):

.ORG 0x0000

START: LDA VALUE

STA RESULT

JMP END

VALUE: .WORD 5

RESULT: .WORD 0

END: HLT

First Pass – Symbol Table & LC

Line (source)	LC before line	Action / Symbol Table update
.ORG 0x0000	—	LC ← 0x0000
START: LDA VALUE	0x0000	add `START → 0x0000`; LC ← 0x0003
STA RESULT	0x0003	LC ← 0x0006
JMP END	0x0006	LC ← 0x0009
VALUE: .WORD 5	0x0009	add `VALUE → 0x0009`; LC ← 0x000B
RESULT: .WORD 0	0x000B	add `RESULT → 0x000B`; LC ← 0x000D
END: HLT	0x000D	add `END → 0x000D`; LC ← 0x0010

Second Pass – Machine Code Generation

Address	Source	Machine Word (hex)
0x0000	LDA VALUE	01 00 09
0x0003	STA RESULT	02 00 0B
0x0006	JMP END	03 00 0D
0x0009	.WORD 5	00 05
0x000B	.WORD 0	00 00
0x000D	HLT	FF 00 00

Notice how the addresses (09, 0B, 0D) were obtained from the symbol table built during the first pass.

8. Instruction‑Set Overview – Word Format

For the simplified, hypothetical processor used in these notes the instruction word is 24 bits (3 bytes):

Bits	Field	Explanation
23‑16	Opcode (8 bits)	Identifies the operation (e.g. 01 = LDA)
15‑0	Operand (16 bits)	Address, immediate constant, or register code, depending on addressing mode

Example: 01 00 09 → opcode = 01 (LDA), operand = 0x0009 (address of VALUE).
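Splitting a word back into its fields is the reverse of the packing step. The sketch below assumes the 24‑bit layout in the table above, with the opcode stored in the first byte. The function name `decode` is hypothetical.

```python
def decode(word: bytes) -> tuple[int, int]:
    """Split a 3-byte instruction word into (opcode, operand)."""
    if len(word) != 3:
        raise ValueError("expected exactly 3 bytes")
    value = int.from_bytes(word, "big")
    opcode = value >> 16           # bits 23-16
    operand = value & 0xFFFF       # bits 15-0
    return opcode, operand


opcode, operand = decode(bytes.fromhex("010009"))
print(f"opcode = {opcode:02X}, operand = 0x{operand:04X}")  # opcode = 01, operand = 0x0009
```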

9. Instruction Groups (Syllabus Requirement)

Group	Typical Mnemonics	Purpose
Data‑movement	LDA, STA, LD, ST	Transfer data between registers and memory
Arithmetic	ADD, SUB, MUL, DIV	Perform integer arithmetic
Logical / Bit‑manipulation	AND, OR, XOR, NOT, SHL, SHR	Boolean operations and shifts
Control‑flow	JMP, JZ, JNZ, CALL, RET	Alter the sequential execution order
Compare / Test	CMP, TEST, TST	Set condition codes for subsequent branches

10. Addressing Modes

Where an operand is symbolic, the assembler uses the symbol table to resolve the value required by each mode.

Mode	Syntax (example)	What the CPU sees
Immediate	`LDA #5`	Operand field contains the constant value 5
Direct	`LDA VALUE`	Operand field contains the absolute address of `VALUE`
Indirect	`LDA @PTR`	CPU fetches the address stored at `PTR`, then loads the value at that address
Indexed	`LDA ARRAY,X`	Effective address = address(ARRAY) + contents of register X
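Relative (branch)	`JMP LABEL`	Operand is a signed offset from the current PC; the assembler computes the offset from the label's address in the symbol table (the worked example in Section 7 instead encodes `JMP` with an absolute address)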
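The table can be read as a set of rules for finding the value that `LDA` loads. The sketch below models memory as a Python dictionary and is only an illustration. The names `load_operand` and `relative_offset` are hypothetical, and the relative offset is assumed to be measured from the address of the next instruction, which real processors do not all agree on.

```python
def load_operand(mode: str, operand: int, memory: dict, x: int = 0) -> int:
    """Return the value LDA would load for the given addressing mode."""
    if mode == "immediate":
        return operand                     # the operand field is the value itself
    if mode == "direct":
        return memory[operand]             # operand is the address of the value
    if mode == "indirect":
        return memory[memory[operand]]     # follow the pointer stored at operand
    if mode == "indexed":
        return memory[operand + x]         # base address plus index register X
    raise ValueError(f"unknown mode {mode}")


def relative_offset(target: int, next_pc: int) -> int:
    """Signed branch offset, ASSUMING it is measured from the next instruction."""
    return target - next_pc


memory = {0x20: 7, 0x21: 8, 0x30: 0x20}    # 0x30 holds a pointer to 0x20
print(load_operand("immediate", 5, memory))        # 5
print(load_operand("direct", 0x20, memory))        # 7
print(load_operand("indirect", 0x30, memory))      # 7
print(load_operand("indexed", 0x20, memory, x=1))  # 8
```

A backward branch gives a negative offset, which is why the operand must be treated as a signed value.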

11. Macro Processing (Optional)

Macros are not required for the Cambridge exam but many assemblers support them. If used, macro expansion occurs during the first pass, before the symbol table is finalised.

MACRO INCR REG

ADD #1, REG

ENDM

START: INCR A ; expands to: ADD #1, A

HLT

During expansion the macro body is inserted into the source stream, and any labels defined inside the macro are made unique (often by appending a numeric suffix) to avoid collisions.
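Textual expansion with parameter substitution can be sketched as below. This is a simplified illustration that assumes the `MACRO name params` / `ENDM` syntax shown above, arguments separated by spaces, and `;` as the comment character. It does not rename labels defined inside a macro body, and the name `expand_macros` is hypothetical.

```python
import re


def expand_macros(lines):
    """Return the source lines with macro definitions removed and calls expanded."""
    macros, output, current = {}, [], None
    for line in lines:
        code = line.split(";")[0].strip()           # drop comments
        fields = code.split()
        if not fields:
            continue
        if fields[0] == "MACRO":                    # MACRO name param...
            current = fields[1]
            macros[current] = (fields[2:], [])      # (parameter names, body lines)
        elif fields[0] == "ENDM":
            current = None
        elif current is not None:
            macros[current][1].append(code)         # collect a body line
        else:
            label = ""
            if fields[0].endswith(":"):             # keep a leading label
                label, fields = fields[0] + " ", fields[1:]
            if fields and fields[0] in macros:
                params, body = macros[fields[0]]
                args = dict(zip(params, fields[1:]))
                for i, body_line in enumerate(body):
                    # Replace whole words that match a parameter name.
                    text = re.sub(r"\w+", lambda m: args.get(m.group(), m.group()), body_line)
                    output.append((label if i == 0 else "") + text)
            else:
                output.append((label + " ".join(fields)).strip())
    return output


source = ["MACRO INCR REG", "ADD #1, REG", "ENDM",
          "START: INCR A ; expands to: ADD #1, A", "HLT"]
print(expand_macros(source))  # ['START: ADD #1, A', 'HLT']
```

The label on the calling line is attached to the first expanded line, so `START` still refers to the first instruction the macro produces.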

12. Error Handling

Duplicate label definitions – detected in the first pass; assembler reports the line numbers.

Undefined symbols – detected in the second pass, when an operand cannot be found in the symbol table.

Illegal directive usage (e.g., .ORG to a non‑word boundary, mis‑aligned .ALIGN) – flagged in the pass where they appear.

Macro‑related errors – undefined macro name, recursive macro expansion, or label clashes inside a macro.
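The two symbol‑related checks can be sketched together. The example assumes the assembler has already collected label definitions and operand references as (line number, name) pairs. The function name `check_symbols` and the message wording are hypothetical.

```python
def check_symbols(definitions, references):
    """Return error messages for duplicate labels and undefined symbols.

    definitions and references are lists of (line_number, name) pairs.
    """
    errors, first_seen = [], {}
    for line_no, name in definitions:              # first-pass style check
        if name in first_seen:
            errors.append(
                f"line {line_no}: duplicate label '{name}' "
                f"(first defined on line {first_seen[name]})"
            )
        else:
            first_seen[name] = line_no
    for line_no, name in references:               # second-pass style check
        if name not in first_seen:
            errors.append(f"line {line_no}: undefined symbol '{name}'")
    return errors


definitions = [(1, "START"), (4, "LOOP"), (7, "LOOP")]
references = [(2, "LOOP"), (5, "DONE")]
for message in check_symbols(definitions, references):
    print(message)
```

Collecting all messages before stopping lets the programmer fix several errors in one edit, rather than re‑running the assembler after each one.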

13. Further Reading – Bit Manipulation & Binary Shifts

Later in the syllabus (Topic 5) you will study how to implement bit‑wise operations and shifts directly in assembly. The mnemonics shown in the instruction‑groups table in Section 9 (e.g., SHL, SHR, AND, OR) are the building blocks for tasks such as masking, setting, clearing, and rotating bits.

Suggested diagram: flowchart of the two‑pass assembly process.
