Describe the different stages of the assembly process for a two-pass assembler

4.2 Assembly Language

Objective

Describe the different stages of the assembly process for a two‑pass assembler and relate assembly language to the machine code that the CPU executes.

1. Why an Assembler Is Needed

  • The Von Neumann architecture stores both program instructions and data in the same memory. The CPU repeatedly performs the fetch‑decode‑execute cycle, fetching a binary instruction word from memory, decoding its opcode, and executing the indicated operation.
  • Programmers could, in principle, write these binary words directly, but long strings of 0s and 1s are difficult to read, write, and maintain.
  • An assembler translates a human‑readable assembly language (mnemonics and symbolic operands) into the exact binary instruction words that the CPU expects during the fetch‑decode‑execute cycle.
  • Thus the assembler is the bridge between the programmer’s notation and the hardware’s binary representation.

2. Relationship Between Assembly Language and Machine Code

Each assembly instruction is mapped to a fixed‑length instruction word consisting of:

| Field | Typical Size (bits) | Content |
| --- | --- | --- |
| Opcode | 8 | Identifies the operation (e.g. LDA, ADD, JMP) |
| Address / Register | 16 (or two 8‑bit bytes) | Operand address, immediate constant, or register identifier, depending on the addressing mode |
| Unused / Padding | 0–8 | Ensures all instruction words have the same length (commonly 24 bits = 3 bytes) |

The assembler replaces each mnemonic with its 8‑bit opcode and each symbolic operand with the numeric value (address, constant or register code) required by the operand field.
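The packing step can be sketched in a few lines of Python. This is a minimal illustration, not a real assembler: the function name `encode_instruction` is hypothetical, the opcode values are the ones used in the worked example in these notes, and storing the operand high byte first is an assumption that matches that example.

```python
# Opcode values taken from the worked example in these notes.
OPCODES = {"LDA": 0x01, "STA": 0x02, "JMP": 0x03, "HLT": 0xFF}


def encode_instruction(mnemonic: str, operand: int = 0) -> bytes:
    """Pack an 8-bit opcode and a 16-bit operand into one 3-byte word."""
    if not 0 <= operand <= 0xFFFF:
        raise ValueError(f"operand {operand:#x} does not fit in 16 bits")
    opcode = OPCODES[mnemonic]
    # Assumption: the operand is stored high byte first (big-endian).
    return bytes([opcode]) + operand.to_bytes(2, "big")


print(encode_instruction("LDA", 0x0009).hex(" "))  # 01 00 09
```

An instruction without an operand, such as HLT, simply leaves the operand field at zero.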

3. Two‑Pass Assembler – Overview

A two‑pass assembler separates the tasks of collecting symbols and generating code. This avoids the need for complex back‑patching when forward references are used.

  1. First Pass – builds the symbol table and determines the final address of every label.
  2. Second Pass – uses the completed symbol table to translate mnemonics into binary opcodes and to resolve all operand addresses.

4. First Pass – Symbol Table Construction & Location Counter Management

  • A Location Counter (LC) holds the address that will be assigned to the next byte/word that the assembler generates.
  • Processing steps for each line:

  1. Read the line.
  2. If the line contains a label, store label → LC in the symbol table. Duplicate labels are flagged as errors.
  3. Identify any assembler directive that influences the LC and update the LC accordingly (examples below).
  4. If the line is an instruction, increase the LC by the size of its instruction word (normally 3 bytes).
  5. If the assembler supports macros, expand each macro call as it is read, so that steps 2–4 are applied to the expanded lines as well.
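The steps above can be sketched as a short Python function. It is a simplified illustration under several assumptions: the name `first_pass` is hypothetical, only the .ORG, .WORD and .BYTE directives are recognised, every other keyword is treated as a 3‑byte instruction, and macro expansion (step 5) is left out.

```python
INSTRUCTION_SIZE = 3  # bytes per instruction word in the format used in these notes


def first_pass(lines):
    """Return a symbol table (label -> address) for a tiny subset of the syntax."""
    symbols = {}
    lc = 0  # location counter
    for number, raw in enumerate(lines, start=1):
        line = raw.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" in line:  # step 2: record the label at the current LC
            label, line = (part.strip() for part in line.split(":", 1))
            if label in symbols:
                raise ValueError(f"line {number}: duplicate label {label}")
            symbols[label] = lc
        if not line:
            continue  # a label on a line of its own
        fields = line.split()
        keyword = fields[0].upper()
        if keyword == ".ORG":  # step 3: directives that change the LC
            lc = int(fields[1], 0)
        elif keyword == ".WORD":
            lc += 2
        elif keyword == ".BYTE":
            lc += 1
        else:  # step 4: anything else is treated as an instruction
            lc += INSTRUCTION_SIZE
    return symbols


program = [
    ".ORG 0x0000",
    "START: LDA VALUE",
    "STA RESULT",
    "JMP END",
    "VALUE: .WORD 5",
    "RESULT: .WORD 0",
    "END: HLT",
]
print(first_pass(program))  # {'START': 0, 'VALUE': 9, 'RESULT': 11, 'END': 13}
```

Note that the forward references to VALUE, RESULT and END cause no difficulty here, because the first pass never looks at operands; it only records where each label lands.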

Effect of Common Directives on the LC

| Directive | Purpose | LC Effect (example) |
| --- | --- | --- |
| .ORG 0x1000 | Set the starting address of the program. | LC ← 0x1000 |
| .ALIGN 4 | Round LC up to the next multiple of 4. | LC 0x1003 → 0x1004 |
| .BYTE 0xFF | Store one byte of data. | LC ← LC + 1 |
| .WORD 0x1234 | Store one word (two bytes) of data. | LC ← LC + 2 |
| .RESW 5 | Reserve 5 words (10 bytes). | LC ← LC + 10 |
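The arithmetic behind each directive is small enough to write out. The sketch below is illustrative only: `align_up` and `apply_directive` are hypothetical helper names, and the directive operand is assumed to have been converted to an integer already.

```python
WORD_SIZE = 2  # bytes per word, as in the directive examples above


def align_up(lc: int, boundary: int) -> int:
    """Round lc up to the next multiple of boundary (unchanged if already aligned)."""
    return (lc + boundary - 1) // boundary * boundary


def apply_directive(lc: int, name: str, arg: int) -> int:
    """Return the new LC after one directive; arg is the directive's numeric operand."""
    if name == ".ORG":
        return arg
    if name == ".ALIGN":
        return align_up(lc, arg)
    if name == ".BYTE":
        return lc + 1
    if name == ".WORD":
        return lc + WORD_SIZE
    if name == ".RESW":
        return lc + arg * WORD_SIZE
    raise ValueError(f"unknown directive {name}")


print(hex(apply_directive(0x1003, ".ALIGN", 4)))  # 0x1004
print(hex(apply_directive(0x1000, ".RESW", 5)))   # 0x100a
```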

5. Second Pass – Code Generation

  1. Re‑read the source file, now with a complete symbol table.
  2. For each instruction:

    • Lookup the opcode for the mnemonic.
    • Replace every symbolic operand with the numeric address/constant obtained from the symbol table.
    • Encode the instruction according to the instruction‑word format (opcode + operand field).

  3. Process directives that generate data (.BYTE, .WORD, etc.) and write the corresponding bytes to the object file.
  4. Produce a listing file (source line ↔ generated machine word) and report any undefined symbols.
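A second pass along these lines might look like the following Python sketch. It is a simplification under stated assumptions: `second_pass` is a hypothetical name, the symbol table is passed in as a dictionary, the opcode values are those of the worked example in these notes, multi‑byte values are written high byte first, .WORD emits two bytes (matching its LC effect), and the program is assumed to be contiguous from its .ORG address. No listing file is produced.

```python
OPCODES = {"LDA": 0x01, "STA": 0x02, "JMP": 0x03, "HLT": 0xFF}  # worked-example values


def second_pass(lines, symbols):
    """Return (object_code, undefined_symbols) for the given source lines."""
    code = bytearray()
    undefined = []

    def value_of(token):
        if token in symbols:
            return symbols[token]
        try:
            return int(token, 0)  # numeric literal such as 5 or 0x1234
        except ValueError:
            undefined.append(token)  # reported to the caller, encoded as 0
            return 0

    for raw in lines:
        line = raw.split(";", 1)[0]            # drop comments
        line = line.split(":", 1)[-1].strip()  # labels were handled in pass one
        if not line or line.upper().startswith(".ORG"):
            continue
        fields = line.split()
        keyword = fields[0].upper()
        operand = value_of(fields[1]) if len(fields) > 1 else 0
        if keyword == ".BYTE":
            code += operand.to_bytes(1, "big")
        elif keyword == ".WORD":
            code += operand.to_bytes(2, "big")
        else:  # instruction: 8-bit opcode followed by 16-bit operand
            code += bytes([OPCODES[keyword]]) + operand.to_bytes(2, "big")
    return bytes(code), undefined


symbols = {"START": 0x0000, "VALUE": 0x0009, "RESULT": 0x000B, "END": 0x000D}
source = ["START: LDA VALUE", "STA RESULT", "JMP END",
          "VALUE: .WORD 5", "RESULT: .WORD 0", "END: HLT"]
object_code, undefined = second_pass(source, symbols)
print(object_code.hex(" "))
```

Collecting undefined symbols in a list, rather than stopping at the first one, lets the assembler report every problem in a single run.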

6. Summary of Passes

| Pass | Primary Activities | Outputs Produced |
| --- | --- | --- |
| First Pass | Read source line‑by‑line; maintain Location Counter; build Symbol Table (label → address); handle directives that affect LC; expand macros (if supported) | Symbol table, final LC value, duplicate‑label diagnostics |
| Second Pass | Translate mnemonics to opcodes; resolve operand addresses using Symbol Table; generate instruction words and data bytes; produce object code, listing file, undefined‑symbol diagnostics | Object code, listing file, error report |

7. Worked Example – Tracing a Simple Program

Source program (hypothetical 24‑bit instruction format, 3 bytes per instruction):

            .ORG 0x0000
    START:  LDA VALUE
            STA RESULT
            JMP END
    VALUE:  .WORD 5
    RESULT: .WORD 0
    END:    HLT

First Pass – Symbol Table & LC

| Line (source) | LC before line | Action / Symbol Table update |
| --- | --- | --- |
| .ORG 0x0000 | (not set) | LC ← 0x0000 |
| START: LDA VALUE | 0x0000 | add START → 0x0000; LC ← 0x0003 |
| STA RESULT | 0x0003 | LC ← 0x0006 |
| JMP END | 0x0006 | LC ← 0x0009 |
| VALUE: .WORD 5 | 0x0009 | add VALUE → 0x0009; LC ← 0x000B |
| RESULT: .WORD 0 | 0x000B | add RESULT → 0x000B; LC ← 0x000D |
| END: HLT | 0x000D | add END → 0x000D; LC ← 0x0010 |

Second Pass – Machine Code Generation

| Address | Source | Machine Code (hex) |
| --- | --- | --- |
| 0x0000 | LDA VALUE | 01 00 09 |
| 0x0003 | STA RESULT | 02 00 0B |
| 0x0006 | JMP END | 03 00 0D |
| 0x0009 | .WORD 5 | 00 05 |
| 0x000B | .WORD 0 | 00 00 |
| 0x000D | HLT | FF 00 00 |

Notice how the addresses (09, 0B, 0D) were obtained from the symbol table built during the first pass.

8. Instruction‑Set Overview – Word Format

For the simplified processor assumed in these notes (the same hypothetical format as in the worked example) the instruction word is 24 bits (3 bytes):

| Bits | Field | Explanation |
| --- | --- | --- |
| 23‑16 | Opcode (8 bits) | Identifies the operation (e.g. 01 = LDA) |
| 15‑0 | Operand (16 bits) | Address, immediate constant, or register code, depending on addressing mode |

Example: 01 00 09 → opcode = 01 (LDA), operand = 0x0009 (address of VALUE).
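Going the other way, the two fields can be recovered from a word with a shift and a mask. The Python sketch below illustrates this; `decode_instruction` is a hypothetical name, and the word is assumed to be stored high byte first, as in the example above.

```python
def decode_instruction(word: bytes):
    """Split a 3-byte instruction word into (opcode, operand)."""
    if len(word) != 3:
        raise ValueError("an instruction word is exactly 3 bytes")
    value = int.from_bytes(word, "big")  # 24-bit integer, bits 23..0
    opcode = (value >> 16) & 0xFF        # bits 23-16
    operand = value & 0xFFFF             # bits 15-0
    return opcode, operand


opcode, operand = decode_instruction(bytes.fromhex("010009"))
print(f"opcode={opcode:02X} operand={operand:#06x}")  # opcode=01 operand=0x0009
```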

9. Instruction Groups (Syllabus Requirement)

| Group | Typical Mnemonics | Purpose |
| --- | --- | --- |
| Data‑movement | LDA, STA, LD, ST | Transfer data between registers and memory |
| Arithmetic | ADD, SUB, MUL, DIV | Perform integer arithmetic |
| Logical / Bit‑manipulation | AND, OR, XOR, NOT, SHL, SHR | Boolean operations and shifts |
| Control‑flow | JMP, JZ, JNZ, CALL, RET | Alter the sequential execution order |
| Compare / Test | CMP, TEST, TST | Set condition codes for subsequent branches |

10. Addressing Modes

The assembler uses the symbol table to resolve the operand required by each mode.

| Mode | Syntax (example) | What the CPU sees |
| --- | --- | --- |
| Immediate | LDA #5 | Operand field contains the constant value 5 |
| Direct | LDA VALUE | Operand field contains the absolute address of VALUE |
| Indirect | LDA @PTR | CPU fetches the address stored at PTR, then loads the value at that address |
| Indexed | LDA ARRAY,X | Effective address = address(ARRAY) + contents of register X |
| Relative (branch) | JMP LABEL | Operand is a signed offset from the current PC; assembler computes offset using the symbol table |
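For the relative mode, the offset calculation can be illustrated as follows. This is a sketch under assumptions that these notes do not fix: the PC is taken to point at the instruction after the branch when the offset is applied, and the offset is stored as a 16‑bit two's‑complement value. The name `relative_operand` is hypothetical. (The worked example in Section 7 encodes JMP with an absolute address; the figures here show what a relative encoding would give instead.)

```python
INSTRUCTION_SIZE = 3  # bytes per instruction word in these notes


def relative_operand(branch_address: int, target_address: int) -> int:
    """Return the 16-bit operand field for a PC-relative branch."""
    # Assumption: the PC already points at the next instruction.
    pc = branch_address + INSTRUCTION_SIZE
    offset = target_address - pc
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError(f"branch offset {offset} does not fit in 16 signed bits")
    return offset & 0xFFFF  # two's-complement encoding


print(hex(relative_operand(0x0006, 0x000D)))  # forward branch: 0x4
print(hex(relative_operand(0x0006, 0x0000)))  # backward branch: 0xfff7
```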

11. Macro Processing (Optional)

Macros are not required for the Cambridge exam but many assemblers support them. If used, macro expansion occurs during the first pass, before the symbol table is finalised.

    MACRO INCR REG
            ADD #1, REG
    ENDM

    START:  INCR A      ; expands to: ADD #1, A
            HLT

During expansion the macro body is inserted into the source stream, and any labels defined inside the macro are made unique (often by appending a numeric suffix) to avoid collisions.
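A simple text‑substitution expander for the example above could be sketched like this in Python. It is illustrative only: `expand_macros` is a hypothetical name, the MACRO/ENDM syntax is the one shown in these notes, and the renaming of labels defined inside a macro body is not implemented.

```python
import re


def expand_macros(lines):
    """Return the source with MACRO ... ENDM definitions removed and calls expanded."""
    macros = {}      # name -> (parameter names, body lines)
    output = []
    current = None   # name of the macro being defined, if any
    for raw in lines:
        fields = raw.split(";", 1)[0].split()
        if fields and fields[0] == "MACRO":
            current = fields[1]
            macros[current] = (fields[2:], [])
        elif fields and fields[0] == "ENDM":
            current = None
        elif current is not None:
            macros[current][1].append(raw.strip())  # collect the body
        else:
            label = ""
            if fields and fields[0].endswith(":"):
                label, fields = fields[0] + " ", fields[1:]
            if fields and fields[0] in macros:
                params, body = macros[fields[0]]
                args = [a.strip(",") for a in fields[1:]]
                for i, text in enumerate(body):
                    for param, arg in zip(params, args):
                        # Replace whole-word occurrences of the parameter only.
                        text = re.sub(rf"\b{re.escape(param)}\b", arg, text)
                    output.append((label if i == 0 else "") + text)
            else:
                output.append(raw.strip())  # ordinary line, kept as written
    return output


source = ["MACRO INCR REG", "ADD #1, REG", "ENDM",
          "START: INCR A ; expands to: ADD #1, A", "HLT"]
print(expand_macros(source))  # ['START: ADD #1, A', 'HLT']
```

The label on the calling line is attached to the first expanded line, so START still refers to the same address after expansion.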

12. Error Handling

  • Duplicate label definitions – detected in the first pass; assembler reports the line numbers.
  • Undefined symbols – reported during the second pass, when an operand cannot be found in the symbol table.
  • Illegal directive usage (e.g., .ORG to a non‑word boundary, mis‑aligned .ALIGN) – flagged in the pass where they appear.
  • Macro‑related errors – undefined macro name, recursive macro expansion, or label clashes inside a macro.
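The first two checks can be illustrated with a small Python sketch that collects diagnostics instead of stopping at the first error. The name `check_symbols` and the message wording are hypothetical, and the rule that any operand starting with a letter is a symbol is a simplifying assumption.

```python
def check_symbols(lines):
    """Return a list of diagnostic messages for duplicate and undefined labels."""
    defined = {}      # label -> line number of first definition
    used = []         # (line number, symbol) for every symbolic operand
    diagnostics = []
    for number, raw in enumerate(lines, start=1):
        line = raw.split(";", 1)[0].strip()
        if ":" in line:
            label, line = (part.strip() for part in line.split(":", 1))
            if label in defined:
                diagnostics.append(
                    f"line {number}: duplicate label {label} "
                    f"(first defined on line {defined[label]})"
                )
            else:
                defined[label] = number
        fields = line.split()
        # Assumption: an instruction operand that starts with a letter is a symbol.
        if len(fields) > 1 and not fields[0].startswith(".") and fields[1][0].isalpha():
            used.append((number, fields[1]))
    # Undefined symbols can only be decided once every label has been seen.
    for number, symbol in used:
        if symbol not in defined:
            diagnostics.append(f"line {number}: undefined symbol {symbol}")
    return diagnostics


for message in check_symbols(["LOOP: LDA COUNT", "LOOP: JMP DONE", "COUNT: .WORD 1"]):
    print(message)
```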

13. Further Reading – Bit Manipulation & Binary Shifts

Later in the syllabus (Topic 5) you will study how to implement bit‑wise operations and shifts directly in assembly. The mnemonics listed in the instruction groups table (e.g., SHL, SHR, AND, OR) are the building blocks for tasks such as masking, setting, clearing, and shifting bits.

Suggested Diagram

Flowchart of the two‑pass assembly process