Show understanding of the need for: assembler software for the translation of an assembly language program

5.2 Language Translators – Assembler Software

Why an Assembler Is Needed

Assembly language is a symbolic, human‑readable form of a computer’s machine instructions.

The CPU can execute only machine code (binary opcodes and operands), so an assembler must translate each mnemonic and its operands into the binary format required by the target architecture.

  • Human readability: Mnemonics such as MOV, ADD and symbolic labels replace long strings of 0s and 1s.
  • Maintainability: Labels, comments and macro facilities make programs easier to modify and debug.
  • One assembler per architecture: Because every processor family has its own instruction set, each needs its own assembler, and source written for one CPU generally cannot be reassembled unchanged for a different architecture.
  • Error detection: Assemblers spot syntax errors, undefined labels and illegal operand combinations before the program is run.

Two‑Pass Assembler Process

Most assemblers studied at A‑Level operate in two passes. The flowchart at the end of this section summarises the stages.

  1. Pass 1 – Symbol‑table construction

    • The source file is scanned line‑by‑line.
    • Every label is entered into a symbol table together with its *tentative* address (the location counter at that point).
    • Only the length of each instruction is calculated; no opcodes are generated yet.

  2. Pass 2 – Opcode generation & address resolution

    • The source is read a second time.
    • Each mnemonic is looked up in the instruction‑set table and the corresponding binary opcode is emitted.
    • Operands are encoded using the addresses stored in the symbol table (or immediate values).
    • Relocation records are created for any address that may change when the program is linked with other modules.
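
Pass 1 can be sketched in Python as follows. This is a minimal illustration rather than a real assembler: the LENGTHS table, the pass_one function and the four-line program are all assumptions made for the example.

```python
# Hypothetical instruction lengths in bytes (an assumption for this sketch).
LENGTHS = {"LDM": 2, "ADD": 2, "STA": 2, "HLT": 1, "DB": 1}


def pass_one(lines, origin=0):
    """Pass 1: record each label's address; no opcodes are generated."""
    symbols = {}
    location = origin  # the location counter
    for line in lines:
        code = line.split(";")[0].strip()  # drop any comment
        if not code:
            continue
        if ":" in code:  # a label precedes the instruction
            label, code = code.split(":", 1)
            symbols[label.strip()] = location
            code = code.strip()
        if code:
            mnemonic = code.split()[0]
            location += LENGTHS[mnemonic]  # only the length is needed here
    return symbols


program = [
    "START: LDM #5",
    "       ADD ONE   ; forward reference",
    "       HLT",
    "ONE:   DB 1",
]
symbols = pass_one(program)
print(symbols)  # {'START': 0, 'ONE': 5}
```

ONE is used on the second line but defined on the fourth. Its address is still known by the time the scan finishes, and that completed table is what Pass 2 relies on.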

Concrete Symbol‑Table Example

Source (first pass)

    0010: START: LDM #5
    0012: ADD ONE
    0014: STA RESULT
    0016: HLT
    0018: ONE: DB 0x01
    0019: RESULT: DB 0x00

After Pass 1 the assembler has built the following table:

Label  | Address (hex)
-------|--------------
START  | 10
ONE    | 18
RESULT | 19

During Pass 2 the forward‑referenced labels ONE and RESULT are replaced by the addresses 18h and 19h respectively.
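
As a rough check, the table can be rebuilt from the listing by reading the location counter printed at the start of each line. The parsing below is an assumption about the listing layout (hex address, colon, optional label, colon, instruction), not a feature of any particular assembler.

```python
listing = """\
0010: START: LDM #5
0012: ADD ONE
0014: STA RESULT
0016: HLT
0018: ONE: DB 0x01
0019: RESULT: DB 0x00
"""

symbol_table = {}
for row in listing.splitlines():
    address_text, rest = row.split(":", 1)  # location counter, then the statement
    rest = rest.strip()
    if ":" in rest:  # a label precedes the instruction
        label = rest.split(":", 1)[0].strip()
        symbol_table[label] = int(address_text, 16)

for label, address in symbol_table.items():
    print(f"{label:<7}{address:02X}")
```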

One‑Pass vs. Two‑Pass Assemblers (and Macro Facilities)

Feature | One‑Pass Assembler | Two‑Pass Assembler
--------|--------------------|-------------------
Symbol handling | Labels must be defined before they are used (no forward references). | Labels may be forward‑referenced; the first pass records them.
Complexity & speed | Simpler and faster, but limited in capability. | More complex, requires two scans, but supports full language features.
Macro support | Rarely included. | Common; directives such as MACRO … ENDM expand reusable code blocks before Pass 2.
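
The symbol-handling restriction can be illustrated with a short Python sketch of a single-scan resolver. It assumes, purely for simplicity, that every statement occupies one address; the function name and the (label, mnemonic, operand) tuples are hypothetical.

```python
def one_pass_resolve(statements):
    """Resolve operands in a single scan; labels must already be defined."""
    symbols = {}
    resolved = []
    for address, (label, mnemonic, operand) in enumerate(statements):
        if label:
            symbols[label] = address
        if operand is not None and operand not in symbols:
            raise ValueError(f"forward reference to {operand} at address {address}")
        resolved.append((mnemonic, symbols.get(operand)))
    return resolved


backward = [("LOOP", "ADD", None), (None, "JMP", "LOOP")]  # label defined first
forward = [(None, "JMP", "END"), ("END", "HLT", None)]     # label used first

print(one_pass_resolve(backward))  # [('ADD', None), ('JMP', 0)]
try:
    one_pass_resolve(forward)
except ValueError as error:
    print("Error:", error)  # Error: forward reference to END at address 0
```

A two-pass assembler avoids the second failure because the whole symbol table exists before any operand is resolved.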

Macro Directives (optional for A‑Level)

Macros allow a programmer to define a short name for a frequently used sequence of instructions.

    MACRO INCX
    LDA X
    ADD #1
    STA X
    ENDM

    ; later in the program
    INCX ; expands to the three instructions above

During the macro‑expansion phase the assembler replaces each macro call with its body, then proceeds with the normal two‑pass assembly.
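
A macro-expansion phase of this kind might be sketched in Python as below. The expand_macros function is hypothetical and handles only parameterless macros written with the MACRO … ENDM directives shown above. Real macro assemblers usually also support parameters.

```python
def expand_macros(lines):
    """Replace each macro call with the body recorded between MACRO and ENDM."""
    macros = {}
    output = []
    current = None  # name of the macro being defined, if any
    for line in lines:
        text = line.split(";")[0].strip()  # drop any comment
        if not text:
            continue
        words = text.split()
        if words[0] == "MACRO":
            current = words[1]
            macros[current] = []
        elif words[0] == "ENDM":
            current = None
        elif current is not None:
            macros[current].append(text)     # still inside a definition
        elif words[0] in macros:
            output.extend(macros[words[0]])  # a macro call: copy in the body
        else:
            output.append(text)              # an ordinary instruction
    return output


source = [
    "MACRO INCX",
    "LDA X",
    "ADD #1",
    "STA X",
    "ENDM",
    "INCX ; expands to the three instructions above",
    "HLT",
]
print(expand_macros(source))  # ['LDA X', 'ADD #1', 'STA X', 'HLT']
```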

Addressing Modes (Cambridge Syllabus Requirement)

The syllabus expects recognition of the four basic modes. The table uses a hypothetical 8‑bit processor.

Mode | Description | Example (mnemonic operand)
-----|-------------|---------------------------
Immediate | Operand is a constant encoded directly in the instruction. | LDA #5
Direct | Operand is a memory address that contains the data. | LDA VALUE
Indirect | Operand is a memory address that holds the address of the data. | LDA @PTR
Indexed | Effective address = base address + contents of an index register (e.g., X). | LDA ARRAY,X
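
The difference between the four modes is easiest to see by simulating a load. In the Python sketch below, memory is modelled as a dictionary. The addresses, the stored values and the fetch_operand function are invented for illustration.

```python
def fetch_operand(mode, operand, memory, index_register=0):
    """Return the value loaded for one of the four addressing modes."""
    if mode == "immediate":
        return operand                    # the operand is the data itself
    if mode == "direct":
        return memory[operand]            # the operand is the data's address
    if mode == "indirect":
        return memory[memory[operand]]    # the operand's cell holds the address
    if mode == "indexed":
        return memory[operand + index_register]  # base address + index register
    raise ValueError(f"unknown addressing mode: {mode}")


memory = {0x20: 0x30, 0x30: 99, 0x40: 7, 0x41: 8, 0x42: 9}
print(fetch_operand("immediate", 5, memory))      # 5
print(fetch_operand("direct", 0x30, memory))      # 99
print(fetch_operand("indirect", 0x20, memory))    # 99
print(fetch_operand("indexed", 0x40, memory, 2))  # 9
```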

Typical Instruction‑Set Example (Cambridge Style)

Opcode (hex) | Mnemonic | Operand type | Operation
-------------|----------|--------------|----------
01 | LDM | Immediate | Load accumulator with a constant.
02 | LDD | Direct | Load accumulator from a memory address.
03 | ADD | Direct/Immediate | Add operand to accumulator.
04 | SUB | Direct/Immediate | Subtract operand from accumulator.
05 | STA | Direct | Store accumulator to a memory address.
06 | JMP | Direct | Unconditional jump to address.
07 | CMP | Direct/Immediate | Compare operand with accumulator (sets status flags).
FF | HLT | None | Halt the processor.

Example Translation (Two‑Pass Assembly)

    START: LDM #5 ; Load accumulator with decimal 5
    ADD ONE ; Add constant ONE
    STA RESULT ; Store the sum
    HLT ; Stop execution
    ONE: DB 0x01 ; constant ONE = 1 (hex)
    RESULT: DB 0x00 ; reserve a byte for the result

After the two passes the object file (shown as hexadecimal bytes) might look like:

    10: 01 05 ; LDM #5 (opcode 01, immediate 05h)
    12: 03 18 ; ADD ONE (opcode 03, address of ONE = 18h)
    14: 05 19 ; STA RESULT (opcode 05, address of RESULT = 19h)
    16: FF    ; HLT
    18: 01    ; ONE = 0x01
    19: 00    ; RESULT (initialised to 0)

The assembler resolved the symbolic labels ONE and RESULT to actual memory addresses (18h and 19h) during Pass 2.
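
The Pass 2 encoding step for this program can be sketched in Python. The opcodes and label addresses are taken from the tables in these notes. The encode function, and the assumption that every operand fits in one byte, are illustrative only.

```python
OPCODES = {"LDM": 0x01, "ADD": 0x03, "STA": 0x05, "HLT": 0xFF}
SYMBOLS = {"START": 0x10, "ONE": 0x18, "RESULT": 0x19}  # built during Pass 1


def encode(mnemonic, operand=None):
    """Pass 2 for one instruction: opcode byte, then an optional operand byte."""
    encoded = [OPCODES[mnemonic]]
    if operand is None:
        return bytes(encoded)
    if operand.startswith("#"):
        encoded.append(int(operand[1:]))  # immediate value
    else:
        encoded.append(SYMBOLS[operand])  # address from the symbol table
    return bytes(encoded)


program = [("LDM", "#5"), ("ADD", "ONE"), ("STA", "RESULT"), ("HLT", None)]
for mnemonic, operand in program:
    print(encode(mnemonic, operand).hex(" ").upper())
# 01 05
# 03 18
# 05 19
# FF
```

The sketch uses a single opcode for ADD whether the operand is immediate or direct, as the instruction-set table does. A real instruction set would need some way to tell the two forms apart.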

Object File, Relocation & Linking

  • Object file: Binary file produced by the assembler containing machine code and a table of relocation records.
  • Relocation records: Indicate which addresses may need adjustment when the program is linked with other modules.
  • Linking: One or more object files are combined to form a single executable. The linker uses the relocation records to fix up addresses and to merge symbol tables.
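
The address fix-up that a linker or loader performs can be sketched as follows. The relocation format here (a plain list of byte offsets) and the relocate function are simplifying assumptions. Real object-file formats record more information per entry.

```python
def relocate(code, relocation_offsets, load_address):
    """Add load_address to every operand byte named by a relocation record."""
    patched = bytearray(code)
    for offset in relocation_offsets:
        patched[offset] = (patched[offset] + load_address) & 0xFF  # 8-bit addresses
    return bytes(patched)


# Object code assembled as if it started at address 00: ADD 08, STA 09, HLT
module = bytes([0x03, 0x08, 0x05, 0x09, 0xFF])
relocations = [1, 3]  # offsets of the two address operands
print(relocate(module, relocations, 0x10).hex(" ").upper())
# 03 18 05 19 FF
```

Only the bytes listed in the relocation records change. The opcode bytes are left as they are.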

Error Detection & Typical Diagnostics

Assemblers report errors during both passes. Two common examples are:

  • Undefined label: "Error: label LOOP not defined." Occurs when a label is referenced but never declared.
  • Operand‑size mismatch: "Error: immediate value out of range for opcode LDM." Occurs when a constant does not fit the operand field.

Most assemblers display the line number and a short description, allowing the programmer to correct the source before execution.
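
Both diagnostics can be produced by simple checks, sketched below in Python. The check function, the message wording and the one-byte operand field are assumptions for illustration. Real assemblers differ in how they format messages.

```python
def check(statements, symbols):
    """Return diagnostics for undefined labels and out-of-range immediates."""
    errors = []
    for line_number, (mnemonic, operand) in enumerate(statements, start=1):
        if operand is None:
            continue
        if operand.startswith("#"):
            if not 0 <= int(operand[1:]) <= 0xFF:  # must fit in one byte
                errors.append(
                    f"line {line_number}: immediate value out of range for opcode {mnemonic}"
                )
        elif operand not in symbols:
            errors.append(f"line {line_number}: label {operand} not defined")
    return errors


statements = [("LDM", "#300"), ("JMP", "LOOP"), ("ADD", "ONE"), ("HLT", None)]
for message in check(statements, symbols={"ONE": 0x18}):
    print(message)
# line 1: immediate value out of range for opcode LDM
# line 2: label LOOP not defined
```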

Language Translators – Assemblers, Compilers & Interpreters (Comparison)

Translator | Typical Input Language | Output | Key Advantages | Key Disadvantages
-----------|------------------------|--------|----------------|------------------
Assembler | Assembly language (mnemonics + directives) | Object code (machine code + relocation info) | Very close to hardware → fine‑grained control; fast execution; easy to generate interrupt‑service routines. | Programmer must manage registers, memory layout and low‑level details.
Compiler | High‑level language (e.g., C, Java) | Object code (or byte‑code for languages like Java) | Abstraction → faster development, portability, automatic optimisation. | Less direct control of hardware; compilation can be time‑consuming.
Interpreter | Scripting language (e.g., Python) or byte‑code (Java VM) | Direct execution by an interpreter or virtual machine | Immediate feedback, platform independence, easy debugging. | Generally slower execution; runtime errors may appear later.

Understanding the role of each translator helps students appreciate why assembly language is still taught at A‑Level despite the prevalence of high‑level languages.

Assessment Objectives (AO) Mapping

AO | What is assessed in this topic
---|-------------------------------
AO1 | Knowledge of the purpose of an assembler, the two‑pass process, macro directives, CPU registers, fetch‑execute cycle, and basic addressing modes.
AO2 | Analysis of how assembly language is converted into machine code: symbol‑table construction, forward‑reference resolution, opcode generation, relocation, and comparison with compilers/interpreters.

Flowchart of the two‑pass assembler

Pass 1 – Scan source → build Symbol Table → compute instruction lengths

Pass 2 – Translate mnemonics → generate opcodes → resolve addresses → write Object file (+ relocation records)