Cambridge A-Level Computer Science 9618 – Language Translators: Assembler Software
5.2 Language Translators – Assembler Software
Why an Assembler Is Needed
Assembly language provides a symbolic representation of a computer’s machine instructions.
While it is much easier for humans to read and write than raw binary, the CPU can only
execute machine code. An assembler bridges this gap by translating each assembly
instruction into its corresponding binary opcode and arranging data in the format required
by the target architecture.
The need for an assembler arises from several practical considerations:
Human readability: Programmers can use mnemonic codes (e.g., MOV, ADD) and symbolic addresses instead of long binary strings.
Maintainability: Symbolic labels make code easier to modify and debug.
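Cross-assembly: Assembly source is tied to one processor's instruction set, so it cannot simply be reassembled for a different architecture; a cross-assembler can, however, run on one machine while generating code for another.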
Error detection: Assemblers can check for syntax errors, undefined symbols, and illegal operand combinations before execution.
How an Assembler Works
The translation process typically follows these stages:
Lexical analysis: The source file is read and split into tokens (mnemonics, registers, constants, labels).
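Symbol table construction: Each label is recorded with the address it will occupy, typically during a first pass over the source, so that references to labels defined later in the program can still be resolved.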
Opcode generation: Each mnemonic is mapped to its binary opcode; operands are encoded according to the instruction format.
Address resolution: Symbolic addresses are replaced by actual memory addresses or offsets.
Object code output: The final machine code is written to an object file, often accompanied by relocation information.
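The stages above can be sketched as a small two-pass assembler. This is a minimal illustration, not a description of any real tool: the instruction set, the opcode values (LDA = 0x01, ADD = 0x02, STA = 0x03, HLT = 0xFF), the one-byte DB directive, and the names tokenize and assemble are all assumptions made for the example.

```python
# Hypothetical instruction set: mnemonic -> (opcode, size in bytes).
OPCODES = {"LDA": (0x01, 2), "ADD": (0x02, 2), "STA": (0x03, 2), "HLT": (0xFF, 1)}


def tokenize(line):
    """Lexical analysis: drop the comment, then split into label, mnemonic, operand."""
    line = line.split(";", 1)[0].strip()
    if not line:
        return None
    label = None
    if ":" in line:
        label, line = (part.strip() for part in line.split(":", 1))
    parts = line.split()
    mnemonic = parts[0].upper() if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, mnemonic, operand


def assemble(source):
    statements = [t for t in map(tokenize, source.splitlines()) if t]

    # Pass 1: symbol table construction (label -> address).
    symbols, address = {}, 0
    for label, mnemonic, _ in statements:
        if label:
            if label in symbols:
                raise ValueError(f"duplicate label: {label}")
            symbols[label] = address
        if mnemonic == "DB":
            address += 1
        elif mnemonic in OPCODES:
            address += OPCODES[mnemonic][1]
        elif mnemonic is not None:
            raise ValueError(f"unknown mnemonic: {mnemonic}")

    # Pass 2: opcode generation and address resolution.
    code = bytearray()
    for _, mnemonic, operand in statements:
        if mnemonic == "DB":
            code.append(int(operand, 0))  # define one data byte
        elif mnemonic is not None:
            opcode, size = OPCODES[mnemonic]
            code.append(opcode)
            if size == 2:
                if operand not in symbols:
                    raise ValueError(f"undefined symbol: {operand}")
                code.append(symbols[operand])
    return bytes(code), symbols


SAMPLE = """
START: LDA X    ; load the byte stored at X
       ADD Y    ; add the byte stored at Y
       STA X    ; write the sum back to X
       HLT
X:     DB 0x02
Y:     DB 0x03
"""

code, symbols = assemble(SAMPLE)
print(code.hex(" "))  # 01 07 02 08 03 07 ff 02 03
print(symbols)        # {'START': 0, 'X': 7, 'Y': 8}
```

The first pass is what allows LDA X to be assembled even though X is defined further down the file: by the time the second pass emits the operand byte, every label already has an address in the symbol table.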
Typical Output of an Assembler
The assembler produces an object file that contains:
Machine code (binary opcodes).
Relocation records for addresses that may change when linked.
A symbol table, used by the linker to resolve references between modules and, optionally, by debuggers.
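To show why relocation records matter, here is a rough sketch of an object file and of the adjustment a linker or loader might apply. The ObjectFile layout, the relocate function, and the one-byte address format are hypothetical and chosen only for illustration; real object formats are considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    code: bytes                                      # machine code, assembled as if loaded at address 0
    relocations: list = field(default_factory=list)  # offsets of the bytes that hold an address
    symbols: dict = field(default_factory=dict)      # label -> address relative to 0


def relocate(obj, base):
    """Return the code adjusted for loading at address `base`."""
    image = bytearray(obj.code)
    for offset in obj.relocations:
        image[offset] = (image[offset] + base) & 0xFF  # 8-bit addresses wrap around
    return bytes(image)


# Hypothetical module: LDA SRC / STA DST / HLT, followed by two data bytes.
obj = ObjectFile(
    code=bytes([0x01, 0x05, 0x03, 0x06, 0xFF, 0x2A, 0x00]),
    relocations=[1, 3],            # the two address operands
    symbols={"SRC": 5, "DST": 6},
)

print(relocate(obj, 0x40).hex(" "))  # 01 45 03 46 ff 2a 00
```

Only the bytes listed in the relocation records change. Opcodes and data bytes are left untouched, which is why the assembler has to record where the addresses are rather than leaving the linker to guess.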
Example Translation
Consider a simple program for a hypothetical 8‑bit processor:
START:  LDA VALUE   ; Load accumulator with VALUE
        ADD ONE     ; Add constant ONE
        STA RESULT  ; Store result
        HLT         ; Halt
VALUE:  DB 0x05
ONE:    DB 0x01
RESULT: DB 0x00
After assembly, the object code might look like:
00: 01 07 ; LDA address 0x07 (VALUE)
02: 02 08 ; ADD address 0x08 (ONE)
04: 03 09 ; STA address 0x09 (RESULT)
06: FF ; HLT
07: 05 ; VALUE = 0x05
08: 01 ; ONE = 0x01
09: 00 ; RESULT (initialised to 0)
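One way to check a listing like this is to run it through a tiny fetch-execute loop. The sketch below assumes the memory layout implied by the source program (three two-byte instructions, HLT at 0x06, then VALUE at 0x07, ONE at 0x08 and RESULT at 0x09) and assumes the obvious meaning for each opcode; the run function and these semantics are illustrative, since the processor is hypothetical.

```python
def run(memory):
    """Execute from address 0 until HLT and return the final memory image."""
    mem = bytearray(memory)
    pc, acc = 0, 0
    while True:
        opcode = mem[pc]
        if opcode == 0xFF:            # HLT: stop execution
            return bytes(mem)
        operand = mem[pc + 1]         # every other instruction has a one-byte address
        if opcode == 0x01:            # LDA addr: accumulator <- memory[addr]
            acc = mem[operand]
        elif opcode == 0x02:          # ADD addr: accumulator += memory[addr], 8-bit wrap
            acc = (acc + mem[operand]) & 0xFF
        elif opcode == 0x03:          # STA addr: memory[addr] <- accumulator
            mem[operand] = acc
        else:
            raise ValueError(f"illegal opcode {opcode:#04x} at address {pc:#04x}")
        pc += 2


# LDA 0x07, ADD 0x08, STA 0x09, HLT, then VALUE = 5, ONE = 1, RESULT = 0.
PROGRAM = bytes([0x01, 0x07, 0x02, 0x08, 0x03, 0x09, 0xFF, 0x05, 0x01, 0x00])

final = run(PROGRAM)
print(hex(final[0x09]))  # 0x6
```

If an address operand were off by one, for example LDA 0x06, the accumulator would be loaded with the HLT opcode (0xFF) instead of VALUE. Errors of this kind are exactly what automatic address resolution is meant to prevent.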
Comparison with Other Translators
| Translator | Input Language | Output | Typical Use |
| Assembler | Assembly language | Machine code (object file) | Low‑level system programming, device drivers |
| Compiler | High‑level language (e.g., Java, C++) | Machine code or intermediate bytecode | Application development |
| Interpreter | High‑level language (e.g., Python) | None stored; statements are executed directly | Scripting, rapid prototyping |
Key Points to Remember
An assembler converts human‑readable mnemonics into binary opcodes that the CPU can execute.
It resolves symbolic addresses, manages a symbol table, and can detect many programming errors early.
The output is usually an object file that may later be linked with other modules.
Understanding the assembler’s role helps explain why low‑level programming still requires a translation step, even though the source is close to the hardware.
Suggested diagram: Flowchart showing the stages of assembly – from source code, through lexical analysis, symbol table creation, opcode generation, address resolution, to object code output.