Compare the main characteristics of RISC and CISC architectures.
Explain how a large register file and pipelining give RISC processors their high speed.
Identify the three classic pipeline hazards and describe at least one technique for mitigating each.
Compare the four Flynn classifications (SISD, SIMD, MISD, MIMD) and illustrate the difference between SISD and MIMD with a diagram.
Describe the key ideas behind massively‑parallel computers, including performance metrics such as speed‑up and Amdahl’s law.
Define a virtual machine, explain its layered structure and give two concrete examples of its use.
RISC vs CISC

| Aspect | RISC (Reduced‑Instruction‑Set Computer) | CISC (Complex‑Instruction‑Set Computer) |
|---|---|---|
| Instruction‑set size | Small (≈ 50–100 instructions) | Large (hundreds to thousands) |
| Instruction length | Fixed (usually 32 bits) | Variable (1–15 bytes) |
| Execution time per instruction | One clock cycle (or a few); most are simple | Many cycles; complex instructions are micro‑coded |
| Memory access model | Load/store only (memory reached solely through dedicated load/store instructions) | Many instructions can read/write memory directly |
| Control unit | Hard‑wired, simple combinational logic | Micro‑programmed, more complex control memory |
| Register file | Large (≥ 32 general‑purpose registers), keeping data on‑chip | Fewer registers; heavy reliance on memory operands |
| Typical design goal | High clock frequency + high instruction throughput | Rich instruction set for compact code density |
Why RISC processors are fast
Large register file – most operands are in registers, avoiding costly memory accesses.
Simple, fixed‑length instructions – the decoder can be hard‑wired and the pipeline stages are uniform.
Pipelining – each instruction is broken into independent stages that operate in parallel, giving an average throughput of ≈ one instruction per clock after the pipeline is filled.
Five‑stage pipeline (classic RISC core)

| Stage | Abbreviation | Typical operation | Registers read / written |
|---|---|---|---|
| Instruction Fetch | IF | Read the next 32‑bit word from instruction memory using the PC. | Read PC; write the fetched instruction and incremented PC into the IF/ID pipeline register. |
| Instruction Decode / Register Fetch | ID | Decode opcode, read up to two source registers, sign‑extend immediates. | Read two source registers from the register file; write decoded fields into the ID/EX register. |
| Execute / Address Calculation | EX | ALU operation, effective‑address calculation, branch‑condition evaluation. | Read ALU operands from ID/EX; write ALU result and branch decision into the EX/MEM register. |
| Memory Access | MEM | Load from or store to data memory. | Read/write data memory; write loaded data into the MEM/WB register. |
| Write Back | WB | Write the result (ALU output or loaded data) back to the destination register. | Write the destination register (first half‑clock) from MEM/WB. |
Typical 5‑stage pipeline diagram (showing overlapping instruction flow). *(Insert diagram with arrows: IF → ID → EX → MEM → WB, each stage holding a different instruction.)*
How pipelining increases throughput
Each clock cycle completes part of several instructions simultaneously → after the pipeline fill latency, the average rate is one instruction per cycle (see the sketch after this list).
No need to raise the clock frequency; the control logic stays simple and deterministic.
Fixed‑length stages give predictable timing – important for real‑time applications.
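To make the fill‑latency argument concrete, here is a minimal Python sketch (an illustration, not part of the syllabus notes) of an idealised, hazard‑free 5‑stage pipeline. It prints which stage each instruction occupies in every cycle and compares the pipelined cycle count, k + (n − 1), with the non‑pipelined count, n·k:

```python
# Minimal sketch (assumed idealised model, no hazards): shows how the five
# stages overlap and why throughput approaches one instruction per cycle.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(n_instructions: int) -> None:
    """Print the stage each instruction occupies in every clock cycle."""
    k = len(STAGES)
    total_cycles = k + n_instructions - 1          # fill latency + 1/cycle
    print("Cycle:  " + "".join(f"{c:>5}" for c in range(1, total_cycles + 1)))
    for i in range(n_instructions):
        row = ["     "] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = f"{name:>5}"              # instruction i is in
        print(f"Instr {i + 1}:" + "".join(row))    # stage s at cycle i+s+1
    print(f"Pipelined: {total_cycles} cycles vs "
          f"non-pipelined: {n_instructions * k} cycles")

pipeline_chart(4)   # 8 cycles pipelined vs 20 cycles without
```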
Pipeline hazards
Structural hazard – two stages need the same hardware resource (e.g., a single memory port).
Mitigation: duplicate the resource (separate instruction‑fetch and data‑memory ports) or insert a stall.
Data hazard – an instruction needs a result that has not yet been written back (RAW).
Mitigation: forwarding/bypassing from EX/MEM or MEM/WB registers; otherwise insert NOPs.
Control hazard – a branch or jump changes the PC before the outcome is known.
Mitigation: static prediction (e.g., “predict not taken”), dynamic branch prediction, branch delay slots, or pipeline flush.
Hazard‑mitigation techniques (quick reference)
Forwarding (bypassing) – the result produced in EX or MEM is routed directly to the EX stage of a dependent instruction.
Branch delay slot – the instruction immediately after a branch is always executed; the compiler can place a useful, independent instruction there.
Duplicated functional units – two ALUs, separate instruction‑fetch and data‑memory ports, or multiple load/store units eliminate structural conflicts.
Register file design for a pipelined RISC core
Typical size: 32 general‑purpose registers (GPRs), each 32 bits (or 64 bits for a 64‑bit core).
Two read ports – allow the ID stage to fetch two source operands in a single cycle.
One write port – used by the WB stage to write the result back.
Same‑cycle conflicts between the WB stage (writing a register) and the ID stage (reading it) are avoided by performing the write in the first half of the clock and the reads in the second half, so a reading instruction always receives the freshly written value.
Optional third read port for special purposes (e.g., forwarding to the EX stage) in more advanced designs.
Example of forwarding
```
add $t0, $t1, $t2   # $t0 = $t1 + $t2
sub $t3, $t0, $t4   # $t3 = $t0 - $t4
and $t5, $t3, $t6   # $t5 = $t3 & $t6
```
Without forwarding, the sub would have to wait until add reaches the WB stage (a two‑cycle stall). With a bypass path from the EX/MEM register back to the EX stage, the result of add is supplied directly to sub, eliminating the stall.
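The stall counts can be approximated in a few lines of Python. The sketch below is not a cycle‑accurate simulator: it assumes the classic 5‑stage pipeline with the half‑clock register‑file write described earlier, so a consumer 1, 2, or 3+ instructions behind an ALU producer stalls 2, 1, or 0 cycles without forwarding, and 0 cycles with an EX/MEM → EX bypass:

```python
# Approximate RAW-stall count by instruction distance (assumed model).

def stall_cycles(distance: int, forwarding: bool) -> int:
    if forwarding:
        return 0                       # result bypassed straight into EX
    return max(0, 3 - distance)        # must wait for the producer's WB

program = [("t0", ("t1", "t2")),       # add $t0, $t1, $t2
           ("t3", ("t0", "t4")),       # sub $t3, $t0, $t4
           ("t5", ("t3", "t6"))]       # and $t5, $t3, $t6

for fwd in (False, True):
    last_writer = {}                   # register -> index of last producer
    total = 0
    for i, (dest, sources) in enumerate(program):
        total += max((stall_cycles(i - last_writer[s], fwd)
                      for s in sources if s in last_writer), default=0)
        last_writer[dest] = i
    print(f"forwarding={fwd}: {total} stall cycles")
# Prints 4 stall cycles without forwarding, 0 with it.
```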
Four basic computer‑architecture classifications (Flynn’s taxonomy)

| Class | Definition | Typical example |
|---|---|---|
| SISD | Single Instruction stream, Single Data stream – one processor executes one instruction at a time on one datum. | Traditional single‑core CPU (no SIMD extensions) |
| SIMD | Single Instruction stream, Multiple Data streams – one instruction operates on many data elements simultaneously. | GPU vector units, Intel SSE/AVX, ARM NEON |
| MISD | Multiple Instruction streams, Single Data stream – rare; used in specialised fault‑tolerant or pipeline‑parallel systems. | Theoretical models; some digital‑signal‑processing pipelines |
| MIMD | Multiple Instruction streams, Multiple Data streams – each processor runs its own program on its own data. | Multi‑core CPUs, clusters, distributed systems |
Comparing SISD and MIMD
Data‑flow contrast (SISD vs. MIMD).
*(Insert two small block diagrams side‑by‑side)*
- SISD: one arrow from a single control unit to a single datapath.
- MIMD: several independent cores, each with its own control unit and datapath, connected to a shared memory hierarchy.
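The same contrast can be seen in code. Below is a minimal Python sketch (the worker functions and data are invented for illustration): the SISD version is one instruction stream working through the data sequentially, while the MIMD version gives two worker processes their own program and their own data:

```python
# SISD-style sequential execution vs MIMD-style independent workers.
from multiprocessing import Pool

def square(x: int) -> int:
    return x * x

def cube(x: int) -> int:
    return x * x * x

data_a, data_b = [1, 2, 3], [4, 5, 6]

if __name__ == "__main__":
    # SISD: one instruction stream processes one data stream at a time.
    sisd = [square(x) for x in data_a] + [cube(x) for x in data_b]

    # MIMD: two processes each run a *different* program on *different* data.
    with Pool(processes=2) as pool:
        ra = pool.map_async(square, data_a)    # worker 1: its own program
        rb = pool.map_async(cube, data_b)      # worker 2: its own program
        mimd = ra.get() + rb.get()

    print(sisd == mimd)    # True: same results, different execution model
```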
Massively‑parallel computers
Definition: Systems that contain hundreds, thousands or even millions of processing elements working concurrently.
Typical architectures
SIMD GPUs – many simple cores executing the same instruction on different data.
MIMD many‑core CPUs (e.g., Xeon Phi, ARM big.LITTLE clusters).
Distributed clusters – dozens to thousands of separate nodes linked by a network.
Key characteristics
Large number of relatively simple cores.
Hierarchical memory: fast on‑chip shared memory, slower global memory, and off‑chip DRAM.
High memory‑bandwidth and low‑latency interconnects (e.g., NVLink, InfiniBand).
Programming models: CUDA/OpenCL for GPUs, OpenMP/MPI for CPUs and clusters.
Performance metrics
Speed‑up = T_serial / T_parallel.
Amdahl’s law – with a fraction P of a program parallelisable across N processors, the maximum speed‑up is 1 / ((1 − P) + P/N); even with unlimited processors it can never exceed 1 / (1 − P). (A short sketch follows this list.)
Why it matters for A‑Level – exam questions often ask you to compare single‑core performance with parallel performance, or to identify parts of an algorithm that could be parallelised.
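A minimal Python sketch of both metrics; the timings and fractions below are invented example values, not measurements from the notes:

```python
# Speed-up and Amdahl's law; the numbers are illustrative.

def speed_up(t_serial: float, t_parallel: float) -> float:
    return t_serial / t_parallel

def amdahl(p: float, n: int) -> float:
    """Maximum speed-up for a parallelisable fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

print(speed_up(100.0, 12.5))   # 8.0 (an assumed run: 100 s -> 12.5 s)
print(amdahl(0.9, 8))          # ~4.7: the 10% serial part dominates
print(amdahl(0.9, 10**6))      # ~10.0: the 1/(1 - P) ceiling
```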
Virtual machines
Definition: A software abstraction that mimics a physical computer, providing its own instruction set, memory model and runtime environment.
Layered structure
Host operating system (provides hardware access).
Hypervisor or runtime (e.g., JVM, CLR, QEMU) – interprets or JIT‑compiles byte‑code, or emulates hardware for a guest.
Guest program or guest operating system – the code that actually runs inside the virtual machine.
Two concrete examples of use: the Java Virtual Machine runs the same compiled byte‑code unchanged on any host platform, and hypervisors such as QEMU let one physical machine host several isolated guest operating systems.
Multiplexer (2‑to‑1)
Selects one of two data inputs (D0, D1) based on select line S.
Output Y = (S̅·D0) + (S·D1).
Implemented with two AND gates, one OR gate and an inverter for S̅.
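As a quick check, a minimal Python sketch of the same gate‑level expression, with 0/1 integers standing in for logic levels:

```python
# 2-to-1 multiplexer: Y = (NOT S AND D0) OR (S AND D1).

def mux2(s: int, d0: int, d1: int) -> int:
    return ((1 - s) & d0) | (s & d1)   # (1 - s) plays the role of S-bar

for s, d0, d1 in [(0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1)]:
    print(f"S={s} D0={d0} D1={d1} -> Y={mux2(s, d0, d1)}")
# S=0 selects D0; S=1 selects D1.
```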
Decoder (2‑to‑4)
Activates exactly one of four outputs (Y0–Y3) according to a 2‑bit binary input (A1A0).
Y0 = Ā1·Ā0, Y1 = Ā1·A0, Y2 = A1·Ā0, Y3 = A1·A0
Implemented with four two‑input AND gates plus an inverter for each input (a third AND input is needed only if an enable line is added).
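A minimal Python sketch of the decoder equations above; exactly one output goes high for each 2‑bit input:

```python
# 2-to-4 decoder: Y0..Y3 from the product terms listed above.

def decoder_2to4(a1: int, a0: int) -> list[int]:
    na1, na0 = 1 - a1, 1 - a0                         # inverted inputs
    return [na1 & na0, na1 & a0, a1 & na0, a1 & a0]   # Y0..Y3

for a1 in (0, 1):
    for a0 in (0, 1):
        print(f"A1A0={a1}{a0} -> Y0..Y3 = {decoder_2to4(a1, a0)}")
```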
Sequential logic – flip‑flops
SR latch (asynchronous)
| S | R | Q (next) |
|---|---|---|
| 0 | 0 | Q (no change) |
| 0 | 1 | 0 (reset) |
| 1 | 0 | 1 (set) |
| 1 | 1 | Invalid (both outputs forced to 0) |
Used as a building block for other flip‑flops.
D flip‑flop (edge‑triggered)
Data input D is sampled on a clock edge (rising or falling).
Qnext = D at the moment of the clock transition.
Eliminates the illegal state of the SR latch.
JK flip‑flop
Inputs J and K behave like S and R, but the “both 1” condition toggles the output.
Truth table:

| J | K | Q (next) |
|---|---|---|
| 0 | 0 | Q (no change) |
| 0 | 1 | 0 |
| 1 | 0 | 1 |
| 1 | 1 | ¬Q (toggle) |
T flip‑flop
Single input T (Toggle).
Qnext = ¬Q when T = 1; otherwise Qnext = Q.
Often built from a JK flip‑flop with J = K = T.
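The next‑state rules of the three clocked flip‑flops above fit in a few lines of Python. A minimal sketch (state is a single bit q, sampled "on the clock edge"):

```python
# Next-state functions for the D, JK and T flip-flops.

def d_ff(q: int, d: int) -> int:
    return d                              # Q(next) = D

def jk_ff(q: int, j: int, k: int) -> int:
    if j and k:
        return 1 - q                      # both 1: toggle
    if j:
        return 1                          # set
    if k:
        return 0                          # reset
    return q                              # no change

def t_ff(q: int, t: int) -> int:
    return 1 - q if t else q              # toggle when T = 1

q = 0
for t_in in (1, 1, 0, 1):                 # clock the T flip-flop four times
    q = t_ff(q, t_in)
    print(q)                              # prints 1, 0, 0, 1
```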
Registers and counters
Register – a group of D flip‑flops (one per bit) with a common clock; provides a fast on‑chip storage element.
Shift register – D flip‑flops connected in series; used for serial‑to‑parallel (or parallel‑to‑serial) conversion.
Binary counter – cascade of T or JK flip‑flops; each stage toggles on the carry from the previous stage.
Example: 4‑bit synchronous binary up‑counter
Q0 (LSB) – T‑FF with T = 1
Q1 – T‑FF with T = Q0
Q2 – T‑FF with T = Q0·Q1
Q3 – T‑FF with T = Q0·Q1·Q2
All flip‑flops share the same clock edge.
The counter advances by one on each clock pulse; the carry‑logic (AND gates) implements the “toggle when all lower bits are 1” rule.
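A minimal Python sketch of this counter, reusing the T flip‑flop rule from the previous section; note that every T input is computed from the pre‑clock state, which is what makes the counter synchronous:

```python
# 4-bit synchronous up-counter: each bit toggles when all lower bits are 1.

def t_ff(q: int, t: int) -> int:
    return 1 - q if t else q

q = [0, 0, 0, 0]                     # Q0 (LSB) .. Q3 (MSB)
for pulse in range(16):
    t = [1,                          # T0 = 1
         q[0],                       # T1 = Q0
         q[0] & q[1],                # T2 = Q0·Q1
         q[0] & q[1] & q[2]]         # T3 = Q0·Q1·Q2
    q = [t_ff(q[i], t[i]) for i in range(4)]      # all clocked together
    print("".join(str(b) for b in reversed(q)))   # 0001, 0010, ... 0000
```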
Registers in a pipelined RISC core
Each pipeline stage is separated by a set of pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB).
These registers hold the instruction, control signals and intermediate data, ensuring that each stage works on a stable snapshot of the instruction.
They are essentially small, fast registers built from D flip‑flops, clocked by the same global clock as the processor.
Key points for examination (15.2)
Construct the truth table for a given Boolean expression (e.g., (A + B)·Ā – see the sketch after this list).
Show, step‑by‑step, how an expression is reduced using Boolean laws.
Use a Karnaugh map to minimise a 3‑ or 4‑variable function and write the resulting SOP or POS.
Draw a half‑adder and a full‑adder circuit using only basic gates.
Explain how a D flip‑flop works, and how several D flip‑flops are combined to form a register used in a pipelined processor.
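For the first key point, a minimal Python sketch that generates the truth table of the example expression with a plain nested loop:

```python
# Truth table for (A + B)·A-bar.

for a in (0, 1):
    for b in (0, 1):
        y = (a | b) & (1 - a)        # (A OR B) AND (NOT A)
        print(f"A={a} B={b} -> {y}")
# True only when A = 0 and B = 1: by the distributive law,
# (A + B)·A-bar = A·A-bar + B·A-bar = A-bar·B.
```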