
15.1 Processors, Parallel Processing & Virtual Machines

Learning objectives

  • Compare the main characteristics of RISC and CISC architectures.
  • Explain how a large register file and pipelining give RISC processors their high speed.
  • Identify the three classic pipeline hazards and describe at least one technique for removing each.
  • Compare the four Flynn classifications (SISD, SIMD, MISD, MIMD) and illustrate the difference between SISD and MIMD with a diagram.
  • Describe the key ideas behind massively‑parallel computers, including performance metrics such as speed‑up and Amdahl’s law.
  • Define a virtual machine, explain its layered structure and give two concrete examples of its use.

RISC vs CISC

| Aspect | RISC (Reduced‑Instruction‑Set Computer) | CISC (Complex‑Instruction‑Set Computer) |
| --- | --- | --- |
| Instruction‑set size | Small (≈ 50–100 instructions) | Large (hundreds to thousands) |
| Instruction length | Fixed (usually 32 bits) | Variable (1–15 bytes) |
| Execution time per instruction | One clock cycle (or a few) – most are simple | Many cycles – complex instructions are micro‑coded |
| Memory access model | Load/store only (memory accessed in dedicated stages) | Many instructions can read/write memory directly |
| Control unit | Hard‑wired, simple combinational logic | Micro‑programmed, more complex control memory |
| Register file | Large (≥ 32 general‑purpose registers) – keeps data on‑chip | Fewer registers; heavy reliance on memory operands |
| Typical design goal | High clock frequency + high instruction throughput | Rich instruction set for compact code density |

Why RISC processors are fast

  • Large register file – most operands are in registers, avoiding costly memory accesses.
  • Simple, fixed‑length instructions – the decoder can be hard‑wired and the pipeline stages are uniform.
  • Pipelining – each instruction is broken into independent stages that operate in parallel, giving an average throughput of ≈ one instruction per clock after the pipeline is filled.

Five‑stage pipeline (classic RISC core)

| Stage | Abbreviation | Typical operation | Registers read / written |
| --- | --- | --- | --- |
| Instruction Fetch | IF | Read the next 32‑bit word from instruction memory using the PC. | Read PC; write next PC into IF/ID pipeline register. |
| Instruction Decode / Register Fetch | ID | Decode opcode, read up to two source registers, sign‑extend immediates. | Read two source registers from the register file; write decoded fields into ID/EX register. |
| Execute / Address Calculation | EX | ALU operation, effective‑address calculation, branch‑condition evaluation. | Read ALU operands from ID/EX; write ALU result & branch decision into EX/MEM register. |
| Memory Access | MEM | Load from or store to data memory. | Read/write data memory; write loaded data into MEM/WB register. |
| Write Back | WB | Write the result (ALU output or loaded data) back to the destination register. | Write destination register (first half‑clock) from MEM/WB. |

Typical 5‑stage pipeline diagram (showing overlapping instruction flow).
*(Insert diagram with arrows: IF → ID → EX → MEM → WB, each stage holding a different instruction.)*

How pipelining increases throughput

  • Each clock cycle completes part of several instructions simultaneously → after the pipeline fill latency, the average rate is one instruction per cycle.
  • No need to raise the clock frequency; the control logic stays simple and deterministic.
  • Fixed‑length stages give predictable timing – important for real‑time applications.
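The throughput claim can be checked with simple arithmetic: an n‑instruction program on a k‑stage pipeline needs k + (n − 1) cycles instead of n·k (a minimal sketch, assuming one cycle per stage and no hazard stalls):

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction occupies the whole datapath for n_stages cycles."""
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages):
    """The first instruction takes n_stages cycles to fill the pipeline;
    every later instruction then completes one cycle after the previous."""
    return n_stages + (n_instructions - 1)

n, k = 100, 5
print(cycles_unpipelined(n, k))  # 500
print(cycles_pipelined(n, k))    # 104
print(cycles_unpipelined(n, k) / cycles_pipelined(n, k))  # ≈ 4.8× speed-up
```

The ratio approaches k (here, 5) as n grows, which is why the average rate tends towards one instruction per cycle.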

Pipeline hazards

  1. Structural hazard – two stages need the same hardware resource (e.g., a single memory port).

    Mitigation: duplicate the resource (separate instruction‑fetch and data‑memory ports) or insert a stall.

  2. Data hazard – an instruction needs a result that has not yet been written back (RAW).

    Mitigation: forwarding/bypassing from EX/MEM or MEM/WB registers; otherwise insert NOPs.

  3. Control hazard – a branch or jump changes the PC before the outcome is known.

    Mitigation: static prediction (e.g., “predict not taken”), dynamic branch prediction, branch delay slots, or pipeline flush.

Hazard‑mitigation techniques (quick reference)

  • Forwarding (bypassing) – the result produced in EX or MEM is routed directly to the EX stage of a dependent instruction.
  • Branch delay slot – the instruction immediately after a branch is always executed; the compiler can place a useful, independent instruction there.
  • Branch prediction – simple static (always not‑taken) or dynamic two‑bit saturating counters reduce mis‑prediction penalties.
  • Duplicated functional units – two ALUs, separate instruction‑fetch and data‑memory ports, or multiple load/store units eliminate structural conflicts.

Register file design for a pipelined RISC core

  • Typical size: 32 general‑purpose registers (GPRs), each 32 bits (or 64 bits for a 64‑bit core).
  • Two read ports – allow the ID stage to fetch two source operands in a single cycle.
  • One write port – used by the WB stage to write the result back.
  • Conflicts between a write‑back and a register read in the same cycle are avoided by performing the write in the first half of the clock cycle and the reads in the second half, so an instruction in ID can read a value written back in that same cycle.
  • Optional third read port for special purposes (e.g., forwarding to the EX stage) in more advanced designs.

Example of forwarding

```asm
add $t0, $t1, $t2   # $t0 = $t1 + $t2
sub $t3, $t0, $t4   # $t3 = $t0 - $t4
and $t5, $t3, $t6   # $t5 = $t3 & $t6
```

Without forwarding, the `sub` would have to wait until `add` reaches the WB stage (a two‑cycle stall). With a bypass path from the EX/MEM register to the EX stage, the result of `add` is supplied directly to `sub`, eliminating the stall.
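A hazard‑detection unit performs essentially this comparison in hardware. The sketch below (a hypothetical illustration; the tuple encoding of instructions is our own) scans the three instructions above for RAW dependencies within the two‑instruction distance that forwarding must cover:

```python
# Each instruction: (destination register, [source registers])
program = [
    ("$t0", ["$t1", "$t2"]),  # add $t0, $t1, $t2
    ("$t3", ["$t0", "$t4"]),  # sub $t3, $t0, $t4  (RAW on $t0)
    ("$t5", ["$t3", "$t6"]),  # and $t5, $t3, $t6  (RAW on $t3)
]

def raw_hazards(program, window=2):
    """Report (consumer index, producer index, register) for every RAW
    dependency within `window` instructions -- the distance a classic
    5-stage pipeline covers with EX/MEM and MEM/WB forwarding paths."""
    hazards = []
    for i, (_, sources) in enumerate(program):
        for j in range(max(0, i - window), i):
            dest, _ = program[j]
            if dest in sources:
                hazards.append((i, j, dest))
    return hazards

print(raw_hazards(program))  # [(1, 0, '$t0'), (2, 1, '$t3')]
```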

Four basic computer‑architecture classifications (Flynn’s taxonomy)

| Class | Definition | Typical example |
| --- | --- | --- |
| SISD | Single Instruction stream, Single Data stream – one processor executes one instruction at a time on one datum. | Traditional single‑core CPU (no SIMD extensions) |
| SIMD | Single Instruction stream, Multiple Data streams – one instruction operates on many data elements simultaneously. | GPU vector units, Intel SSE/AVX, ARM NEON |
| MISD | Multiple Instruction streams, Single Data stream – rare; used in specialised fault‑tolerant or pipeline‑parallel systems. | Theoretical models; some digital‑signal‑processing pipelines |
| MIMD | Multiple Instruction streams, Multiple Data streams – each processor runs its own program on its own data. | Multi‑core CPUs, clusters, distributed systems |

Comparing SISD and MIMD

Data‑flow contrast (SISD vs. MIMD).

*(Insert two small block diagrams side‑by‑side)*

- SISD: one arrow from a single control unit to a single datapath.

- MIMD: several independent cores, each with its own control unit and datapath, connected to a shared memory hierarchy.

Massively‑parallel computers

  • Definition: Systems that contain hundreds, thousands or even millions of processing elements working concurrently.
  • Typical architectures

    • SIMD GPUs – many simple cores executing the same instruction on different data.
    • MIMD many‑core CPUs (e.g., Xeon Phi, ARM big.LITTLE clusters).
    • Distributed clusters – dozens to thousands of separate nodes linked by a network.

  • Key characteristics

    • Large number of relatively simple cores.
    • Hierarchical memory: fast on‑chip shared memory, slower global memory, and off‑chip DRAM.
    • High memory‑bandwidth and low‑latency interconnects (e.g., NVLink, InfiniBand).
    • Programming models: CUDA/OpenCL for GPUs, OpenMP/MPI for CPUs and clusters.

  • Performance metrics

    • Speed‑up = T_serial / T_parallel.
    • Amdahl’s law – the theoretical limit of speed‑up given the fraction of code that must remain serial.

  • Why it matters for A‑Level – exam questions often ask you to compare single‑core performance with parallel performance, or to identify parts of an algorithm that could be parallelised.
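Both metrics can be computed directly; a small sketch (p is the parallelisable fraction, n the number of processors):

```python
def speed_up(t_serial, t_parallel):
    """Speed-up = T_serial / T_parallel."""
    return t_serial / t_parallel

def amdahl(p, n):
    """Amdahl's law: the maximum speed-up when a fraction p of the work
    can be spread over n processors and (1 - p) must stay serial."""
    return 1 / ((1 - p) + p / n)

print(amdahl(0.95, 8))      # ≈ 5.9 -- far below the ideal 8x
print(amdahl(0.95, 10**6))  # ≈ 20 -- the serial 5% caps speed-up at 1/0.05
```

Note how even a small serial fraction dominates as n grows, which is the usual point of an exam question on parallelisation limits.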

Virtual machines

  • Definition: A software abstraction that mimics a physical computer, providing its own instruction set, memory model and runtime environment.
  • Layered structure

    1. Host operating system (provides hardware access).
    2. Hypervisor or runtime (e.g., JVM, CLR, QEMU) – interprets or JIT‑compiles byte‑code.
    3. Byte‑code (e.g., Java .class files, .NET MSIL, Python .pyc).
    4. Native machine code generated at run‑time (JIT) or interpreted directly.

  • Typical uses

    • Portability – “write once, run anywhere” (Java, .NET).
    • Security isolation – sandboxing of untrusted code.
    • Managed‑runtime services – garbage collection, runtime checks.
    • Emulation – QEMU or Bochs emulate a different CPU architecture.
    • Containers (e.g., Docker) – lightweight isolation using operating‑system features (namespaces, cgroups) rather than a full hardware virtual machine.

  • Examples

    • Java Virtual Machine (JVM) – executes Java bytecode.
    • .NET Common Language Runtime (CLR) – executes Microsoft Intermediate Language (MSIL).
    • CPython (the reference Python interpreter) – executes Python bytecode (.pyc).
    • QEMU – hardware‑level emulator that can run binaries compiled for a different ISA.

Key points for examination (15.1)

  1. State two reasons why RISC processors can use a simple hard‑wired control unit.
  2. Describe each of the five pipeline stages and indicate which registers are read or written in that stage.
  3. Give one example of a structural, a data and a control hazard and name one technique that removes each.
  4. Compare SISD and MIMD, supporting your answer with a short labelled diagram of data flow.
  5. Explain the role of a virtual machine in providing platform independence and give two concrete examples of its use.


15.2 Boolean Algebra and Logic Circuits

Learning objectives

  • Construct truth tables for any Boolean expression.
  • Apply the fundamental Boolean‑algebra laws to simplify expressions.
  • Use Karnaugh maps to obtain minimal sum‑of‑products (SOP) or product‑of‑sums (POS) forms.
  • Design combinational circuits (half‑adder, full‑adder, multiplexers, decoders) using basic gates.
  • Understand the operation of basic sequential elements (SR, D, JK, T flip‑flops) and how they are combined to form registers and counters.

Logic‑gate symbols and truth tables

| Gate | Boolean function | Truth table (A B → Y) |
| --- | --- | --- |
| AND | A·B | 0 0 → 0; 0 1 → 0; 1 0 → 0; 1 1 → 1 |
| OR | A + B | 0 0 → 0; 0 1 → 1; 1 0 → 1; 1 1 → 1 |
| NOT | Ā | 0 → 1; 1 → 0 |
| NAND | ¬(A·B) | 0 0 → 1; 0 1 → 1; 1 0 → 1; 1 1 → 0 |
| NOR | ¬(A + B) | 0 0 → 1; 0 1 → 0; 1 0 → 0; 1 1 → 0 |
| XOR | A ⊕ B | 0 0 → 0; 0 1 → 1; 1 0 → 1; 1 1 → 0 |
| XNOR | ¬(A ⊕ B) | 0 0 → 1; 0 1 → 0; 1 0 → 0; 1 1 → 1 |

Fundamental Boolean‑algebra laws

  • Identity: A + 0 = A  A·1 = A
  • Null: A + 1 = 1  A·0 = 0
  • Idempotent: A + A = A  A·A = A
  • Complement: A + Ā = 1  A·Ā = 0
  • Commutative: A + B = B + A  A·B = B·A
  • Associative: (A + B) + C = A + (B + C)  (A·B)·C = A·(B·C)
  • Distributive: A·(B + C) = A·B + A·C  A + (B·C) = (A + B)·(A + C)
  • De Morgan: ¬(A + B) = Ā·B̅  ¬(A·B) = Ā + B̅

Simplifying an expression – worked example

Given F = A·B + A·Ā·C + B·C

  1. Apply the complement law: A·Ā = 0, so the middle term A·Ā·C vanishes.
  2. The expression becomes F = A·B + B·C.
  3. Factor out B: F = B·(A + C). Equivalently, the minimal SOP form is A·B + B·C.
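Any simplification can be verified by brute force – evaluate both forms on all 2³ input combinations (a Python sketch; the helper name `equivalent` is ours):

```python
from itertools import product

def equivalent(f, g, n_vars):
    """True if f and g agree on every combination of n_vars Boolean inputs."""
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=n_vars))

original   = lambda a, b, c: (a and b) or (a and (not a) and c) or (b and c)
simplified = lambda a, b, c: b and (a or c)

print(equivalent(original, simplified, 3))  # True
```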

Karnaugh‑map minimisation (3‑variable example)

Minimise G = Σm(1,2,3,5,7) (i.e., true for minterms 1,2,3,5,7).

```
       BC
        00  01  11  10
A = 0 |  0   1   1   1
A = 1 |  0   1   1   0
```

  • Group the four 1’s in the two centre columns, BC = 01 and 11 (minterms 1, 3, 5, 7) → term C.
  • Group the pair in the A = 0 row where B = 1 (minterms 2 and 3) → term Ā·B.

Minimal SOP: G = C + Ā·B.
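A brute‑force check (sketch) confirms that C + Ā·B is true on exactly the minterms 1, 2, 3, 5 and 7, where the minterm index is 4A + 2B + C:

```python
from itertools import product

minterms = {1, 2, 3, 5, 7}

def g(a, b, c):
    """Candidate minimal form: G = C + A'·B."""
    return c or ((not a) and b)

# Every row of the truth table must agree with the minterm list.
for a, b, c in product([0, 1], repeat=3):
    index = 4 * a + 2 * b + c
    assert bool(g(a, b, c)) == (index in minterms)

print("G = C + A'*B matches all eight rows of the truth table")
```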

Designing combinational circuits

Half‑adder

Adds two single‑bit numbers.

| A | B | Sum (S) | Carry (C) |
| --- | --- | --- | --- |
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |

  • Sum = A ⊕ B  ( XOR gate )
  • Carry = A·B  ( AND gate )

Full‑adder

Adds three bits: A, B and Carry‑in (Cin).

| A | B | Cin | Sum | Cout |
| --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 |

  • Sum = A ⊕ B ⊕ Cin
  • Cout = (A·B) + (B·Cin) + (A·Cin) (majority function)
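Both equations can be checked exhaustively against ordinary binary addition (a short sketch):

```python
from itertools import product

def full_adder(a, b, cin):
    """Gate-level full adder: Sum = A xor B xor Cin,
    Cout = majority(A, B, Cin)."""
    s    = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout

# Every row of the truth table must agree with arithmetic: 2*Cout + Sum = A + B + Cin
for a, b, cin in product([0, 1], repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin

print("full adder agrees with binary addition on all 8 input rows")
```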

Multiplexer (2‑to‑1)

Selects one of two data inputs (D0, D1) based on select line S.

  • Output Y = (S̅·D0) + (S·D1).
  • Implemented with two AND gates, one OR gate and an inverter for S̅.
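The output equation translates directly into code (a minimal sketch):

```python
def mux2(d0, d1, s):
    """2-to-1 multiplexer: Y = (not S)·D0 + S·D1."""
    return ((not s) and d0) or (s and d1)

assert mux2(d0=1, d1=0, s=0) == 1   # S = 0 selects D0
assert mux2(d0=1, d1=0, s=1) == 0   # S = 1 selects D1
```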

Decoder (2‑to‑4)

Activates exactly one of four outputs (Y0–Y3) according to a 2‑bit binary input (A1A0).

  • Y0 = Ā1·Ā0  Y1 = Ā1·A0  Y2 = A1·Ā0  Y3 = A1·A0
  • Implemented with four two‑input AND gates plus inverters to produce Ā1 and Ā0.

Sequential logic – flip‑flops

SR latch (asynchronous)

| S | R | Q (next) |
| --- | --- | --- |
| 0 | 0 | Q (no change) |
| 0 | 1 | 0 (reset) |
| 1 | 0 | 1 (set) |
| 1 | 1 | Invalid – both outputs are forced to 0 |

Used as a building block for other flip‑flops.

D flip‑flop (edge‑triggered)

  • Data input D is sampled on a clock edge (rising or falling).
  • Qnext = D at the moment of the clock transition.
  • Eliminates the illegal state of the SR latch.

JK flip‑flop

  • Inputs J and K behave like S and R, but the “both 1” condition toggles the output.
  • Truth table:

    | J | K | Qnext |
    | --- | --- | --- |
    | 0 | 0 | Q (no change) |
    | 0 | 1 | 0 |
    | 1 | 0 | 1 |
    | 1 | 1 | ¬Q (toggle) |

T flip‑flop

  • Single input T (Toggle).
  • Qnext = ¬Q when T = 1; otherwise Qnext = Q.
  • Often built from a JK flip‑flop with J = K = T.

Registers and counters

  • Register – a group of D flip‑flops (one per bit) with a common clock; provides a fast on‑chip storage element.
  • Shift register – registers connected serially; used for serial‑to‑parallel conversion.
  • Binary counter – cascade of T or JK flip‑flops; each stage toggles on the carry from the previous stage.

Example: 4‑bit synchronous binary up‑counter

Q0 (LSB) – T‑FF with T = 1

Q1 – T‑FF with T = Q0

Q2 – T‑FF with T = Q0·Q1

Q3 – T‑FF with T = Q0·Q1·Q2

All flip‑flops share the same clock edge.

The counter advances by one on each clock pulse; the carry‑logic (AND gates) implements the “toggle when all lower bits are 1” rule.
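The toggle rule can be simulated in a few lines (a sketch; every T input is computed from the current state before all flip‑flops update on the same clock edge):

```python
def counter_step(q):
    """One clock edge of the 4-bit synchronous up-counter.
    q = [Q0, Q1, Q2, Q3]; Qn toggles when all lower bits are 1."""
    t = [1, q[0], q[0] & q[1], q[0] & q[1] & q[2]]   # T inputs, sampled together
    return [bit ^ toggle for bit, toggle in zip(q, t)]

state = [0, 0, 0, 0]
for tick in range(5):
    state = counter_step(state)
value = sum(bit << i for i, bit in enumerate(state))
print(value)  # 5 -- the counter has advanced by one per clock pulse
```

After 16 pulses the state returns to 0000, i.e. the counter wraps modulo 16.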

Registers in a pipelined RISC core

  • Each pipeline stage is separated by a set of pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB).
  • These registers hold the instruction, control signals and intermediate data, ensuring that each stage works on a stable snapshot of the instruction.
  • They are essentially small, fast registers built from D flip‑flops, clocked by the same global clock as the processor.

Key points for examination (15.2)

  1. Construct the truth table for a given Boolean expression (e.g., (A + B)·Ā).
  2. Show, step‑by‑step, how an expression is reduced using Boolean laws.
  3. Use a Karnaugh map to minimise a 3‑ or 4‑variable function and write the resulting SOP or POS.
  4. Draw a half‑adder and a full‑adder circuit using only basic gates.
  5. Explain how a D flip‑flop works, and how several D flip‑flops are combined to form a register used in a pipelined processor.