15.1 Processors, Parallel Processing and Virtual Machines

Learning objective

Show a thorough understanding of the characteristics of massively parallel computers (MPCs) and be able to relate this knowledge to the Cambridge International AS & A Level Computer Science (9618) syllabus (AO1–AO3).


1 Processor fundamentals (required background)

1.1 RISC vs CISC

  • RISC (Reduced Instruction Set Computer)
    • Small, highly optimised set of simple, fixed‑length instructions.
    • Most instructions complete in one clock cycle → easy to pipeline.
    • Typical examples: ARM Cortex‑A series, MIPS.
  • CISC (Complex Instruction Set Computer)
    • Large instruction set with many addressing modes.
    • Some instructions perform several low‑level operations (e.g., REP MOVS).
    • Typical examples: x86, VAX.
  • Modern CPUs blend both ideas; the distinction helps explain design trade‑offs that affect parallelism.

1.2 Pipelining

A pipeline splits the execution of an instruction into a sequence of stages. While one instruction is in stage 2, another can be in stage 1, allowing several instructions to be in‑flight simultaneously.

Stage | Typical operation
IF – Instruction fetch | Read the next opcode from memory
ID – Decode & register fetch | Identify the operation and source registers
EX – Execute / address calculation | ALU performs the operation
MEM – Memory access | Load/store data from/to memory
WB – Write‑back | Result written to the destination register

Benefits: higher instruction throughput, better utilisation of functional units.
Limitations: data, control and structural hazards – mitigated by forwarding, stalls, branch prediction.
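
The benefit of pipelining can be quantified: with k stages and n instructions, an ideal pipeline needs roughly k + n − 1 cycles instead of k × n. The short C sketch below is an illustrative calculation only (the stage count and instruction count are assumed values, and real pipelines lose some cycles to hazards).

    #include <stdio.h>

    int main(void) {
        int k = 5;      /* pipeline stages (IF, ID, EX, MEM, WB) */
        int n = 1000;   /* number of instructions to execute     */

        int unpipelined = k * n;       /* each instruction occupies all k stages in turn      */
        int pipelined   = k + n - 1;   /* ideal pipeline: one instruction completes per cycle
                                          once the pipeline is full                            */

        printf("Unpipelined cycles: %d\n", unpipelined);
        printf("Pipelined cycles:   %d\n", pipelined);
        printf("Speed-up: %.2f\n", (double)unpipelined / pipelined);
        return 0;
    }

With these values the ideal speed‑up approaches the number of stages (about 4.98 here); hazards and stalls reduce this in practice.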

1.3 Registers

  • Fastest storage, located inside the CPU.
  • General‑purpose (e.g., R0‑R31) and special‑purpose (PC, status register).
  • Number of registers influences instruction encoding and the frequency of memory accesses.

2 Flynn’s taxonomy – the four basic computer architectures

Architecture | Instruction stream | Data stream | Typical example
SISD (Single Instruction, Single Data) | One | One | Classic sequential von Neumann machine
SIMD (Single Instruction, Multiple Data) | One | Many (parallel data elements) | GPU cores, vector processors
MISD (Multiple Instruction, Single Data) | Many | One | Rare – used in some fault‑tolerant redundant systems
MIMD (Multiple Instruction, Multiple Data) | Many | Many | Multi‑core CPUs, clusters, massively parallel computers

3 Massively Parallel Computers (MPCs)

3.1 Definition

An MPC is a system that contains **hundreds to millions of processing elements (PEs)**, each capable of executing instructions **concurrently**. PEs are usually arranged in a regular interconnection network and each PE has its own local memory.

3.2 Key characteristics

  • Very high degree of concurrency – thousands to millions of PEs active at once.
  • Fine‑grained parallelism – problems are broken into many tiny, independent sub‑tasks.
  • Scalable interconnection network – mesh, torus, hyper‑cube, fat‑tree, or custom topologies that grow with the number of PEs.
  • Distributed (local) memory – each PE stores its own data, reducing contention for a single shared memory.
  • Low power per PE – simple cores minimise energy consumption, allowing dense packing.
  • Fault tolerance – redundancy enables continued operation when individual PEs fail.

3.3 Performance metrics (relevant to AO2 – analyse problems in computational terms)

Metric | Formula / definition | Interpretation
Speed‑up (S) | S = T1 / Tp | How many times faster the parallel version runs compared with a single‑processor baseline.
Efficiency (E) | E = S / p | Proportion of the theoretical maximum utilisation of the p processors.
Scalability | Maintaining reasonable efficiency as p grows | Indicates whether adding more PEs yields useful performance gains.
Throughput | Tasks completed per unit time (e.g., jobs/hour) | Important for data‑intensive services such as web‑service farms.

AO2 activity – speed‑up calculation

A climate‑model simulation takes 120 minutes on a single core. On a 1 024‑core MPC it finishes in 2 minutes.

  • Speed‑up: S = 120 / 2 = 60.
  • Efficiency: E = 60 / 1 024 ≈ 0.059 ≈ 5.9 %.

The low efficiency shows that the algorithm does not scale well – a point that must be discussed in AO2 analysis.
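
As a quick check on these figures, the C sketch below computes speed‑up and efficiency directly from the formulas in the table above; the run times and core count are the hypothetical values from the activity, not measured data.

    #include <stdio.h>

    int main(void) {
        double t1 = 120.0;   /* serial run time in minutes   */
        double tp = 2.0;     /* parallel run time in minutes */
        int    p  = 1024;    /* number of cores used         */

        double speedup    = t1 / tp;       /* S = T1 / Tp */
        double efficiency = speedup / p;   /* E = S / p   */

        printf("Speed-up:   %.1f\n", speedup);               /* 60.0        */
        printf("Efficiency: %.1f%%\n", efficiency * 100.0);  /* about 5.9%  */
        return 0;
    }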

3.4 Internal architectures used in MPCs

  1. SIMD – all PEs execute the same instruction on different data items. Typical in GPUs where thousands of “shader cores” perform the same arithmetic on separate pixels or matrix elements.
  2. MIMD – each PE runs its own instruction stream. Used in large clusters, many‑core CPUs (e.g., Intel Xeon Phi) and in specialised super‑computers.
  3. Hybrid SIMD/MIMD – a chip contains several SIMD “clusters” (streaming multiprocessors) that can be programmed independently, giving both data‑parallel and task‑parallel capabilities.

3.5 Programming models for MPCs

  • Message Passing Interface (MPI) – explicit send/receive between distributed PEs.
    // MPI “Hello World” (C)
    #include <mpi.h>
    #include <stdio.h>   /* needed for printf */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                  /* start the MPI runtime            */
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* rank = this process's ID, 0..p-1 */
        printf("Hello from process %d\n", rank);
        MPI_Finalize();                          /* shut down MPI cleanly            */
        return 0;
    }
    Note: parallel processing is assessed in the A Level theory content (Paper 3); writing MPI code is not required in the practical paper (Paper 4).
  • OpenMP – compiler directives for shared‑memory parallelism (e.g., #pragma omp parallel for); see the short example after this list.
  • CUDA / OpenCL – APIs for programming GPUs; expose thousands of SIMD cores.
  • MapReduce – functional model for processing very large data sets across many nodes (e.g., Hadoop).
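
As a minimal illustration of the OpenMP model referred to above (a sketch only – the array size and loop body are arbitrary choices, and the compile command assumes gcc), the snippet below parallelises a simple vector addition across the cores of a shared‑memory machine.

    /* Compile with: gcc -fopenmp vector_add.c -o vector_add */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];

        /* the directive splits the loop iterations among the available threads */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            a[i] = i;
            b[i] = 2.0 * i;
            c[i] = a[i] + b[i];
        }

        printf("c[N-1] = %f (ran on up to %d threads)\n",
               c[N - 1], omp_get_max_threads());
        return 0;
    }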

3.6 Advantages of MPCs

  • Enable solutions to problems that are infeasible on serial machines (climate modelling, protein folding, AI training).
  • High energy efficiency per operation because each core is simple and low‑power.
  • Built‑in fault tolerance through redundancy.
  • Scalable performance – adding more PEs can increase throughput almost linearly, provided the algorithm scales.

3.7 Challenges & limitations

  • Programming complexity – algorithms must expose sufficient parallelism and minimise synchronisation.
  • Communication latency – data movement across the interconnect can dominate execution time.
  • Load balancing – uneven work distribution leads to idle PEs.
  • Memory bandwidth – contention for caches or shared links can become a bottleneck.
  • Cost – large numbers of cores and sophisticated networks increase capital expense.

3.8 Real‑world examples of MPCs

System | PE count | Primary use | Internal architecture
IBM Blue Gene/Q | 1 048 576 cores | Scientific simulations (e.g., astrophysics) | MIMD, 5‑D torus interconnect
NVIDIA Tesla V100 GPU | 5 120 CUDA cores | Deep learning, high‑performance computing | SIMD within SMs, MIMD across SMs
Google TPU v4 pod | 4 096 specialised matrix cores per pod | TensorFlow workloads, AI training | Matrix‑multiply specialised units (SIMD‑like)

3.9 Suggested diagram

2‑D mesh interconnection network – each node represents a PE with local memory and routing links to its four neighbours.
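
To make the topology concrete, the sketch below (an illustrative helper only, not part of any real MPI or hardware API; the mesh dimensions are assumed) numbers the PEs of a W × H mesh row by row and lists the up‑to‑four neighbours of a given node.

    #include <stdio.h>

    #define W 4   /* mesh width  (columns) */
    #define H 4   /* mesh height (rows)    */

    /* Convert (x, y) coordinates to a PE number, counting row by row. */
    int pe_id(int x, int y) { return y * W + x; }

    /* Print the neighbours of the PE at (x, y); edge nodes have fewer than four. */
    void print_neighbours(int x, int y) {
        printf("PE %d neighbours:", pe_id(x, y));
        if (x > 0)     printf(" %d", pe_id(x - 1, y));   /* left  */
        if (x < W - 1) printf(" %d", pe_id(x + 1, y));   /* right */
        if (y > 0)     printf(" %d", pe_id(x, y - 1));   /* up    */
        if (y < H - 1) printf(" %d", pe_id(x, y + 1));   /* down  */
        printf("\n");
    }

    int main(void) {
        print_neighbours(1, 1);   /* interior node: four neighbours */
        print_neighbours(0, 0);   /* corner node:   two neighbours  */
        return 0;
    }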

3.10 Link to the Cambridge syllabus (AS & A‑Level)

  • AS 7 – Parallel processing: definition of MPCs, SIMD vs MIMD, interconnection networks, distributed memory.
  • AS 8 – Performance analysis: speed‑up, efficiency, Amdahl’s law (implicit in the efficiency discussion).
  • A‑Level 13 – Parallel algorithms: need to identify fine‑grained parallelism and discuss load‑balancing strategies.
  • A‑Level 14 – Programming for parallel systems: MPI, OpenMP, CUDA/OpenCL, MapReduce.
  • A‑Level 15 – Architecture of modern computers: relationship between RISC/CISC, pipelining and the design of low‑power PEs used in MPCs.

4 Virtual machines (context for “virtualisation” in the syllabus)

4.1 Definition

A virtual machine (VM) is a software abstraction that emulates a complete computer system – CPU, memory, storage and I/O – allowing several isolated “machines” to run on a single physical host.

4.2 Types of VMs

  • System VMs (hypervisors) – e.g., VMware ESXi, Microsoft Hyper‑V, KVM. Provide a full hardware abstraction; each VM runs its own guest OS.
  • Process VMs – e.g., Java Virtual Machine (JVM), .NET CLR. Execute a single program written in a high‑level language and give platform independence.
  • Container‑style VMs – e.g., Docker. Share the host OS kernel but isolate processes, filesystems and network stacks.

4.3 Benefits

  • Portability – “write once, run anywhere” (especially for the JVM).
  • Isolation – faults or security breaches in one VM do not affect others.
  • Improved resource utilisation – multiple VMs share a single physical server.

4.4 Limitations

  • Performance overhead (instruction translation, context switches).
  • Increased management complexity and licensing costs.
  • Potential security risk if the hypervisor is compromised.

4.5 Illustrative diagram – JVM runtime stack

Typical JVM stack: each method call creates a stack frame containing local variables, an operand stack and a reference to the constant pool.

5 Boolean algebra and logic circuits (foundation for hardware design)

5.1 Fundamental concepts

  • Boolean variables – 0 (false) or 1 (true).
  • Basic operators: AND (·), OR (+), NOT (‾).
  • Derived operators: NAND, NOR, XOR, XNOR.

5.2 Truth tables (example)

A | B | A AND B | A OR B | NOT A
0 | 0 | 0 | 0 | 1
0 | 1 | 0 | 1 | 1
1 | 0 | 0 | 1 | 0
1 | 1 | 1 | 1 | 0

5.3 Logic gates

Gate | Symbol | Boolean expression | Truth table (A B → Y)
AND | & | A·B | 00→0, 01→0, 10→0, 11→1
OR | ≥1 | A+B | 00→0, 01→1, 10→1, 11→1
NOT | – | ‾A | 0→1, 1→0
NAND | – | ‾(A·B) | 00→1, 01→1, 10→1, 11→0
NOR | – | ‾(A+B) | 00→1, 01→0, 10→0, 11→0
XOR | – | A⊕B | 00→0, 01→1, 10→1, 11→0
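
These gate behaviours map directly onto C's bitwise operators, as the short sketch below demonstrates (an illustrative program only; the "& 1" masks simply keep each result to a single bit).

    #include <stdio.h>

    int main(void) {
        printf("A B | AND OR NAND NOR XOR\n");
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                printf("%d %d |  %d   %d   %d    %d   %d\n",
                       a, b,
                       a & b,            /* AND  */
                       a | b,            /* OR   */
                       (~(a & b)) & 1,   /* NAND */
                       (~(a | b)) & 1,   /* NOR  */
                       a ^ b);           /* XOR  */
            }
        }
        return 0;
    }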

6 Summary

  • Massively parallel computers consist of huge numbers of simple, low‑power PEs linked by a scalable network and equipped with local memory.
  • Key performance concepts – speed‑up, efficiency and scalability – allow students to analyse whether an algorithm is suitable for an MPC (AO2).
  • Understanding SIMD, MIMD and hybrid designs, together with the main programming models (MPI, OpenMP, CUDA/OpenCL, MapReduce), satisfies the AO1 requirement for “characteristics of massively parallel computers”.
  • Virtual machines and Boolean logic provide essential background for the broader syllabus topics on hardware architecture and software abstraction.
