15.1 Processors, Parallel Processing and Virtual Machines
Learning objective
Show a thorough understanding of the characteristics of massively parallel computers (MPCs) and be able to relate this knowledge to the Cambridge International AS & A Level Computer Science (9618) syllabus (AO1–AO3).
1 Processor fundamentals (required background)
1.1 RISC vs CISC
RISC (Reduced Instruction Set Computer)
Small, highly‑optimised instruction set – far fewer and simpler instructions than a CISC design, usually with fixed‑length encoding.
Most instructions complete in one clock cycle → easy to pipeline.
Typical examples: ARM Cortex‑A series, MIPS.
CISC (Complex Instruction Set Computer)
Large instruction set with many addressing modes.
Some instructions perform several low‑level operations (e.g., REP MOVS).
Typical examples: x86, VAX.
Modern CPUs blend both ideas; the distinction helps explain design trade‑offs that affect parallelism.
1.2 Pipelining
A pipeline splits the execution of an instruction into a sequence of stages. While one instruction is in stage 2, another can be in stage 1, allowing several instructions to be in‑flight simultaneously.
Stage | Typical operation
IF – Instruction fetch | Read the next opcode from memory
ID – Decode & register fetch | Identify the operation and the source registers
EX – Execute / address calculation | ALU performs the operation
MEM – Memory access | Load/store data from/to memory
WB – Write‑back | Result written to the destination register
Benefits: higher instruction throughput, better utilisation of functional units.
Limitations: data, control and structural hazards – mitigated by forwarding, stalls, branch prediction.
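Illustrative timing (assuming one stage per clock cycle and no hazards): a k‑stage pipeline needs k cycles for the first instruction to complete, after which one instruction finishes every cycle, so n instructions take roughly k + (n − 1) cycles instead of k × n cycles without pipelining. For the 5‑stage pipeline above, 100 instructions take about 5 + 99 = 104 cycles rather than 500 – a speed‑up approaching 5×.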
1.3 Registers
Fastest storage, located inside the CPU.
General‑purpose (e.g., R0‑R31) and special‑purpose (PC, status register).
Number of registers influences instruction encoding and the frequency of memory accesses.
2 Flynn’s taxonomy – the four basic computer architectures
Architecture | Instruction stream | Data stream | Typical example
SISD (Single Instruction, Single Data) | One | One | Classic sequential von Neumann machine
SIMD (Single Instruction, Multiple Data) | One | Many (parallel data elements) | GPU cores, vector processors
MISD (Multiple Instruction, Single Data) | Many | One | Rare – used in some fault‑tolerant redundant systems
MIMD (Multiple Instruction, Multiple Data) | Many | Many | Multi‑core CPUs, clusters and most massively parallel computers
3 Massively parallel computers (MPCs)
3.1 Definition
An MPC is a system that contains hundreds to millions of processing elements (PEs), each capable of executing instructions concurrently. The PEs are usually arranged in a regular interconnection network and each PE has its own local memory.
3.2 Key characteristics
Very high degree of concurrency – thousands to millions of PEs active at once.
Fine‑grained parallelism – problems are broken into many tiny, independent sub‑tasks.
Scalable interconnection network – mesh, torus, hyper‑cube, fat‑tree, or custom topologies that grow with the number of PEs.
Distributed (local) memory – each PE stores its own data, reducing contention for a single shared memory.
Low power per PE – simple cores minimise energy consumption, allowing dense packing.
Fault tolerance – redundancy enables continued operation when individual PEs fail.
3.3 Performance metrics (relevant to AO2 – analyse problems in computational terms)
Metric | Formula / definition | Interpretation
Speed‑up (S) | S = T1 / Tp, where T1 is the run time on one processor and Tp the run time on p processors | How many times faster the parallel version runs compared with a single‑processor baseline
Efficiency (E) | E = S / p | Proportion of the theoretical maximum utilisation of the p processors
Scalability | Maintaining reasonable efficiency as p grows | Indicates whether adding more PEs yields useful performance gains
Throughput | Tasks completed per unit time (e.g., jobs/hour) | Important for data‑intensive services such as web‑service farms
AO2 activity – speed‑up calculation
A climate‑model simulation takes 120 minutes on a single core. On a 1 024‑core MPC it finishes in 2 minutes.
Speed‑up: S = 120 / 2 = 60.
Efficiency: E = 60 / 1 024 ≈ 0.059 ≈ 5.9 %.
The low efficiency shows that the algorithm does not scale well – a point that must be discussed in AO2 analysis.
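A minimal C sketch of the same calculation (values taken from the activity above; purely illustrative):
// speedup.c – computes speed-up and efficiency for the climate-model example
#include <stdio.h>

int main(void) {
    double t1 = 120.0;   /* run time on one core, in minutes   */
    double tp = 2.0;     /* run time on p cores, in minutes    */
    int    p  = 1024;    /* number of processing elements used */

    double speedup    = t1 / tp;       /* S = T1 / Tp */
    double efficiency = speedup / p;   /* E = S / p   */

    printf("Speed-up   S = %.1f\n", speedup);
    printf("Efficiency E = %.3f (%.1f%%)\n", efficiency, efficiency * 100.0);
    return 0;
}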
3.4 Internal architectures used in MPCs
SIMD – all PEs execute the same instruction on different data items. Typical in GPUs where thousands of “shader cores” perform the same arithmetic on separate pixels or matrix elements (see the sketch after this list).
MIMD – each PE runs its own instruction stream. Used in large clusters, many‑core CPUs (e.g., Intel Xeon Phi) and in specialised super‑computers.
Hybrid SIMD/MIMD – a chip contains several SIMD “clusters” (streaming multiprocessors) that can be programmed independently, giving both data‑parallel and task‑parallel capabilities.
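A minimal data‑parallel sketch in C of the SIMD idea (conceptual only – a real SIMD machine or GPU would map each loop iteration to a separate PE or vector lane; the function and array names are illustrative):
// Vector addition: the same instruction (add) is applied to many data items.
// On a SIMD architecture each element would be processed concurrently.
#include <stddef.h>

void vector_add(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];   /* identical operation, different data element */
    }
}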
3.5 Programming models for MPCs
Message Passing Interface (MPI) – explicit send/receive between distributed PEs.
// MPI “Hello World” (C)
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello from process %d\n", rank);
    MPI_Finalize();
    return 0;
}
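The summary at the end of these notes also lists OpenMP, CUDA/OpenCL and MapReduce as common programming models. As a minimal shared‑memory illustration, an OpenMP sketch in C (assuming an OpenMP‑capable compiler, e.g. gcc -fopenmp; the loop bound and variable names are illustrative only):
// OpenMP parallel loop (C) – the iterations are shared among several threads
#include <omp.h>
#include <stdio.h>

int main(void) {
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 1000000; i++) {
        sum += 1.0 / i;   /* each thread accumulates a partial sum; OpenMP combines them */
    }

    printf("Harmonic sum = %f\n", sum);
    return 0;
}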
Note: parallel processing concepts such as MPI belong to the A Level theory paper (Paper 3); they are not assessed in the practical paper (Paper 4).
Syllabus link (A Level topic 15): the relationship between RISC/CISC, pipelining and the design of low‑power PEs used in MPCs.
4 Virtual machines (context for “virtualisation” in the syllabus)
4.1 Definition
A virtual machine (VM) is a software abstraction that emulates a complete computer system – CPU, memory, storage and I/O – allowing several isolated “machines” to run on a single physical host.
4.2 Types of VMs
System VMs (hypervisors) – e.g., VMware ESXi, Microsoft Hyper‑V, KVM. Provide a full hardware abstraction; each VM runs its own guest OS.
Process VMs – e.g., Java Virtual Machine (JVM), .NET CLR. Execute a single program written in a high‑level language and give platform independence.
Container‑style VMs – e.g., Docker. Share the host OS kernel but isolate processes, filesystems and network stacks.
4.3 Benefits
Portability – “write once, run anywhere” (especially for the JVM).
Isolation – faults or security breaches in one VM do not affect others.
Improved resource utilisation – multiple VMs share a single physical server.
4.4 Drawbacks
Increased management complexity and licensing costs.
Potential security risk if the hypervisor is compromised.
4.5 The JVM runtime stack
Typical JVM stack: each method call creates a stack frame containing the method’s local variables, an operand stack and a reference to the run‑time constant pool; the frame is popped when the method returns.
5 Boolean algebra and logic circuits (foundation for hardware design)
5.1 Fundamental concepts
Boolean variables – 0 (false) or 1 (true).
Basic operators: AND (·), OR (+), NOT (‾).
Derived operators: NAND, NOR, XOR, XNOR.
5.2 Truth tables (example)
A | B | A AND B | A OR B | NOT A
0 | 0 | 0 | 0 | 1
0 | 1 | 0 | 1 | 1
1 | 0 | 0 | 1 | 0
1 | 1 | 1 | 1 | 0
5.3 Logic gates
Gate | IEC symbol label | Boolean expression | Truth table (A B → Y)
AND | & | A·B | 00→0, 01→0, 10→0, 11→1
OR | ≥1 | A+B | 00→0, 01→1, 10→1, 11→1
NOT | 1 (with inversion bubble) | ‾A | 0→1, 1→0
NAND | & (with inversion bubble) | ‾(A·B) | 00→1, 01→1, 10→1, 11→0
NOR | ≥1 (with inversion bubble) | ‾(A+B) | 00→1, 01→0, 10→0, 11→0
XOR | =1 | A⊕B | 00→0, 01→1, 10→1, 11→0
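A minimal C sketch that generates the two‑input truth tables above using C’s bitwise and logical operators (illustrative only):
// Prints the truth tables for AND, OR, NAND, NOR and XOR for inputs A and B.
#include <stdio.h>

int main(void) {
    printf("A B | AND OR NAND NOR XOR\n");
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            printf("%d %d |  %d   %d   %d    %d   %d\n",
                   a, b,
                   a & b,        /* AND  */
                   a | b,        /* OR   */
                   !(a & b),     /* NAND = NOT(A AND B) */
                   !(a | b),     /* NOR  = NOT(A OR B)  */
                   a ^ b);       /* XOR  */
        }
    }
    return 0;
}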
6 Summary
Massively parallel computers consist of huge numbers of simple, low‑power PEs linked by a scalable network and equipped with local memory.
Key performance concepts – speed‑up, efficiency and scalability – allow students to analyse whether an algorithm is suitable for an MPC (AO2).
Understanding SIMD, MIMD and hybrid designs, together with the main programming models (MPI, OpenMP, CUDA/OpenCL, MapReduce), satisfies the AO1 requirement for “characteristics of massively parallel computers”.
Virtual machines and Boolean logic provide essential background for the broader syllabus topics on hardware architecture and software abstraction.