15.1 Processors, Parallel Processing and Virtual Machines

Learning objective

Show a thorough understanding of the characteristics of massively parallel computers (MPCs) and be able to relate this knowledge to the Cambridge International AS & A Level Computer Science (9618) syllabus (AO1–AO3).


1 Processor fundamentals (required background)

1.1 RISC vs CISC

  • RISC (Reduced Instruction Set Computer)
    • Small, highly optimised set of simple, fixed‑length instructions.
    • Most instructions complete in one clock cycle → easy to pipeline.
    • Typical examples: ARM Cortex‑A series, MIPS.
  • CISC (Complex Instruction Set Computer)
    • Large instruction set with many addressing modes.
    • Some instructions perform several low‑level operations (e.g., REP MOVS).
    • Typical examples: x86, VAX.
  • Modern CPUs blend both ideas; the distinction helps explain design trade‑offs that affect parallelism.

1.2 Pipelining

A pipeline splits the execution of an instruction into a sequence of stages. While one instruction is in stage 2, another can be in stage 1, allowing several instructions to be in‑flight simultaneously.

Stage | Typical operation
IF – Instruction fetch | Read the next opcode from memory
ID – Decode & register fetch | Identify the operation and source registers
EX – Execute / address calculation | ALU performs the operation
MEM – Memory access | Load/store data from/to memory
WB – Write‑back | Result written to the destination register

Benefits: higher instruction throughput, better utilisation of functional units.
Limitations: data, control and structural hazards – mitigated by forwarding, stalls, branch prediction.
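
The benefit of pipelining can be quantified: with k stages and n instructions, an ideal pipeline needs roughly k + n − 1 cycles instead of k × n. The short C sketch below is an illustrative calculation only (the stage count and instruction count are assumed values, and real pipelines lose some cycles to hazards).

    #include <stdio.h>

    int main(void) {
        int k = 5;      /* pipeline stages (IF, ID, EX, MEM, WB) */
        int n = 1000;   /* number of instructions to execute     */

        int unpipelined = k * n;       /* each instruction occupies all k stages in turn      */
        int pipelined   = k + n - 1;   /* ideal pipeline: one instruction completes per cycle
                                          once the pipeline is full                            */

        printf("Unpipelined cycles: %d\n", unpipelined);
        printf("Pipelined cycles:   %d\n", pipelined);
        printf("Speed-up: %.2f\n", (double)unpipelined / pipelined);
        return 0;
    }

With these values the ideal speed‑up approaches the number of stages (about 4.98 here); hazards and stalls reduce this in practice.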

1.3 Registers

  • Fastest storage, located inside the CPU.
  • General‑purpose (e.g., R0‑R31) and special‑purpose (PC, status register).
  • Number of registers influences instruction encoding and the frequency of memory accesses.

2 Flynn’s taxonomy – the four basic computer architectures

Architecture | Instruction stream | Data stream | Typical example
SISD (Single Instruction, Single Data) | One | One | Classic sequential von Neumann machine
SIMD (Single Instruction, Multiple Data) | One | Many (parallel data elements) | GPU cores, vector processors
MISD (Multiple Instruction, Single Data) | Many | One | Rare – used in some fault‑tolerant redundant systems
MIMD (Multiple Instruction, Multiple Data) | Many | Many | Multi‑core CPUs, clusters, massively parallel computers

3 Massively Parallel Computers (MPCs)

3.1 Definition

An MPC is a system that contains **hundreds to millions of processing elements (PEs)**, each capable of executing instructions **concurrently**. PEs are usually arranged in a regular interconnection network and each PE has its own local memory.

3.2 Key characteristics

  • Very high degree of concurrency – thousands to millions of PEs active at once.
  • Fine‑grained parallelism – problems are broken into many tiny, independent sub‑tasks.
  • Scalable interconnection network – mesh, torus, hyper‑cube, fat‑tree, or custom topologies that grow with the number of PEs.
  • Distributed (local) memory – each PE stores its own data, reducing contention for a single shared memory.
  • Low power per PE – simple cores minimise energy consumption, allowing dense packing.
  • Fault tolerance – redundancy enables continued operation when individual PEs fail.

3.3 Performance metrics (relevant to AO2 – analyse problems in computational terms)

Metric | Formula / definition | Interpretation
Speed‑up (S) | S = T1 / Tp | How many times faster the parallel version runs compared with a single‑processor baseline.
Efficiency (E) | E = S / p | Proportion of the theoretical maximum utilisation of the p processors.
Scalability | Maintaining reasonable efficiency as p grows | Indicates whether adding more PEs yields useful performance gains.
Throughput | Tasks completed per unit time (e.g., jobs/hour) | Important for data‑intensive services such as web‑service farms.

AO2 activity – speed‑up calculation

A climate‑model simulation takes 120 minutes on a single core. On a 1 024‑core MPC it finishes in 2 minutes.

  • Speed‑up: S = 120 / 2 = 60.
  • Efficiency: E = 60 / 1 024 ≈ 0.059 ≈ 5.9 %.

The low efficiency shows that the algorithm does not scale well – a point that must be discussed in AO2 analysis.
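
As a quick check on these figures, the C sketch below computes speed‑up and efficiency directly from the formulas in the table above; the run times and core count are the hypothetical values from the activity, not measured data.

    #include <stdio.h>

    int main(void) {
        double t1 = 120.0;   /* serial run time in minutes   */
        double tp = 2.0;     /* parallel run time in minutes */
        int    p  = 1024;    /* number of cores used         */

        double speedup    = t1 / tp;       /* S = T1 / Tp */
        double efficiency = speedup / p;   /* E = S / p   */

        printf("Speed-up:   %.1f\n", speedup);               /* 60.0        */
        printf("Efficiency: %.1f%%\n", efficiency * 100.0);  /* about 5.9%  */
        return 0;
    }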

3.4 Internal architectures used in MPCs

  1. SIMD – all PEs execute the same instruction on different data items. Typical in GPUs where thousands of “shader cores” perform the same arithmetic on separate pixels or matrix elements.
  2. MIMD – each PE runs its own instruction stream. Used in large clusters, many‑core CPUs (e.g., Intel Xeon Phi) and in specialised super‑computers.
  3. Hybrid SIMD/MIMD – a chip contains several SIMD “clusters” (streaming multiprocessors) that can be programmed independently, giving both data‑parallel and task‑parallel capabilities.

3.5 Programming models for MPCs

  • Message Passing Interface (MPI) – explicit send/receive between distributed PEs.
    // MPI “Hello World” (C)
    #include <mpi.h>
    #include <stdio.h>   /* needed for printf */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                  /* start the MPI runtime            */
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* rank = this process's ID, 0..p-1 */
        printf("Hello from process %d\n", rank);
        MPI_Finalize();                          /* shut down MPI cleanly            */
        return 0;
    }
    Note: parallel processing is assessed in the A Level theory content (Paper 3); writing MPI code is not required in the practical paper (Paper 4).
  • OpenMP – compiler directives for shared‑memory parallelism (e.g., #pragma omp parallel for); see the short example after this list.
  • CUDA / OpenCL – APIs for programming GPUs; expose thousands of SIMD cores.
  • MapReduce – functional model for processing very large data sets across many nodes (e.g., Hadoop).
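
As a minimal illustration of the OpenMP model referred to above (a sketch only – the array size and loop body are arbitrary choices, and the compile command assumes gcc), the snippet below parallelises a simple vector addition across the cores of a shared‑memory machine.

    /* Compile with: gcc -fopenmp vector_add.c -o vector_add */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];

        /* the directive splits the loop iterations among the available threads */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            a[i] = i;
            b[i] = 2.0 * i;
            c[i] = a[i] + b[i];
        }

        printf("c[N-1] = %f (ran on up to %d threads)\n",
               c[N - 1], omp_get_max_threads());
        return 0;
    }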

3.6 Advantages of MPCs

  • Enable solutions to problems that are infeasible on serial machines (climate modelling, protein folding, AI training).
  • High energy efficiency per operation because each core is simple and low‑power.
  • Built‑in fault tolerance through redundancy.
  • Scalable performance – adding more PEs can increase throughput almost linearly, provided the algorithm scales.

3.7 Challenges & limitations

  • Programming complexity – algorithms must expose sufficient parallelism and minimise synchronisation.
  • Communication latency – data movement across the interconnect can dominate execution time.
  • Load balancing – uneven work distribution leads to idle PEs.
  • Memory bandwidth – contention for caches or shared links can become a bottleneck.
  • Cost – large numbers of cores and sophisticated networks increase capital expense.

3.8 Real‑world examples of MPCs

System | PE count | Primary use | Internal architecture
IBM Blue Gene/Q | 1 048 576 cores | Scientific simulations (e.g., astrophysics) | MIMD, 5‑D torus interconnect
NVIDIA Tesla V100 GPU | 5 120 CUDA cores | Deep learning, high‑performance computing | SIMD within SMs, MIMD across SMs
Google TPU v4 pod | 4 096 specialised matrix cores per pod | TensorFlow workloads, AI training | Matrix‑multiply specialised units (SIMD‑like)

3.9 Suggested diagram

2‑D mesh interconnection network – each node represents a PE with local memory and routing links to its four neighbours.
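
To make the topology concrete, the sketch below (an illustrative helper only, not part of any real MPI or hardware API; the mesh dimensions are assumed) numbers the PEs of a W × H mesh row by row and lists the up‑to‑four neighbours of a given node.

    #include <stdio.h>

    #define W 4   /* mesh width  (columns) */
    #define H 4   /* mesh height (rows)    */

    /* Convert (x, y) coordinates to a PE number, counting row by row. */
    int pe_id(int x, int y) { return y * W + x; }

    /* Print the neighbours of the PE at (x, y); edge nodes have fewer than four. */
    void print_neighbours(int x, int y) {
        printf("PE %d neighbours:", pe_id(x, y));
        if (x > 0)     printf(" %d", pe_id(x - 1, y));   /* left  */
        if (x < W - 1) printf(" %d", pe_id(x + 1, y));   /* right */
        if (y > 0)     printf(" %d", pe_id(x, y - 1));   /* up    */
        if (y < H - 1) printf(" %d", pe_id(x, y + 1));   /* down  */
        printf("\n");
    }

    int main(void) {
        print_neighbours(1, 1);   /* interior node: four neighbours */
        print_neighbours(0, 0);   /* corner node:   two neighbours  */
        return 0;
    }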

3.10 Link to the Cambridge syllabus (AS & A‑Level)

  • AS 7 – Parallel processing: definition of MPCs, SIMD vs MIMD, interconnection networks, distributed memory.
  • AS 8 – Performance analysis: speed‑up, efficiency, Amdahl’s law (implicit in the efficiency discussion).
  • A‑Level 13 – Parallel algorithms: need to identify fine‑grained parallelism and discuss load‑balancing strategies.
  • A‑Level 14 – Programming for parallel systems: MPI, OpenMP, CUDA/OpenCL, MapReduce.
  • A‑Level 15 – Architecture of modern computers: relationship between RISC/CISC, pipelining and the design of low‑power PEs used in MPCs.

4 Virtual machines (context for “virtualisation” in the syllabus)

4.1 Definition

A virtual machine (VM) is a software abstraction that emulates a complete computer system – CPU, memory, storage and I/O – allowing several isolated “machines” to run on a single physical host.

4.2 Types of VMs

  • System VMs (hypervisors) – e.g., VMware ESXi, Microsoft Hyper‑V, KVM. Provide a full hardware abstraction; each VM runs its own guest OS.
  • Process VMs – e.g., Java Virtual Machine (JVM), .NET CLR. Execute a single program written in a high‑level language and give platform independence.
  • Container‑style VMs – e.g., Docker. Share the host OS kernel but isolate processes, filesystems and network stacks.

4.3 Benefits

  • Portability – “write once, run anywhere” (especially for the JVM).
  • Isolation – faults or security breaches in one VM do not affect others.
  • Improved resource utilisation – multiple VMs share a single physical server.

4.4 Limitations

  • Performance overhead (instruction translation, context switches).
  • Increased management complexity and licensing costs.
  • Potential security risk if the hypervisor is compromised.

4.5 Illustrative diagram – JVM runtime stack

Typical JVM stack: each method call creates a stack frame containing local variables, an operand stack and a reference to the constant pool.

5 Boolean algebra and logic circuits (foundation for hardware design)

5.1 Fundamental concepts

  • Boolean variables – 0 (false) or 1 (true).
  • Basic operators: AND (·), OR (+), NOT (‾).
  • Derived operators: NAND, NOR, XOR, XNOR.

5.2 Truth tables (example)

A | B | A AND B | A OR B | NOT A
0 | 0 | 0 | 0 | 1
0 | 1 | 0 | 1 | 1
1 | 0 | 0 | 1 | 0
1 | 1 | 1 | 1 | 0

5.3 Logic gates

Gate | Symbol | Boolean expression | Truth table (A B → Y)
AND | & | A·B | 00→0, 01→0, 10→0, 11→1
OR | ≥1 | A+B | 00→0, 01→1, 10→1, 11→1
NOT | – | ‾A | 0→1, 1→0
NAND | – | ‾(A·B) | 00→1, 01→1, 10→1, 11→0
NOR | – | ‾(A+B) | 00→1, 01→0, 10→0, 11→0
XOR | – | A⊕B | 00→0, 01→1, 10→1, 11→0
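
These gate behaviours map directly onto C's bitwise operators, as the short sketch below demonstrates (an illustrative program only; the "& 1" masks simply keep each result to a single bit).

    #include <stdio.h>

    int main(void) {
        printf("A B | AND OR NAND NOR XOR\n");
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                printf("%d %d |  %d   %d   %d    %d   %d\n",
                       a, b,
                       a & b,            /* AND  */
                       a | b,            /* OR   */
                       (~(a & b)) & 1,   /* NAND */
                       (~(a | b)) & 1,   /* NOR  */
                       a ^ b);           /* XOR  */
            }
        }
        return 0;
    }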

6 Summary

  • Massively parallel computers consist of huge numbers of simple, low‑power PEs linked by a scalable network and equipped with local memory.
  • Key performance concepts – speed‑up, efficiency and scalability – allow students to analyse whether an algorithm is suitable for an MPC (AO2).
  • Understanding SIMD, MIMD and hybrid designs, together with the main programming models (MPI, OpenMP, CUDA/OpenCL, MapReduce), satisfies the AO1 requirement for “characteristics of massively parallel computers”.
  • Virtual machines and Boolean logic provide essential background for the broader syllabus topics on hardware architecture and software abstraction.
