Understand cores, cache and clock in a CPU

Computer Architecture – Cores, Cache and Clock

Learning Objectives (AO1 & AO2)

By the end of this lesson you will be able to:

  • Define a CPU core and explain why modern processors contain several cores.
  • Describe the three levels of CPU cache (L1, L2, L3) and their place in the memory hierarchy.
  • State how clock speed is measured, what the unit means and how it influences execution time.
  • Use the performance relationship Performance ∝ Clock Rate / CPI in simple calculations.
  • Apply Amdahl’s law to estimate realistic speed‑up when a program is run on multiple cores.
  • Identify common misconceptions about cores, clock speed and cache.
  • Link these hardware concepts to other mandatory IGCSE topics (data representation, transmission, software, networks, databases and security).

1. Quick Syllabus Map (Cambridge IGCSE 0478)

| Topic | Key sub-topics | Relevance to this lesson |
|---|---|---|
| 1.1 Data Representation | Binary, hexadecimal, two's-complement, overflow, logical shifts | Registers and cache store binary data; arithmetic inside the ALU uses two's-complement and shift operations. |
| 1.2 Data Transmission | Packets, packet-switching, USB, error-detection (parity, checksum), basic encryption | The CPU communicates with RAM, I/O devices and the network via buses and packet-based protocols. |
| 2 Computer Systems – Hardware | CPU, registers, ALU, memory hierarchy, I/O, power/thermal design | Core of today's lesson: cores, cache and clock, plus brief links to RAM, secondary storage and power. |
| 3 Computer Systems – Software | Operating system, instruction set, micro-code, compilation | The instruction set determines which operations each core can execute; the OS schedules threads on cores. |
| 4 Networks & the Internet | Network hardware, protocols, packet switching | Shows how the CPU's output eventually becomes network packets. |
| 5 Data Management | Databases, file handling, data retrieval | Data read from storage passes through RAM and cache before reaching the CPU. |
| 6 Cyber-security & Emerging Tech | Encryption, AI, quantum computing, low-power cores | Modern CPUs include hardware-accelerated encryption and power-saving features. |

2. Fundamental Data Representation

  • Binary (base‑2) – the language of all digital circuits. Each bit is a 0 or 1.
  • Hexadecimal (base‑16) – convenient shorthand for binary (4 bits = 1 hex digit).
  • Two’s‑complement – method for representing signed integers; allows a single addition circuit to perform subtraction.
  • Overflow – occurs when the result of an arithmetic operation exceeds the range that can be stored in the given number of bits; the CPU sets an overflow flag.
  • Logical shifts – a logical left-shift multiplies by 2; a logical right-shift divides by 2 (the sign bit is not preserved). Frequently used in low-level optimisation and in cache-address calculations.
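
These ideas can be demonstrated directly in Python. A minimal sketch assuming an 8-bit word; the helper names and the overflow test are illustrative choices, not a model of any specific CPU:

```python
# Illustrative sketch: 8-bit two's-complement, overflow and logical shifts.

BITS = 8
MASK = (1 << BITS) - 1          # 0b11111111 for 8 bits

def to_twos_complement(value):
    """Encode a signed integer as an 8-bit two's-complement pattern."""
    return value & MASK

def from_twos_complement(pattern):
    """Decode an 8-bit pattern back to a signed integer."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern

def add_with_flag(a, b):
    """Add two signed 8-bit values; return (result, overflow_flag).
    Subtraction uses the same adder: a - b = a + (~b + 1)."""
    raw = to_twos_complement(a) + to_twos_complement(b)
    result = from_twos_complement(raw & MASK)
    # Overflow: both operands share a sign, but the result's sign differs.
    overflow = (a >= 0) == (b >= 0) and (result >= 0) != (a >= 0)
    return result, overflow

print(add_with_flag(100, 30))    # (-126, True): 130 does not fit in 8 bits
print(add_with_flag(5, -3))      # (2, False)
print(to_twos_complement(-1))    # 255, i.e. 0b11111111
print(6 << 1, 6 >> 1)            # logical shifts: 12 (x2) and 3 (/2)
```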

3. Data Transmission Essentials

  • Packet – a self‑contained unit of data with a header (addressing, length, error‑check) and a payload (the actual data).
  • Packet‑switching vs circuit‑switching – modern networks use packet‑switching; each packet may take a different route.
  • USB (Universal Serial Bus) – a common I/O standard that transfers data in packets over a serial bus.
  • Error‑detection – parity bits, checksums and CRCs allow the receiver to detect corrupted packets.
  • Encryption (basic) – symmetric (same key for encrypt/decrypt) and asymmetric (public/private key) concepts; many CPUs include hardware AES instructions to speed up encryption.
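
The parity and checksum ideas above can be sketched in a few lines; the function names and the 7-bit packet are invented for this example:

```python
# Minimal sketch of error detection: even parity and a simple checksum.

def even_parity_bit(data_bits):
    """Return the parity bit that makes the total count of 1s even."""
    return sum(data_bits) % 2

def checksum(byte_values):
    """Simple 8-bit checksum: sum of the bytes modulo 256."""
    return sum(byte_values) % 256

packet = [1, 0, 1, 1, 0, 1, 0]     # 7 data bits, four 1s
p = even_parity_bit(packet)        # 0: the count of 1s is already even
print(packet + [p])

# The receiver recomputes the parity; a single flipped bit is detected.
received = packet + [p]
received[2] ^= 1                   # simulate corruption in transit
print(sum(received) % 2)           # 1 -> parity check fails
```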

4. CPU Cores – Parallel Execution Units

A core is an independent processing unit capable of fetching, decoding and executing its own instruction stream.

  • Registers & execution units – each core has its own set of registers, an arithmetic‑logic unit (ALU) and usually a private L1 cache.
  • Throughput gain – multiple cores can run several threads simultaneously, improving multitasking and multi‑threaded applications.
  • Non‑linear scaling – performance does not simply double when the core count doubles; the software must be written to exploit the extra cores (see Amdahl’s law).
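
The throughput gain (and its limits) can be demonstrated by splitting a CPU-bound job across worker processes, one per core. A sketch using Python's standard library; the prime-counting workload and the chunk boundaries are arbitrary choices for illustration:

```python
# Illustrative sketch: dividing CPU-bound work across multiple cores.
from concurrent.futures import ProcessPoolExecutor

def count_primes(bounds):
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    def is_prime(n):
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(is_prime(n) for n in range(lo, hi))

if __name__ == "__main__":
    # Four equal chunks, one per worker (ideally one per core).
    chunks = [(1, 2500), (2500, 5000), (5000, 7500), (7500, 10000)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(total)    # 1229 primes below 10000
```

In practice the speed-up is less than 4x: starting workers, moving data between them and combining the results all add overhead, which is why scaling is non-linear.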

5. CPU Cache – The Memory Hierarchy

Cache is a small, very fast memory placed close to the core. It stores copies of data and instructions that are likely to be needed again, reducing the time spent waiting for main memory (RAM).

  • L1 cache – smallest and fastest; often split into an instruction cache (I‑cache) and a data cache (D‑cache).
  • L2 cache – larger and slightly slower; can be private to a core or shared by a pair of cores.
  • L3 cache – biggest on‑die cache; typically shared by all cores on the die.
  • Inclusiveness – many CPUs keep a copy of every L1 line in the lower levels too (an inclusive cache), which simplifies coherence; some designs use exclusive or non-inclusive hierarchies instead.
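
One way to quantify why this hierarchy helps is the average memory access time (AMAT) model. This model goes slightly beyond the syllabus, and the latencies and miss rates below are assumed round numbers, not measurements from a real CPU:

```python
# Average memory access time (AMAT) for a two-level hierarchy.
# All latencies are in nanoseconds; the figures are illustrative only.

def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Cache hit in 1 ns; 5% of accesses miss and pay a 100 ns trip to RAM.
print(amat(1.0, 0.05, 100.0))    # 6.0

# Halving the miss rate (e.g. with a larger cache) cuts the average sharply.
print(amat(1.0, 0.025, 100.0))   # 3.5
```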

6. Clock Speed (Frequency)

The internal clock generates a regular series of pulses. Each pulse allows the CPU to perform a basic operation (fetch, decode, execute, write‑back).

  • Measured in hertz (Hz); modern CPUs are expressed in gigahertz (GHz) where 1 GHz = 10⁹ cycles s⁻¹.
  • Higher clock speed → more cycles per second → potentially more instructions per second, **provided** the CPI (cycles per instruction) does not increase.
  • Clock speed is limited by power consumption and heat dissipation; designers balance frequency with core count, cache size and voltage.

7. Relating Cores, Cache, Clock and Performance

A simplified performance model used in the syllabus is:

Execution Time = (Instructions × CPI) / Clock Rate

so

Performance = Instructions / Execution Time = Clock Rate / CPI (instructions per second)

Re‑arranged as a proportionality:

Performance ∝ Clock Rate / CPI

7.1 What Influences CPI?

  • Cache efficiency – a cache hit keeps CPI low; a miss forces a RAM access, dramatically raising CPI.
  • Pipeline depth – deeper pipelines increase instruction throughput but may raise CPI when stalls occur.
  • Branch prediction & speculative execution – accurate prediction reduces pipeline flushes, keeping CPI down.
  • Instruction‑set complexity – CISC instructions can require more cycles than simple RISC operations.
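
Plugging numbers into the model shows how these CPI factors can outweigh raw clock speed. The two CPUs below are hypothetical:

```python
# Two hypothetical CPUs under the model Performance ∝ Clock Rate / CPI.

def performance(clock_hz, cpi):
    """Instructions per second under the simplified model."""
    return clock_hz / cpi

cpu_a = performance(4.0e9, 2.0)   # 4 GHz, but 2 cycles per instruction
cpu_b = performance(3.0e9, 1.2)   # 3 GHz, with a more efficient pipeline

print(cpu_a < cpu_b)              # True: the slower clock wins on CPI
```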

7.2 Parallelism and Amdahl’s Law

Even with many cores, the overall speed‑up is limited by the portion of a program that must run serially.

Speed‑up = 1 / [(1 − P) + P / N]

  • P = fraction of the program that can be parallelised.
  • N = number of cores.

Example: 80 % of a task can be parallelised (P = 0.8) and 4 cores are used.
Speed‑up = 1 / [0.2 + (0.8 / 4)] = 1 / 0.4 = 2.5, not 4.
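
The same formula as a small function, so different values of P and N can be tried:

```python
# Amdahl's law as a function, matching the formula above.

def speedup(p, n):
    """p = parallelisable fraction of the program, n = number of cores."""
    return 1 / ((1 - p) + p / n)

print(round(speedup(0.8, 4), 2))      # 2.5 -> far short of the ideal 4x
print(round(speedup(0.8, 1000), 2))   # ~5: the serial 20% caps the gain
```

Notice that even with 1000 cores the speed-up never exceeds 1 / (1 − P) = 5: the serial fraction dominates.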

7.3 Common Misconceptions

  • More cores = linear speed‑up – only true for perfectly parallel code; real programs suffer from overhead and serial sections.
  • Higher clock = always faster – a CPU with a lower clock but a much lower CPI (thanks to larger cache or a more efficient pipeline) can outperform a higher‑clocked, less efficient design.
  • Cache size alone determines performance – latency, associativity and the coherence protocol are equally important.

8. Example Calculations

8.1 Single‑core, no parallelism

Program: 1 billion instructions
Clock rate: 3 GHz
Average CPI: 1.5

Time = (Instructions × CPI) / Clock Rate = (10⁹ × 1.5) / (3 × 10⁹) = 0.5 s

8.2 Quad‑core, ideal parallelism (P = 1)

Time₍4 cores₎ = 0.5 s / 4 = 0.125 s

8.3 Quad‑core with realistic parallelism (P = 0.8)

Speed‑up = 1 / [(1 − 0.8) + 0.8 / 4] = 2.5
Adjusted time = 0.5 s / 2.5 = 0.2 s
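
The three worked examples can be checked with a few lines of Python, using the values from the text:

```python
# Worked examples from section 8 (values taken from the text).

instructions = 1e9        # 1 billion instructions
clock_rate = 3e9          # 3 GHz
cpi = 1.5

# 8.1 Single core
time_single = (instructions * cpi) / clock_rate
print(time_single)                      # 0.5 s

# 8.2 Quad core, ideal parallelism (P = 1)
time_ideal = time_single / 4
print(time_ideal)                       # 0.125 s

# 8.3 Quad core, realistic parallelism (P = 0.8, Amdahl's law)
s = 1 / ((1 - 0.8) + 0.8 / 4)
print(round(time_single / s, 2))        # 0.2 s
```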

9. Comparison of Typical CPUs

| Feature | Single‑Core | Quad‑Core | Octa‑Core |
|---|---|---|---|
| Typical core count | 1 | 4 | 8 |
| L1 cache per core | 32 KB (I + D) | 32 KB (I + D) | 32 KB (I + D) |
| Shared L2 cache | 256 KB | 2 MB (often 2 × 1 MB) | 4 MB (often 2 × 2 MB) |
| Shared L3 cache | 2 MB | 8 MB | 16 MB |
| Typical clock speed | 2.5 GHz | 3.0 GHz | 3.2 GHz |
| Ideal use‑case (AO3) | Simple, single‑threaded tasks (e.g. word processing) | Office suites, light gaming, web browsing | Heavy multitasking, video editing, high‑end gaming, 3‑D rendering |

10. Links to Other Mandatory Topics (AO3 – Evaluation)

  • Instruction Set & Micro‑code – Determines which operations each core can perform; RISC vs. CISC influences CPI.
  • Virtual Memory & Paging – When a cache miss reaches RAM, the OS may page data in/out, adding latency.
  • Power & Thermal Design – More cores and higher frequencies increase heat; modern CPUs use dynamic frequency scaling (Turbo Boost) and power‑gating.
  • Binary Data Representation – Two’s‑complement arithmetic, overflow detection and logical shifts are performed inside the ALU.
  • Data Transmission – CPU output is packaged into packets for USB, Ethernet or Wi‑Fi; error‑detection ensures reliability.
  • Databases & File Handling – Queries retrieve data from storage; the data passes through RAM and cache before the CPU processes it.
  • Encryption & Emerging Technologies – Hardware support for AES and other approved algorithms accelerates secure communication; low‑power “big.LITTLE” core designs are used in mobile devices.

11. Summary Checklist (AO1)

  1. Identify the number of cores in a CPU and explain parallel execution.
  2. Describe the three cache levels, typical sizes and why they sit close to the core.
  3. State how clock speed is measured (Hz, GHz) and its effect on cycles per second.
  4. Use the performance formula Performance ∝ Clock Rate / CPI and name factors that affect CPI.
  5. Apply Amdahl’s law to estimate realistic speed‑up with multiple cores.
  6. Recognise common misconceptions about cores, clock speed and cache.
  7. Connect the hardware concepts to related syllabus areas (data representation, transmission, software, networks, databases, security, power).

Suggested diagram: Block diagram of a CPU die showing multiple cores, each with its own L1 cache, shared L2/L3 caches, a clock generator, and arrows indicating the memory hierarchy (register → L1 → L2 → L3 → RAM → secondary storage). Include a simple bus linking the CPU to I/O devices (USB, network card) and a box representing the operating system’s scheduler.
