Cambridge IGCSE/A‑Level Computer Science – Core Concepts

8.1 Database Concepts – Limitations of a File‑Based Approach

Why many programs start with files but soon need a DBMS

Each application creates its own files – simple for tiny, single‑user programs.

As the amount of data, the number of programs and the number of users grow, the drawbacks become serious.

Key Limitations

Limitation	Typical Impact on an Application
Data redundancy & inconsistency	Same information stored in several files. Updating one copy does not automatically update the others → contradictory records.
Limited data access	Files are usually accessed sequentially. Finding a particular record often requires scanning the whole file (≈ O(n) reads).
Lack of data independence	Any change to the file layout (e.g., adding a field) forces a change in every program that reads or writes the file.
Weak integrity control	No built‑in primary‑key, foreign‑key or domain constraints. The programmer must write extra code to prevent illegal data.
Concurrency problems	When several users edit the same file at the same time, updates can be lost or the file can become corrupted; file‑system locks are coarse‑grained.
Coarse security & access control	Permissions are applied to whole files only – you cannot hide individual records from a particular user.
Poor scalability & performance	Searches become slower as the file grows; backup, restore and recovery are labour‑intensive.
No standard query language	Every query requires custom code; there is no declarative language such as SQL to retrieve, join or aggregate data.

Mathematical View of Search Cost

For a file containing n records:

Sequential search (no index): average reads = (n + 1) / 2 → O(n).

With a balanced index (e.g., B‑tree of branching factor b): average reads ≈ log_b n → O(log n).
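The two cost formulas above can be checked with a short Python sketch. The function names are illustrative, and the index figure is only the idealised log_b n estimate, not a measurement of any real B‑tree.

```python
import math


def sequential_reads(records, target_key):
    """Count the records read by a linear scan until target_key is found."""
    reads = 0
    for key in records:
        reads += 1
        if key == target_key:
            break
    return reads


def average_sequential_reads(n):
    """Average reads over all n successful searches; equals (n + 1) / 2."""
    records = list(range(n))
    return sum(sequential_reads(records, key) for key in records) / n


def estimated_index_reads(n, b):
    """Idealised height of a balanced index with branching factor b: log_b(n)."""
    return math.log(n, b)


if __name__ == "__main__":
    print(average_sequential_reads(1000))            # 500.5, i.e. (1000 + 1) / 2
    print(round(estimated_index_reads(10**6, 100)))  # about 3 levels
```

With one million records and a branching factor of 100, the estimate is about three reads, against roughly half a million for an average sequential search.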

Illustrative Example – Student Management System


students.txt      – personal details (ID, name, address, DOB)
enrolments.txt    – course enrolments (StudentID, CourseCode, Semester)

If a student moves house, the address must be edited in students.txt. Suppose that, for reporting purposes, a copy of the address is also stored in each student's rows in enrolments.txt (in addition to the fields listed above). Forgetting to update the second file then creates an inconsistency that can cause wrong mailing lists or inaccurate fee calculations.
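The update anomaly can be reproduced in a few lines of Python. This is a minimal sketch: the two "files" are modelled as in‑memory structures, and the student ID, name and addresses are invented purely for illustration.

```python
# Hypothetical contents of the two files, held in memory for the demo.
students = {"S001": {"name": "Ana", "address": "1 Old Road"}}
enrolments = [
    {"student_id": "S001", "course": "CS101", "address": "1 Old Road"},
]


def update_address(student_id, new_address):
    """Update the master record only, as a careless program might."""
    students[student_id]["address"] = new_address


def find_inconsistencies():
    """Return enrolment rows whose copied address no longer matches."""
    return [
        row for row in enrolments
        if row["address"] != students[row["student_id"]]["address"]
    ]


if __name__ == "__main__":
    update_address("S001", "22 New Street")
    print(find_inconsistencies())  # the stale enrolment row is reported
```

Nothing in the file system detects the mismatch; it is found only because extra checking code was written.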

Real‑World Consequences

Inconsistent reports – financial statements derived from duplicated transaction files may not match.

Data loss – with no transaction log or automated recovery, corruption of a single file can destroy the only up‑to‑date copy of a data item.

High maintenance cost – any change to the file format forces a rewrite of every dependent program.

Limited decision support – without a query language, ad‑hoc analysis is slow and error‑prone.

Comparison with a DBMS

Aspect	File‑Based Approach	Database Management System (DBMS)
Redundancy	High – duplicate records in many files	Low – data stored once, referenced by keys
Access speed	Linear scan (O(n))	Indexed access (O(log n)) + query optimiser
Data integrity	Manual checks in code	Built‑in constraints (PK, FK, CHECK, NOT NULL)
Concurrency	Coarse file locks; prone to lost updates	Transaction management (ACID), row‑level locking
Scalability	Poor – performance degrades quickly	Good – handles large volumes & many simultaneous users
Security	File‑level permissions only	Granular privileges (user, role, column)
Query capability	Custom code per query	SQL – powerful, declarative, portable
Backup & recovery	Manual copy of each file	Automated transaction logs, point‑in‑time restore
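Several rows of the table can be illustrated with SQLite, which ships with Python as the sqlite3 module. The sketch below is one possible design, not a prescribed schema: the table names, column names and sample rows are assumptions made for the demonstration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with built-in integrity constraints."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute(
        "CREATE TABLE student ("
        " student_id TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " address TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE enrolment ("
        " student_id TEXT NOT NULL REFERENCES student(student_id),"
        " course_code TEXT NOT NULL,"
        " semester INTEGER NOT NULL CHECK (semester IN (1, 2)),"
        " PRIMARY KEY (student_id, course_code, semester))"
    )
    conn.execute("INSERT INTO student VALUES ('S001', 'Ana', '1 Old Road')")
    conn.execute("INSERT INTO enrolment VALUES ('S001', 'CS101', 1)")
    return conn


def try_insert(conn, sql):
    """Run an INSERT and report whether the DBMS accepted or rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


if __name__ == "__main__":
    conn = build_demo_db()
    # Foreign key: no student S999 exists.
    print(try_insert(conn, "INSERT INTO enrolment VALUES ('S999', 'CS101', 1)"))
    # CHECK constraint: semester must be 1 or 2.
    print(try_insert(conn, "INSERT INTO enrolment VALUES ('S001', 'CS102', 3)"))
    # The address is stored once and reached through a join.
    query = ("SELECT e.course_code, s.address "
             "FROM enrolment e JOIN student s USING (student_id)")
    print(conn.execute(query).fetchall())
```

The illegal rows are refused by the DBMS itself, with no validation code in the application, and the address lives in one place only.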

Suggested Classroom Diagram (description)

Figure 1 – Left: several independent files (students.txt, enrolments.txt, grades.txt) accessed directly by three programs. Right: a single logical database managed by a DBMS; the same three programs send SQL statements to the DBMS layer, which handles storage, indexing and concurrency.

1 Information Representation

1.1 Binary, Hexadecimal & Binary Prefixes

Prefix	Symbol	Decimal factor	Binary factor (IEC)
kilo	k	10³ = 1 000	kibi (Ki) = 2¹⁰ = 1 024
mega	M	10⁶ = 1 000 000	mebi (Mi) = 2²⁰ = 1 048 576
giga	G	10⁹ = 1 000 000 000	gibi (Gi) = 2³⁰ = 1 073 741 824

1.2 Text Representation

ASCII – 7 bits per character (128 symbols). Suitable for basic English text.

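Unicode (UTF‑8) – variable length (1–4 bytes per character); can encode every Unicode code point (over 1.1 million possible values) and is backward compatible with ASCII, because ASCII characters keep their single‑byte codes.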
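The variable‑length behaviour of UTF‑8 is easy to observe in Python. The helper name below is illustrative; str.encode is the standard library call.

```python
def utf8_length(character):
    """Number of bytes UTF-8 uses to store a single character."""
    return len(character.encode("utf-8"))


if __name__ == "__main__":
    for ch in ["A", "é", "€", "😀"]:
        print(ch, utf8_length(ch))  # 1, 2, 3 and 4 bytes respectively
    # ASCII text produces identical bytes under both encodings.
    print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True
```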

1.3 Multimedia – Bitmap vs. Vector

Aspect	Bitmap (Raster)	Vector
Storage model	Pixel‑by‑pixel array (width × height × bits‑per‑pixel)	Mathematical primitives (lines, curves, shapes)
Scalability	Poor – enlargement causes pixelation	Excellent – resolution‑independent
Typical formats	BMP, PNG, JPEG, GIF	SVG, EPS, PDF
Typical use‑cases	Photographs, detailed images	Logos, icons, CAD drawings

Worked Example – Bitmap Size

Calculate the size of an 800 × 600 colour image stored as a 24‑bit bitmap (no compression).


Width × Height × Bits‑per‑pixel = 800 × 600 × 24 = 11 520 000 bits
Bytes = 11 520 000 ÷ 8 = 1 440 000 bytes
KiB = 1 440 000 ÷ 1 024 ≈ 1 406 KiB
MiB = 1 406 ÷ 1 024 ≈ 1.37 MiB
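The same calculation can be written as a small Python function for checking other image sizes. It is a sketch that assumes an uncompressed bitmap and ignores any file header.

```python
import math


def bitmap_size_bytes(width, height, bits_per_pixel):
    """Size of the raw pixel data in bytes (no header, no compression)."""
    total_bits = width * height * bits_per_pixel
    return math.ceil(total_bits / 8)


if __name__ == "__main__":
    size = bitmap_size_bytes(800, 600, 24)
    print(size)                          # 1440000 bytes
    print(size / 1024)                   # 1406.25 KiB
    print(round(size / 1024 / 1024, 2))  # 1.37 MiB
```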

1.4 Compression Techniques

Run‑Length Encoding (RLE) – lossless; replaces consecutive identical symbols with a count.

Huffman coding – lossless; variable‑length codes based on symbol frequencies (used in ZIP, PNG).

JPEG – lossy image compression; uses Discrete Cosine Transform and quantisation.

MP3 / AAC – lossy audio compression; exploits psychoacoustic masking.
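Run‑length encoding, the simplest technique in the list above, can be sketched as follows. Storing each run as a (symbol, count) pair is one common choice; real file formats pack the runs differently.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of identical symbols with a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_decode(pairs):
    """Rebuild the original text from (symbol, count) pairs."""
    return "".join(symbol * count for symbol, count in pairs)


if __name__ == "__main__":
    encoded = rle_encode("AAAABBBCCD")
    print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
    print(rle_decode(encoded))  # AAAABBBCCD
```

Because decoding restores the input exactly, the method is lossless; it only saves space when the data contains long runs.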

Exercise: Given the character frequencies a:40, b:30, c:20, d:10, draw the Huffman tree and compute the average bits per character.
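After drawing the tree by hand, the result can be checked with a short program. The sketch below builds the tree with a priority queue and tracks only the code length of each symbol; the function names are illustrative, and a different tie‑breaking rule could give different codes with the same average length.

```python
import heapq
from itertools import count


def huffman_code_lengths(frequencies):
    """Return {symbol: code length in bits} for a Huffman code."""
    tie_breaker = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tie_breaker), {symbol: 0})
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        weight1, _, depths1 = heapq.heappop(heap)  # two lightest subtrees
        weight2, _, depths2 = heapq.heappop(heap)
        # Merging puts every symbol in both subtrees one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (weight1 + weight2, next(tie_breaker), merged))
    return heap[0][2]


def average_bits(frequencies):
    """Weighted average code length in bits per character."""
    lengths = huffman_code_lengths(frequencies)
    total = sum(frequencies.values())
    return sum(frequencies[s] * lengths[s] for s in frequencies) / total


if __name__ == "__main__":
    freqs = {"a": 40, "b": 30, "c": 20, "d": 10}
    print(huffman_code_lengths(freqs))
    print(average_bits(freqs))
```

Compare the printed lengths with the depth of each leaf in the hand‑drawn tree; a fixed‑length code for four symbols would need 2 bits per character.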

2 Communication & Networks

2.1 Network Types & Topologies

LAN – Local Area Network (e.g., Ethernet, Wi‑Fi) covering a single building or campus.

WAN – Wide Area Network (e.g., the Internet) spanning cities or continents.

Topologies – star, bus, ring, mesh, hybrid; affect cabling, fault tolerance and performance.

2.2 IP Addressing & Subnetting

IPv4 example: 192.168.10.45/24

Subnet mask for /24 = 255.255.255.0.

Network address = 192.168.10.0.

Broadcast address = 192.168.10.255.

Usable host range = 192.168.10.1 – 192.168.10.254.
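These values can be confirmed with Python's standard ipaddress module. The wrapper function and its dictionary keys are illustrative names, not part of the module.

```python
import ipaddress


def subnet_summary(cidr):
    """Derive mask, network, broadcast and host range from e.g. '192.168.10.45/24'."""
    network = ipaddress.ip_interface(cidr).network
    hosts = list(network.hosts())  # usable addresses only
    return {
        "mask": str(network.netmask),
        "network": str(network.network_address),
        "broadcast": str(network.broadcast_address),
        "first_host": str(hosts[0]),
        "last_host": str(hosts[-1]),
        "usable_hosts": len(hosts),
    }


if __name__ == "__main__":
    for name, value in subnet_summary("192.168.10.45/24").items():
        print(name, value)
```

For a /24 the host part is 8 bits, giving 2⁸ − 2 = 254 usable addresses once the network and broadcast addresses are excluded.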

IPv6 – 128‑bit addresses written in hexadecimal groups, e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334. No need for NAT; subnetting is expressed by a prefix length (commonly /64).
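IPv6 addresses are usually written in a shortened form: leading zeros in each group are dropped and one run of all‑zero groups is replaced by "::". The standard ipaddress module applies these rules; the /64 network below is derived from the example address and is used only for illustration.

```python
import ipaddress

address = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")
network = ipaddress.ip_network("2001:db8:85a3::/64")

if __name__ == "__main__":
    print(address.compressed)               # 2001:db8:85a3::8a2e:370:7334
    print(address.exploded)                 # full eight-group form
    print(address in network)               # True
    print(network.num_addresses == 2**64)   # True: 64 bits left for hosts
```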

2.3 Client‑Server vs. Peer‑to‑Peer (P2P)

Characteristic	Client‑Server	Peer‑to‑Peer
Control	Centralised servers	Each node can act as both client and server
Scalability	Add more servers or upgrade hardware	Scales naturally as more peers join
Typical examples	Web browsing, email, online banking	BitTorrent, early Skype, file‑sharing networks

2.4 OSI vs. TCP/IP Model

TCP/IP Layer	Corresponding OSI Layer(s)	Typical Protocols
Application	Application, Presentation, Session	HTTP, HTTPS, SMTP, FTP, DNS
Transport	Transport	TCP, UDP
Internet	Network	IP, ICMP, ARP
Link	Data Link, Physical	Ethernet (IEEE 802.3), Wi‑Fi (IEEE 802.11)

2.5 Cloud Computing – Short Case Study

Scenario: A school wants to host its Learning Management System (LMS).

SaaS (Software‑as‑a‑Service) – Use a ready‑made LMS such as Google Classroom. The provider manages hardware, OS, middleware and the application. The school pays a subscription and focuses on content creation.

IaaS (Infrastructure‑as‑a‑Service) – Rent virtual machines from AWS, Azure or Google Cloud, install a custom LMS (e.g., Moodle). The school controls the OS and software but does not maintain physical servers.

Discussion points for students: cost, maintenance responsibility, data ownership, scalability, and security implications.

3 Hardware & Logic

3.1 CPU Architecture & the Fetch‑Decode‑Execute Cycle

Fetch – Read the next instruction from memory address held in the Program Counter (PC).

Decode – Control Unit interprets the opcode and determines required operands.

Execute – Arithmetic Logic Unit (ALU) performs the operation; registers are read/written.

Store result – Write the outcome back to a register or memory location.

Update PC (increment or branch) and repeat.
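The cycle above can be modelled with a toy simulator. This is a teaching sketch only: the four‑instruction set (LOAD, ADD, JMP, JMPZ, plus HALT), the single accumulator and the tuple encoding are all invented for the example and do not correspond to any real processor.

```python
def run(program, max_steps=1000):
    """Execute a list of (opcode, operand) pairs; return the accumulator."""
    pc = 0   # Program Counter: index of the next instruction
    acc = 0  # Accumulator: the only data register in this toy CPU
    for _ in range(max_steps):
        instruction = program[pc]        # Fetch
        opcode, operand = instruction    # Decode
        next_pc = pc + 1                 # default: move to the next instruction
        if opcode == "LOAD":             # Execute and store the result in acc
            acc = operand
        elif opcode == "ADD":
            acc = acc + operand
        elif opcode == "JMP":            # unconditional branch
            next_pc = operand
        elif opcode == "JMPZ":           # branch only if acc is zero
            if acc == 0:
                next_pc = operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        pc = next_pc                     # Update PC (increment or branch)
    raise RuntimeError("step limit reached without HALT")


if __name__ == "__main__":
    print(run([("LOAD", 5), ("ADD", 3), ("HALT", None)]))  # 8
    countdown = [("LOAD", 3), ("ADD", -1), ("JMPZ", 4), ("JMP", 1), ("HALT", None)]
    print(run(countdown))                                  # 0
```

The second program loops by branching back to address 1 until the accumulator reaches zero, which shows how the PC update step implements both sequencing and branching.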

3.2 Pipelining

A pipeline overlaps the stages of the fetch‑decode‑execute cycle so that multiple instructions are processed simultaneously.

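The five stages are Fetch (F), Decode (D), Execute (E), Memory access (M) and Write‑back (W). Without a pipeline, each instruction passes through all five stages before the next one starts, so five instructions take 5 × 5 = 25 clock cycles. With a 5‑stage pipeline the same five instructions overlap and finish in 9 cycles:

Instruction	Cycle 1	Cycle 2	Cycle 3	Cycle 4	Cycle 5	Cycle 6	Cycle 7	Cycle 8	Cycle 9
1	F	D	E	M	W	-	-	-	-
2	-	F	D	E	M	W	-	-	-
3	-	-	F	D	E	M	W	-	-
4	-	-	-	F	D	E	M	W	-
5	-	-	-	-	F	D	E	M	W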

After the pipeline is filled, one instruction completes each clock cycle (ideal CPI = 1), dramatically increasing throughput.
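The gain can be quantified with a short calculation. The sketch assumes an ideal pipeline: every stage takes one clock cycle and there are no stalls or hazards, which real processors do not fully achieve.

```python
def cycles_without_pipeline(instructions, stages):
    """Each instruction finishes all stages before the next one starts."""
    return instructions * stages


def cycles_with_pipeline(instructions, stages):
    """The first instruction fills the pipeline; each later one adds one cycle."""
    return stages + (instructions - 1)


def speedup(instructions, stages):
    """Ratio of unpipelined to pipelined cycle counts."""
    return (cycles_without_pipeline(instructions, stages)
            / cycles_with_pipeline(instructions, stages))


if __name__ == "__main__":
    print(cycles_without_pipeline(5, 5), cycles_with_pipeline(5, 5))  # 25 9
    print(round(speedup(1000, 5), 2))                                 # 4.98
```

For long instruction streams the speedup approaches the number of stages, here 5.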

3.3 Parallel Processing Classification (Flynn’s Taxonomy)

SISD – Single Instruction, Single Data (classic sequential processor).

SIMD – Single Instruction, Multiple Data (e.g., graphics processors, vector units).

MISD – Multiple Instruction, Single Data (rare, used in some fault‑tolerant systems).

MIMD – Multiple Instruction, Multiple Data (multi‑core CPUs, clusters).

3.4 Logic Gates & Boolean Algebra

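Gate	Symbol	Truth Table (A B → Y)
AND	∧	0 0 → 0; 0 1 → 0; 1 0 → 0; 1 1 → 1
OR	∨	0 0 → 0; 0 1 → 1; 1 0 → 1; 1 1 → 1
NOT	¬	Single input (A → Y): 0 → 1; 1 → 0
NAND	↑	Inverse of AND: 0 0 → 1; 0 1 → 1; 1 0 → 1; 1 1 → 0
NOR	↓	Inverse of OR: 0 0 → 1; 0 1 → 0; 1 0 → 0; 1 1 → 0
XOR	⊕	Y = 1 when A ≠ B: 0 0 → 0; 0 1 → 1; 1 0 → 1; 1 1 → 0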

Students should be able to combine gates to implement Boolean expressions, simplify using De Morgan’s laws, and draw corresponding circuit diagrams.
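Gates can be modelled as small functions on the values 0 and 1, which makes it possible to generate truth tables and to check De Morgan's laws by testing every input combination. The function names below are illustrative.

```python
from itertools import product


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def NOT(a):
    return 1 - a


def NAND(a, b):
    return NOT(AND(a, b))


def XOR(a, b):
    return a ^ b


def truth_table(gate):
    """List (A, B, Y) for every combination of two inputs."""
    return [(a, b, gate(a, b)) for a, b in product((0, 1), repeat=2)]


def de_morgan_holds():
    """Check NOT(A AND B) = NOT A OR NOT B and NOT(A OR B) = NOT A AND NOT B."""
    return all(
        NOT(AND(a, b)) == OR(NOT(a), NOT(b))
        and NOT(OR(a, b)) == AND(NOT(a), NOT(b))
        for a, b in product((0, 1), repeat=2)
    )


if __name__ == "__main__":
    print(truth_table(NAND))  # [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    print(de_morgan_holds())  # True
```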

4 Processor Fundamentals (Optional Extension)

4.1 Interrupts & Exception Handling

An interrupt signals the CPU to suspend the current program, save its state, and execute an interrupt‑service routine (ISR).

Types: hardware (e.g., I/O ready) and software (e.g., divide‑by‑zero).

Ensures responsive I/O and enables multitasking.

4.2 Memory Hierarchy

Registers – fastest, smallest.

Cache (L1, L2, L3) – stores recently used data/instructions.

Main memory (RAM) – larger but slower.

Secondary storage (HDD/SSD) – persistent, much slower.

4.3 Virtual Memory & Paging

Logical address space is mapped onto physical memory using page tables.

Allows programs to use more memory than is physically available and provides isolation between processes.
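Address translation with a page table can be sketched in a few lines. The 4 096‑byte page size and the page‑table contents are assumptions chosen for the example; a real operating system would load a missing page from secondary storage rather than raise an error.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(logical_address, page_table, page_size=PAGE_SIZE):
    """Map a logical address to a physical address via the page table."""
    page_number, offset = divmod(logical_address, page_size)
    frame_number = page_table.get(page_number)
    if frame_number is None:
        # Page is not in physical memory: this is a page fault.
        raise LookupError(f"page fault on page {page_number}")
    return frame_number * page_size + offset


if __name__ == "__main__":
    page_table = {0: 5, 1: 2}           # hypothetical page -> frame mapping
    print(translate(10, page_table))    # 20490 (frame 5, offset 10)
    print(translate(4100, page_table))  # 8196  (frame 2, offset 4)
```

Because each process has its own page table, the same logical address in two processes maps to different physical frames, which is how isolation is achieved.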

Summary

File‑based storage is simple to implement but suffers from redundancy, poor access speed, weak integrity, limited concurrency, coarse security and a lack of powerful query capabilities. These limitations motivate the use of a Database Management System, which provides data independence, indexing, transaction control, granular security and a standard language (SQL) for data manipulation. Understanding these trade‑offs is essential for designing robust, scalable information systems that meet the requirements of the Cambridge IGCSE/A‑Level Computer Science syllabus.

Show understanding of the limitations of using a file-based approach for the storage and retrieval of data
