Why many programmes start with files but soon need a DBMS
| Limitation | Typical Impact on an Application |
|---|---|
| Data redundancy & inconsistency | Same information stored in several files. Updating one copy does not automatically update the others → contradictory records. |
| Limited data access | Files are usually accessed sequentially. Finding a particular record often requires scanning the whole file (≈ O(n) reads). |
| Lack of data independence | Any change to the file layout (e.g., adding a field) forces a change in every programme that reads or writes the file. |
| Weak integrity control | No built‑in primary‑key, foreign‑key or domain constraints. The programmer must write extra code to prevent illegal data. |
| Concurrency problems | When several users edit the same file at the same time, updates can be lost or the file can become corrupted; file‑system locks are coarse‑grained. |
| Coarse security & access control | Permissions are applied to whole files only – you cannot hide individual records from a particular user. |
| Poor scalability & performance | Searches become slower as the file grows; backup, restore and recovery are labour‑intensive. |
| No standard query language | Every query requires custom code; there is no declarative language such as SQL to retrieve, join or aggregate data. |
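For a file containing n records, a sequential search reads up to n records in the worst case (about n/2 on average when the record is present), whereas an index lets a DBMS locate a record in O(log n) steps.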
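The sketch below makes the O(n) versus O(log n) difference concrete. It counts record reads for a front‑to‑back scan and comparisons for a binary search over a sorted list of keys. The sorted list is a simplified stand‑in for a DBMS index (real systems typically use B‑tree structures), and the function names are illustrative.

```python
def sequential_reads(records: list[int], target: int) -> int:
    """Count records read by a front-to-back scan (n if the target is absent)."""
    reads = 0
    for key in records:
        reads += 1
        if key == target:
            break
    return reads


def binary_search_probes(sorted_keys: list[int], target: int) -> int:
    """Count comparisons made by a binary search on a sorted key list."""
    low, high = 0, len(sorted_keys) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if sorted_keys[mid] == target:
            break
        if sorted_keys[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return probes


if __name__ == "__main__":
    n = 1_000_000
    keys = list(range(n))   # hypothetical record keys, already sorted
    target = n - 1          # worst case for the sequential scan
    print("sequential reads:", sequential_reads(keys, target))
    print("binary search probes:", binary_search_probes(keys, target))
```

For one million keys the scan reads every record to reach the last one, while binary search needs at most 20 comparisons (⌊log₂ 1 000 000⌋ + 1).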
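Example: a school keeps its student records in two text files.
students.txt – personal details (ID, name, address, DOB)
enrolments.txt – course enrolments (StudentID, CourseCode, Semester), plus a copy of the student's address for reporting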
If a student moves house, the address must be edited in students.txt. Because the address is also duplicated in enrolments.txt (for reporting purposes), forgetting to update the second file creates an inconsistency that can cause wrong mailing lists or inaccurate fee calculations.
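A minimal Python sketch of this failure mode follows. In‑memory dictionaries act as hypothetical stand‑ins for the two text files, and the student, course and address values are made up. Only students.txt is updated, and a consistency check then finds the stale copies.

```python
def load_sample_files():
    """Return hypothetical in-memory copies of students.txt and enrolments.txt."""
    students = {"S001": {"name": "Amina", "address": "12 Moi Avenue"}}
    enrolments = [
        {"student_id": "S001", "course": "CS101", "semester": 1, "address": "12 Moi Avenue"},
        {"student_id": "S001", "course": "MA102", "semester": 1, "address": "12 Moi Avenue"},
    ]
    return students, enrolments


def move_house(students, student_id, new_address):
    """Update students.txt only; the duplicated addresses in enrolments.txt are forgotten."""
    students[student_id]["address"] = new_address


def find_inconsistencies(students, enrolments):
    """Return enrolment rows whose copied address disagrees with students.txt."""
    return [
        row for row in enrolments
        if row["address"] != students[row["student_id"]]["address"]
    ]


if __name__ == "__main__":
    students, enrolments = load_sample_files()
    move_house(students, "S001", "7 Lake Road")
    for row in find_inconsistencies(students, enrolments):
        print("stale address in enrolment:", row["course"], row["address"])
```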
| Aspect | File‑Based Approach | Database Management System (DBMS) |
|---|---|---|
| Redundancy | High – duplicate records in many files | Low – data stored once, referenced by keys |
| Access speed | Linear scan (O(n)) | Indexed access (O(log n)) + query optimiser |
| Data integrity | Manual checks in code | Built‑in constraints (PK, FK, CHECK, NOT NULL) |
| Concurrency | Coarse file locks; prone to lost updates | Transaction management (ACID), row‑level locking |
| Scalability | Poor – performance degrades quickly | Good – handles large volumes & many simultaneous users |
| Security | File‑level permissions only | Granular privileges (user, role, column) |
| Query capability | Custom code per query | SQL – powerful, declarative, portable |
| Backup & recovery | Manual copy of each file | Automated transaction logs, point‑in‑time restore |
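For contrast with the file example, here is a hedged sketch that uses Python's built‑in `sqlite3` module as the DBMS. The table and column names, the sample rows and the rule that a semester must be 1 or 2 are assumptions for illustration. The address is stored once in `student`, so a single UPDATE is enough, and the DBMS itself rejects rows that break the primary‑key, foreign‑key or CHECK constraints.

```python
import sqlite3


def build_school_db() -> sqlite3.Connection:
    """Create an in-memory database in which each address is stored exactly once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
    conn.executescript("""
        CREATE TABLE student (
            student_id TEXT PRIMARY KEY,
            name       TEXT NOT NULL,
            address    TEXT NOT NULL
        );
        CREATE TABLE enrolment (
            student_id  TEXT NOT NULL REFERENCES student(student_id),
            course_code TEXT NOT NULL,
            semester    INTEGER NOT NULL CHECK (semester IN (1, 2)),
            PRIMARY KEY (student_id, course_code, semester)
        );
    """)
    with conn:  # commit the hypothetical sample rows as one transaction
        conn.execute("INSERT INTO student VALUES ('S001', 'Amina', '12 Moi Avenue')")
        conn.executemany("INSERT INTO enrolment VALUES (?, ?, ?)",
                         [("S001", "CS101", 1), ("S001", "MA102", 1)])
    return conn


def move_house(conn: sqlite3.Connection, student_id: str, new_address: str) -> None:
    """One UPDATE fixes the address everywhere, because no other copy exists."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE student SET address = ? WHERE student_id = ?",
                     (new_address, student_id))


def enrolment_report(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Join the tables to build the report that the file version kept as a duplicate."""
    return conn.execute("""
        SELECT e.course_code, s.address
        FROM enrolment AS e
        JOIN student AS s ON s.student_id = e.student_id
        ORDER BY e.course_code
    """).fetchall()


def try_enrol(conn: sqlite3.Connection, student_id: str, course_code: str, semester: int) -> bool:
    """Return False if the DBMS rejects the row (PK, FK, CHECK or NOT NULL violation)."""
    try:
        with conn:
            conn.execute("INSERT INTO enrolment VALUES (?, ?, ?)",
                         (student_id, course_code, semester))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = build_school_db()
    move_house(conn, "S001", "7 Lake Road")
    print(enrolment_report(conn))
    print("unknown student accepted?", try_enrol(conn, "S999", "CS101", 1))
    print("semester 3 accepted?", try_enrol(conn, "S001", "CS101", 3))
```

SQLite is used here only because it ships with Python. A server DBMS such as MySQL or PostgreSQL would also manage many concurrent users and role‑based privileges, which this single‑connection sketch does not demonstrate.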
Figure 1 – Left: several independent files (students.txt, enrolments.txt, grades.txt) accessed directly by three programmes. Right: a single logical database managed by a DBMS; the same three programmes send SQL statements to the DBMS layer, which handles storage, indexing and concurrency.
| Prefix | Symbol | Decimal factor | Binary factor (IEC) |
|---|---|---|---|
| kilo | k | 10³ = 1 000 | kibi (Ki) = 2¹⁰ = 1 024 |
| mega | M | 10⁶ = 1 000 000 | mebi (Mi) = 2²⁰ = 1 048 576 |
| giga | G | 10⁹ = 1 000 000 000 | gibi (Gi) = 2³⁰ = 1 073 741 824 |
| Aspect | Bitmap (Raster) | Vector |
|---|---|---|
| Storage model | Pixel‑by‑pixel array (width × height × bits‑per‑pixel) | Mathematical primitives (lines, curves, shapes) |
| Scalability | Poor – enlargement causes pixelation | Excellent – resolution‑independent |
| Typical formats | BMP, PNG, JPEG, GIF | SVG, EPS, PDF |
| Typical use‑cases | Photographs, detailed images | Logos, icons, CAD drawings |
Calculate the size of an 800 × 600 colour image stored as a 24‑bit bitmap (no compression).
Width × Height × Bits‑per‑pixel = 800 × 600 × 24 = 11 520 000 bits
Bytes = 11 520 000 ÷ 8 = 1 440 000 bytes
KiB = 1 440 000 ÷ 1 024 ≈ 1 406 KiB
MiB = 1 406 ÷ 1 024 ≈ 1.37 MiB
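The same calculation can be written as a small Python function that reports the size in both binary (KiB, MiB) and decimal (MB) units from the prefix table. The function name is illustrative, and no file‑header overhead is included.

```python
def bitmap_size(width: int, height: int, bits_per_pixel: int) -> dict[str, float]:
    """Uncompressed bitmap size: width x height x bits-per-pixel, in several units."""
    bits = width * height * bits_per_pixel
    size_bytes = bits / 8
    return {
        "bits": bits,
        "bytes": size_bytes,
        "KiB": size_bytes / 1024,        # binary prefix, 2**10
        "MiB": size_bytes / 1024 ** 2,   # binary prefix, 2**20
        "MB": size_bytes / 1000 ** 2,    # decimal prefix, 10**6
    }


if __name__ == "__main__":
    size = bitmap_size(800, 600, 24)
    for unit, value in size.items():
        print(f"{unit:>5}: {value:,.2f}")
```

For 800 × 600 at 24 bits it gives 1 406.25 KiB, about 1.37 MiB, or 1.44 MB in decimal units.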
Exercise: Given the character frequencies a:40, b:30, c:20, d:10, draw the Huffman tree and compute the average bits per character.
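To check a hand‑drawn tree, the sketch below builds a Huffman code with Python's `heapq` module and reports each symbol's code length and the weighted average. It tracks only code lengths (tree depths), not the actual 0/1 codes. Ties are broken by insertion order, which can change individual codes but not the average length.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs: dict[str, int]) -> dict[str, int]:
    """Return the Huffman code length (depth in the tree) of each symbol."""
    tie_breaker = count()  # unique second key, so the heap never compares the dicts
    heap = [(f, next(tie_breaker), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them moves one level deeper.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie_breaker), merged))
    return heap[0][2]


def average_bits(freqs: dict[str, int], lengths: dict[str, int]) -> float:
    """Weighted average code length in bits per character."""
    total = sum(freqs.values())
    return sum(freqs[sym] * lengths[sym] for sym in freqs) / total


if __name__ == "__main__":
    freqs = {"a": 40, "b": 30, "c": 20, "d": 10}
    lengths = huffman_code_lengths(freqs)
    print("code lengths:", lengths)
    print("average bits per character:", average_bits(freqs, lengths))
```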
IPv4 example: 192.168.10.45/24
Subnet mask: /24 = 255.255.255.0
Network address: 192.168.10.0
Broadcast address: 192.168.10.255
Usable host range: 192.168.10.1 – 192.168.10.254 (254 hosts)
IPv6 uses 128‑bit addresses written as eight groups of hexadecimal digits, e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334. Its very large address space removes the need for NAT; subnetting is expressed by a prefix length (commonly /64).
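Python's standard `ipaddress` module can confirm these values. The short script below derives the mask, network address, broadcast address and host range from 192.168.10.45/24, and prints the compressed form of the IPv6 example (leading zeros dropped, the longest run of all‑zero groups replaced by `::`).

```python
import ipaddress

# The interface address keeps the host part; .network gives the whole /24 subnet.
iface = ipaddress.ip_interface("192.168.10.45/24")
net = iface.network
hosts = list(net.hosts())  # excludes the network and broadcast addresses

print("subnet mask:      ", net.netmask)            # 255.255.255.0
print("network address:  ", net.network_address)    # 192.168.10.0
print("broadcast address:", net.broadcast_address)  # 192.168.10.255
print("usable hosts:     ", hosts[0], "-", hosts[-1], f"({len(hosts)} hosts)")

v6 = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")
print("IPv6 compressed:  ", v6.compressed)  # 2001:db8:85a3::8a2e:370:7334
```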
| Characteristic | Client‑Server | Peer‑to‑Peer |
|---|---|---|
| Control | Centralised servers | Each node can act as both client and server |
| Scalability | Add more servers or upgrade hardware | Scales naturally as more peers join |
| Typical examples | Web browsing, email, online banking | BitTorrent, early Skype, file‑sharing networks |
| TCP/IP Layer | Corresponding OSI Layer(s) | Typical Protocols |
|---|---|---|
| Application | Application, Presentation, Session | HTTP, HTTPS, SMTP, FTP, DNS |
| Transport | Transport | TCP, UDP |
| Internet | Network | IP, ICMP, ARP |
| Link | Data Link, Physical | Ethernet (IEEE 802.3), Wi‑Fi (IEEE 802.11) |
Scenario: A school wants to host its Learning Management System (LMS) and must decide whether to run it on its own servers or with a cloud provider.
Discussion points for students: cost, maintenance responsibility, data ownership, scalability, and security implications.
A pipeline overlaps the stages of the fetch‑decode‑execute cycle: while one instruction is being executed, the next is being decoded and another is being fetched, so several instructions are in progress at the same time.
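F = fetch, D = decode, E = execute, M = memory access, W = write‑back.
| Instruction | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 | Cycle 6 | Cycle 7 | Cycle 8 | Cycle 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | F | D | E | M | W | | | | |
| 2 | | F | D | E | M | W | | | |
| 3 | | | F | D | E | M | W | | |
| 4 | | | | F | D | E | M | W | |
| 5 | | | | | F | D | E | M | W |
Without a pipeline, each instruction completes all five stages before the next one is fetched, so these five instructions take 5 × 5 = 25 clock cycles; with the 5‑stage pipeline they take 5 + (5 − 1) = 9 cycles.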
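Once the pipeline is full, one instruction completes every clock cycle (ideal CPI = 1, ignoring hazards such as branches and data dependencies), so throughput rises substantially even though each individual instruction still passes through all five stages.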
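A short Python sketch of the cycle counts behind this claim follows. It assumes an ideal pipeline in which every stage takes exactly one clock cycle and there are no stalls, and it also lays out the stage each instruction occupies in each cycle. The function names are illustrative.

```python
STAGES = ["F", "D", "E", "M", "W"]  # fetch, decode, execute, memory access, write-back


def cycles_without_pipeline(n_instructions: int, n_stages: int = 5) -> int:
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_with_pipeline(n_instructions: int, n_stages: int = 5) -> int:
    """n_stages cycles to fill the pipeline, then one instruction finishes per cycle."""
    return n_stages + n_instructions - 1 if n_instructions > 0 else 0


def timing_diagram(n_instructions: int, stages: list[str] = STAGES) -> list[str]:
    """One text row per instruction; '.' marks a cycle in which it is not in the pipeline."""
    total = cycles_with_pipeline(n_instructions, len(stages))
    rows = []
    for i in range(n_instructions):
        cells = ["."] * total
        for s, name in enumerate(stages):
            cells[i + s] = name  # instruction i is in stage s during cycle i + s (0-based)
        rows.append(f"I{i + 1}: " + " ".join(cells))
    return rows


if __name__ == "__main__":
    print("\n".join(timing_diagram(5)))
    print("without pipeline:", cycles_without_pipeline(5), "cycles")
    print("with pipeline:   ", cycles_with_pipeline(5), "cycles")
```

With 1 000 instructions the ideal counts are 5 000 cycles without pipelining and 1 004 with it, a speed‑up approaching the number of stages.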
| Gate | Symbol | Truth table (A B → Y) |
|---|---|---|
| AND | ∧ (also written · or &) | 00 → 0, 01 → 0, 10 → 0, 11 → 1 |
| OR | ∨ | 00 → 0, 01 → 1, 10 → 1, 11 → 1 |
| NOT | ¬ | Single input A → Y: 0 → 1, 1 → 0 |
| NAND | ↑ | Inverse of AND: 00 → 1, 01 → 1, 10 → 1, 11 → 0 |
| NOR | ↓ | Inverse of OR: 00 → 1, 01 → 0, 10 → 0, 11 → 0 |
| XOR | ⊕ | Y = 1 when A ≠ B: 00 → 0, 01 → 1, 10 → 1, 11 → 0 |
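The truth tables can be generated rather than memorised. This Python sketch treats 0 and 1 as logic levels and applies bitwise operators to them; the dictionary of gate functions is an illustrative construction.

```python
from itertools import product

# Two-input gates on the logic levels 0 and 1.
GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),  # inverse of AND
    "NOR":  lambda a, b: 1 - (a | b),  # inverse of OR
    "XOR":  lambda a, b: a ^ b,        # 1 when the inputs differ
}


def NOT(a: int) -> int:
    """Single-input inverter."""
    return 1 - a


def truth_table(gate) -> list[tuple[int, int, int]]:
    """Rows (A, B, Y) for the inputs 00, 01, 10, 11 in that order."""
    return [(a, b, gate(a, b)) for a, b in product((0, 1), repeat=2)]


if __name__ == "__main__":
    for name, gate in GATES.items():
        print(name, [f"{a}{b}->{y}" for a, b, y in truth_table(gate)])
    print("NOT", [f"{a}->{NOT(a)}" for a in (0, 1)])
```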
Students should be able to combine gates to implement Boolean expressions, simplify using De Morgan’s laws, and draw corresponding circuit diagrams.
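As an example of combining gates and checking an identity, the sketch below builds XOR from four NAND gates (a standard construction) and exhaustively checks both of De Morgan's laws over every input combination. The function names are illustrative.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def xor_from_nands(a: int, b: int) -> int:
    """XOR built only from NAND gates (four gates in total)."""
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))


def de_morgan_holds() -> bool:
    """Check both De Morgan laws for all four combinations of A and B."""
    for a, b in product((0, 1), repeat=2):
        not_a, not_b = 1 - a, 1 - b
        if 1 - (a & b) != (not_a | not_b):  # NOT(A AND B) = (NOT A) OR (NOT B)
            return False
        if 1 - (a | b) != (not_a & not_b):  # NOT(A OR B) = (NOT A) AND (NOT B)
            return False
    return True


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, "->", xor_from_nands(a, b))
    print("De Morgan's laws hold:", de_morgan_holds())
```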
File‑based storage is simple to implement but suffers from redundancy, poor access speed, weak integrity, limited concurrency, coarse security and a lack of powerful query capabilities. These limitations motivate the use of a Database Management System, which provides data independence, indexing, transaction control, granular security and a standard language (SQL) for data manipulation. Understanding these trade‑offs is essential for designing robust, scalable information systems and is part of the Cambridge IGCSE/A‑Level Computer Science syllabus.