Cambridge Notes, Past Papers, Revision Questions

8.1 Database Concepts

Objective

Explain why file‑based systems are inadequate, describe the key features of a relational database that overcome those limitations, and demonstrate understanding of ER modelling, normalisation (1NF‑3NF), and basic SQL (DDL & DML).

1. What is a Relational Database?

A relational database stores data in tables (relations). Each table consists of rows (records) and columns (attributes).

Every table has a primary key that uniquely identifies each row.

Relationships between tables are expressed with foreign keys that reference primary keys in other tables.

The database is managed by a DBMS that provides data independence, integrity, security, concurrency control and a declarative query language (SQL).

2. Limitations of a File‑Based System

File‑based applications store data in flat files that programmes read and write directly. The Cambridge syllabus highlights the following drawbacks.

Limitation	Explanation
Data redundancy & inconsistency	The same fact (e.g. a student’s address) appears in several files; updating one copy may leave others unchanged.
Poor data integrity	No built‑in checks for uniqueness, referential integrity or domain rules.
Lack of data independence	Any change to file layout (add a field, change format) forces every programme that uses the file to be rewritten.
Limited query capability	Procedural code is required to locate, sort and combine data; ad‑hoc queries are difficult.
Concurrency problems	Simultaneous access can cause lost updates, overwriting or corrupted files.
Coarse‑grained security	Only OS‑level file permissions are available; fine‑grained user/role control is missing.

Case‑study: Student‑record system (file‑based)

File	Typical fields
Students.txt	StudentID, Name, DOB, Address, Course1, Course2, …
Courses.txt	CourseID, Title, Credits, Lecturer, EnrolledStudentIDs

Changing a student’s address requires editing Students.txt and every occurrence in Courses.txt.

Adding a new field (e.g. “Email”) means editing both files and recompiling every programme.

Finding all students in “Computer Science” needs custom code that reads Courses.txt, extracts IDs, then scans Students.txt for each ID.

Two clerks editing the same file simultaneously can overwrite each other’s changes.

Only the OS user who owns the file can read/write it; there is no way to give a clerk read‑only access.

3. Relational Database Features that Overcome These Limitations

3.1 Data Independence

Logical data independence – the logical schema (tables, columns, relationships) can be altered without changing application code. Example: ALTER TABLE Student ADD Email VARCHAR(100); does not require program modification.

Physical data independence – storage details (indexes, clustering, partitioning) can be changed without affecting programmes.

Views provide a stable virtual table that hides physical implementation details.

3.2 Reduced Redundancy & Normalisation

Data is stored once and related through keys. Normalisation removes duplication systematically.

Normal Forms – Quick Reference

Normal Form	Key Requirement
1NF	Eliminate repeating groups; each field contains atomic values.
2NF	Be in 1NF and have no partial dependency of non‑key attributes on a part of a composite primary key.
3NF	Be in 2NF and have no transitive dependency of non‑key attributes on other non‑key attributes.

Step‑by‑step normalisation (Student‑Course example)

-- Unnormalised file (single table)
StudentID | Name  | DOB        | Course1 | Course2 | Course3
---------------------------------------------------------
1        | Alice | 2005‑04‑12 | CS101   | MA101   | NULL
2        | Bob   | 2004‑09‑30 | CS101   | PH101   | MA101

1NF – eliminate repeating groups

Create separate Student and Enrolment tables.

Student
StudentID | Name  | DOB
-------------------------
1        | Alice | 2005‑04‑12
2        | Bob   | 2004‑09‑30
Enrolment
StudentID | CourseCode
-----------------------
1        | CS101
1        | MA101
2        | CS101
2        | PH101
2        | MA101

2NF – remove partial dependency
The primary key of Enrolment is (StudentID, CourseCode); there are no non‑key attributes, so 2NF is satisfied.

3NF – remove transitive dependency

Create a Course table for attributes that depend only on CourseCode.

Course
CourseCode | Title          | Credits
------------------------------------
CS101      | Computer Sci   | 3
MA101      | Mathematics    | 3
PH101      | Physics        | 3

Result: three tables in 3NF, no duplicated student or course information.

3.3 Integrity Constraints

Built‑in rules that guarantee data quality.

Domain constraints – data type, length, range.

 DOB DATE,Credits INT CHECK (Credits BETWEEN 1 AND 6)

Primary‑key (PK) – unique identifier for each row.
```
StudentID INT PRIMARY KEY
```

Foreign‑key (FK) – enforces referential integrity.

FOREIGN KEY (StudentID) REFERENCES Student(StudentID)

UNIQUE – prevents duplicate values in a column.
```
Email VARCHAR(100) UNIQUE
```

NOT NULL – forces a column to contain a value.
```
Name VARCHAR(50) NOT NULL
```

CHECK – custom condition.
```
CHECK (Credits BETWEEN 1 AND 6)
```

3.4 SQL – DDL and DML

SQL is a declarative language for defining structures (DDL) and manipulating data (DML).

DDL – creating the tables

CREATE TABLE Student (
StudentID INT PRIMARY KEY,
Name      VARCHAR(50) NOT NULL,
DOB       DATE,
Email     VARCHAR(100) UNIQUE
);
CREATE TABLE Course (
CourseCode CHAR(5) PRIMARY KEY,
Title      VARCHAR(100) NOT NULL,
Credits    INT CHECK (Credits BETWEEN 1 AND 6)
);
CREATE TABLE Enrolment (
StudentID  INT,
CourseCode CHAR(5),
PRIMARY KEY (StudentID, CourseCode),
FOREIGN KEY (StudentID)  REFERENCES Student(StudentID),
FOREIGN KEY (CourseCode) REFERENCES Course(CourseCode)
);

DML – basic data manipulation

-- INSERT
INSERT INTO Student VALUES (1, 'Alice', '2005-04-12', 'alice@example.com');
INSERT INTO Course   VALUES ('CS101', 'Computer Science', 3);
INSERT INTO Enrolment VALUES (1, 'CS101');
-- UPDATE
UPDATE Student SET Email = 'alice.new@example.com' WHERE StudentID = 1;
-- DELETE
DELETE FROM Enrolment WHERE StudentID = 1 AND CourseCode = 'CS101';

SELECT – most frequently examined clauses (one‑line purpose)

SELECT – list the columns to return.

FROM – specify the table(s) to read.

WHERE – filter rows.

JOIN (INNER, LEFT, etc.) – combine rows from two tables based on a condition.

GROUP BY – aggregate rows that share a value.

HAVING – filter groups after aggregation.

ORDER BY – sort the result set.

Example query (join, filter, sort)

SELECT s.Name, c.Title
FROM Enrolment e
JOIN Student s ON e.StudentID = s.StudentID
JOIN Course  c ON e.CourseCode = c.CourseCode
WHERE c.Credits >= 3
ORDER BY s.Name;

Aggregate example

SELECT c.Title, COUNT(*) AS NumStudents
FROM Enrolment e
JOIN Course c ON e.CourseCode = c.CourseCode
GROUP BY c.Title
HAVING COUNT(*) > 1;

Mini‑exercise

Write a CREATE TABLE statement for a Library entity that stores:

BookID (primary key, integer)

Title (text, up to 200 characters, not null)

Author (text, up to 100 characters)

PublishedYear (integer, must be between 1500 and the current year)

3.5 Powerful Query Capability

SQL is declarative – the user states *what* data is required, the DBMS decides *how* to obtain it.

The query optimiser automatically chooses the best execution plan (join order, index use, predicate push‑down).

3.6 Concurrency Control & Transaction Management

ACID properties
- Atomicity – a transaction is all‑or‑nothing.
- Consistency – constraints remain satisfied after commit.
- Isolation – concurrent transactions do not interfere; each sees a consistent snapshot.
- Durability – committed changes survive crashes.

Locking protocol (Two‑Phase Locking) – a transaction obtains all required locks before releasing any, preventing dirty reads and lost updates.

Timestamp ordering – each transaction gets a timestamp; the DBMS orders conflicting operations to ensure serialisability.

Illustration of a lost‑update problem (without proper locking):

-- User A reads Alice’s email (alice@example.com) -- User B reads the same value -- A updates to aliceA@example.com and commits -- B updates to aliceB@example.com and commits -- Final value = aliceB@example.com (A’s change lost)

With 2PL, both users would acquire an exclusive lock on the row, forcing one to wait until the other commits.

Example transaction block:

BEGIN TRANSACTION;
UPDATE Student SET Email = 'bob@example.com' WHERE StudentID = 2;
INSERT INTO Enrolment VALUES (2, 'MA101');
COMMIT;

3.7 Security and Access Control

Granular privileges – grant specific rights on tables, columns or views.
```
GRANT SELECT, INSERT ON Student TO clerk1;
```

Roles – a named set of privileges that can be assigned to many users.

CREATE ROLE admin_role;
GRANT ALL PRIVILEGES ON ALL TABLES TO admin_role;GRANT admin_role TO alice;

Views for logical security – expose only required columns/rows.

CREATE VIEW StudentPublic AS
SELECT StudentID, Name, DOB FROM Student;GRANT SELECT ON StudentPublic TO public_user;

3.8 Scalability and Performance Optimisation

Indexes – B‑tree or hash structures that accelerate search (e.g., index on Student.Email).

Query optimiser – rewrites SQL into an efficient execution plan (chooses join order, uses indexes, applies filters early).

Partitioning & clustering – spread large tables across disks or nodes, improving I/O and supporting very large data sets.

Materialised views – store pre‑computed results for frequently used aggregates.

Why scalability matters for the Cambridge exam: large tables are common in exam scenarios; knowing that indexes and partitioning improve performance helps students justify design choices.

4. Mapping Syllabus Features to DBMS Capabilities

Syllabus Feature	DBMS Capability that Demonstrates It
Logical data independence	Views and `ALTER TABLE` statements that do not require programme changes
Physical data independence	Indexes, clustering, partitioning – transparent to applications
Reduced redundancy	Normalisation (1NF‑3NF) and use of primary/foreign keys
Integrity constraints	`PRIMARY KEY`, `FOREIGN KEY`, `UNIQUE`, `NOT NULL`, `CHECK`
SQL query language	`SELECT`, `JOIN`, `GROUP BY`, `HAVING`, sub‑queries – declarative retrieval
Transaction management	`BEGIN…COMMIT`, ACID properties, locking protocols
Security & access control	`GRANT/REVOKE`, roles, column‑level views
Scalability	Indexes, query optimiser, partitioning, distributed DBMS

5. Comparison: File‑Based System vs Relational DBMS

Aspect	File‑Based System	Relational Database
Data redundancy	High – same data duplicated in many files	Low – single‑table storage; normalisation removes duplication
Data integrity	Manual checks only	Built‑in constraints (PK, FK, UNIQUE, NOT NULL, CHECK)
Query capability	Procedural code required	Declarative SQL; joins, aggregates, ad‑hoc queries
Data independence	Low – schema changes force programme rewrites	High – logical & physical independence via views, `ALTER`
Concurrency control	None or simple file locking	ACID transactions; sophisticated locking & timestamp protocols
Security	OS file permissions only	Fine‑grained privileges, roles, column/row‑level views
Scalability	Limited by file size and manual indexing	Indexes, optimiser, partitioning, distributed DBMS

6. Entity‑Relationship (E‑R) Modelling

Entity – a thing of interest (e.g., Student, Course).

Attribute – a property of an entity (e.g., Name, DOB).

Relationship – an association between entities (e.g., Enrols linking Student and Course).

Cardinality – describes how many instances of one entity relate to another (1:1, 1:M, M:N).

Suggested ER diagram (Student‑Course‑Enrolment):

ER diagram showing Student (PK StudentID), Course (PK CourseCode) and Enrolment (PK StudentID+CourseCode) with relationships

7. Summary Checklist for Exam Revision

Know the six main limitations of file‑based systems.

Define a relational database, primary key, foreign key, and view.

Recall the key rule for 1NF, 2NF and 3NF (see the normal‑form table).

Be able to write basic DDL (CREATE TABLE) and DML (INSERT, UPDATE, DELETE, SELECT) statements.

Identify and give an example of each integrity constraint.

Explain ACID properties and give a simple example of a transaction.

State how security is implemented (GRANT, REVOKE, roles, views).

Describe why indexes, partitioning and the optimiser improve scalability.

Describe the features of a relational database that address the limitations of a file-based approach

8.1 Database Concepts