Describe the features of a relational database that address the limitations of a file-based approach

8.1 Database Concepts

Objective

Explain why file‑based systems are inadequate, describe the key features of a relational database that overcome those limitations, and demonstrate understanding of ER modelling, normalisation (1NF‑3NF), and basic SQL (DDL & DML).


1. What is a Relational Database?

  • A relational database stores data in tables (relations). Each table consists of rows (records) and columns (attributes).
  • Every table has a primary key that uniquely identifies each row.
  • Relationships between tables are expressed with foreign keys that reference primary keys in other tables.
  • The database is managed by a DBMS that provides data independence, integrity, security, concurrency control and a declarative query language (SQL).


2. Limitations of a File‑Based System

File‑based applications store data in flat files that programmes read and write directly. The Cambridge syllabus highlights the following drawbacks.

LimitationExplanation
Data redundancy & inconsistencyThe same fact (e.g. a student’s address) appears in several files; updating one copy may leave others unchanged.
Poor data integrityNo built‑in checks for uniqueness, referential integrity or domain rules.
Lack of data independenceAny change to file layout (add a field, change format) forces every programme that uses the file to be rewritten.
Limited query capabilityProcedural code is required to locate, sort and combine data; ad‑hoc queries are difficult.
Concurrency problemsSimultaneous access can cause lost updates, overwriting or corrupted files.
Coarse‑grained securityOnly OS‑level file permissions are available; fine‑grained user/role control is missing.

Case‑study: Student‑record system (file‑based)

FileTypical fields
Students.txtStudentID, Name, DOB, Address, Course1, Course2, …
Courses.txtCourseID, Title, Credits, Lecturer, EnrolledStudentIDs

  • Changing a student’s address requires editing Students.txt and every occurrence in Courses.txt.
  • Adding a new field (e.g. “Email”) means editing both files and recompiling every programme.
  • Finding all students in “Computer Science” needs custom code that reads Courses.txt, extracts IDs, then scans Students.txt for each ID.
  • Two clerks editing the same file simultaneously can overwrite each other’s changes.
  • Only the OS user who owns the file can read/write it; there is no way to give a clerk read‑only access.


3. Relational Database Features that Overcome These Limitations

3.1 Data Independence

  • Logical data independence – the logical schema (tables, columns, relationships) can be altered without changing application code. Example: ALTER TABLE Student ADD Email VARCHAR(100); does not require program modification.
  • Physical data independence – storage details (indexes, clustering, partitioning) can be changed without affecting programmes.
  • Views provide a stable virtual table that hides physical implementation details.

3.2 Reduced Redundancy & Normalisation

Data is stored once and related through keys. Normalisation removes duplication systematically.

Normal Forms – Quick Reference

Normal FormKey Requirement
1NFEliminate repeating groups; each field contains atomic values.
2NFBe in 1NF and have no partial dependency of non‑key attributes on a part of a composite primary key.
3NFBe in 2NF and have no transitive dependency of non‑key attributes on other non‑key attributes.

Step‑by‑step normalisation (Student‑Course example)

-- Unnormalised file (single table)

StudentID | Name | DOB | Course1 | Course2 | Course3

---------------------------------------------------------

1 | Alice | 2005‑04‑12 | CS101 | MA101 | NULL

2 | Bob | 2004‑09‑30 | CS101 | PH101 | MA101

  1. 1NF – eliminate repeating groups

    Create separate Student and Enrolment tables.

    Student

    StudentID | Name | DOB

    -------------------------

    1 | Alice | 2005‑04‑12

    2 | Bob | 2004‑09‑30

    Enrolment

    StudentID | CourseCode

    -----------------------

    1 | CS101

    1 | MA101

    2 | CS101

    2 | PH101

    2 | MA101

  2. 2NF – remove partial dependency

    The primary key of Enrolment is (StudentID, CourseCode); there are no non‑key attributes, so 2NF is satisfied.

  3. 3NF – remove transitive dependency

    Create a Course table for attributes that depend only on CourseCode.

    Course

    CourseCode | Title | Credits

    ------------------------------------

    CS101 | Computer Sci | 3

    MA101 | Mathematics | 3

    PH101 | Physics | 3

Result: three tables in 3NF, no duplicated student or course information.

3.3 Integrity Constraints

Built‑in rules that guarantee data quality.

  • Domain constraints – data type, length, range.

     DOB DATE,

    Credits INT CHECK (Credits BETWEEN 1 AND 6)

  • Primary‑key (PK) – unique identifier for each row.

    StudentID INT PRIMARY KEY

  • Foreign‑key (FK) – enforces referential integrity.

    FOREIGN KEY (StudentID) REFERENCES Student(StudentID)

  • UNIQUE – prevents duplicate values in a column.

    Email VARCHAR(100) UNIQUE

  • NOT NULL – forces a column to contain a value.

    Name VARCHAR(50) NOT NULL

  • CHECK – custom condition.

    CHECK (Credits BETWEEN 1 AND 6)

3.4 SQL – DDL and DML

SQL is a declarative language for defining structures (DDL) and manipulating data (DML).

DDL – creating the tables

CREATE TABLE Student (

StudentID INT PRIMARY KEY,

Name VARCHAR(50) NOT NULL,

DOB DATE,

Email VARCHAR(100) UNIQUE

);

CREATE TABLE Course (

CourseCode CHAR(5) PRIMARY KEY,

Title VARCHAR(100) NOT NULL,

Credits INT CHECK (Credits BETWEEN 1 AND 6)

);

CREATE TABLE Enrolment (

StudentID INT,

CourseCode CHAR(5),

PRIMARY KEY (StudentID, CourseCode),

FOREIGN KEY (StudentID) REFERENCES Student(StudentID),

FOREIGN KEY (CourseCode) REFERENCES Course(CourseCode)

);

DML – basic data manipulation

-- INSERT

INSERT INTO Student VALUES (1, 'Alice', '2005-04-12', 'alice@example.com');

INSERT INTO Course VALUES ('CS101', 'Computer Science', 3);

INSERT INTO Enrolment VALUES (1, 'CS101');

-- UPDATE

UPDATE Student SET Email = 'alice.new@example.com' WHERE StudentID = 1;

-- DELETE

DELETE FROM Enrolment WHERE StudentID = 1 AND CourseCode = 'CS101';

SELECT – most frequently examined clauses (one‑line purpose)

  • SELECT – list the columns to return.
  • FROM – specify the table(s) to read.
  • WHERE – filter rows.
  • JOIN (INNER, LEFT, etc.) – combine rows from two tables based on a condition.
  • GROUP BY – aggregate rows that share a value.
  • HAVING – filter groups after aggregation.
  • ORDER BY – sort the result set.

Example query (join, filter, sort)

SELECT s.Name, c.Title

FROM Enrolment e

JOIN Student s ON e.StudentID = s.StudentID

JOIN Course c ON e.CourseCode = c.CourseCode

WHERE c.Credits >= 3

ORDER BY s.Name;

Aggregate example

SELECT c.Title, COUNT(*) AS NumStudents

FROM Enrolment e

JOIN Course c ON e.CourseCode = c.CourseCode

GROUP BY c.Title

HAVING COUNT(*) > 1;

Mini‑exercise

Write a CREATE TABLE statement for a Library entity that stores:

  • BookID (primary key, integer)
  • Title (text, up to 200 characters, not null)
  • Author (text, up to 100 characters)
  • PublishedYear (integer, must be between 1500 and the current year)


3.5 Powerful Query Capability

  • SQL is declarative – the user states *what* data is required, the DBMS decides *how* to obtain it.
  • The query optimiser automatically chooses the best execution plan (join order, index use, predicate push‑down).

3.6 Concurrency Control & Transaction Management

  • ACID properties

    • Atomicity – a transaction is all‑or‑nothing.
    • Consistency – constraints remain satisfied after commit.
    • Isolation – concurrent transactions do not interfere; each sees a consistent snapshot.
    • Durability – committed changes survive crashes.

  • Locking protocol (Two‑Phase Locking) – a transaction obtains all required locks before releasing any, preventing dirty reads and lost updates.
  • Timestamp ordering – each transaction gets a timestamp; the DBMS orders conflicting operations to ensure serialisability.
  • Illustration of a lost‑update problem (without proper locking):

    -- User A reads Alice’s email (alice@example.com)

    -- User B reads the same value

    -- A updates to aliceA@example.com and commits

    -- B updates to aliceB@example.com and commits

    -- Final value = aliceB@example.com (A’s change lost)

    With 2PL, both users would acquire an exclusive lock on the row, forcing one to wait until the other commits.

  • Example transaction block:

    BEGIN TRANSACTION;

    UPDATE Student SET Email = 'bob@example.com' WHERE StudentID = 2;

    INSERT INTO Enrolment VALUES (2, 'MA101');

    COMMIT;

3.7 Security and Access Control

  • Granular privileges – grant specific rights on tables, columns or views.

    GRANT SELECT, INSERT ON Student TO clerk1;

  • Roles – a named set of privileges that can be assigned to many users.

    CREATE ROLE admin_role;

    GRANT ALL PRIVILEGES ON ALL TABLES TO admin_role;

    GRANT admin_role TO alice;

  • Views for logical security – expose only required columns/rows.

    CREATE VIEW StudentPublic AS

    SELECT StudentID, Name, DOB FROM Student;

    GRANT SELECT ON StudentPublic TO public_user;

3.8 Scalability and Performance Optimisation

  • Indexes – B‑tree or hash structures that accelerate search (e.g., index on Student.Email).
  • Query optimiser – rewrites SQL into an efficient execution plan (chooses join order, uses indexes, applies filters early).
  • Partitioning & clustering – spread large tables across disks or nodes, improving I/O and supporting very large data sets.
  • Materialised views – store pre‑computed results for frequently used aggregates.
  • Why scalability matters for the Cambridge exam: large tables are common in exam scenarios; knowing that indexes and partitioning improve performance helps students justify design choices.


4. Mapping Syllabus Features to DBMS Capabilities

Syllabus FeatureDBMS Capability that Demonstrates It
Logical data independenceViews and ALTER TABLE statements that do not require programme changes
Physical data independenceIndexes, clustering, partitioning – transparent to applications
Reduced redundancyNormalisation (1NF‑3NF) and use of primary/foreign keys
Integrity constraintsPRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, CHECK
SQL query languageSELECT, JOIN, GROUP BY, HAVING, sub‑queries – declarative retrieval
Transaction managementBEGIN…COMMIT, ACID properties, locking protocols
Security & access controlGRANT/REVOKE, roles, column‑level views
ScalabilityIndexes, query optimiser, partitioning, distributed DBMS


5. Comparison: File‑Based System vs Relational DBMS

AspectFile‑Based SystemRelational Database
Data redundancyHigh – same data duplicated in many filesLow – single‑table storage; normalisation removes duplication
Data integrityManual checks onlyBuilt‑in constraints (PK, FK, UNIQUE, NOT NULL, CHECK)
Query capabilityProcedural code requiredDeclarative SQL; joins, aggregates, ad‑hoc queries
Data independenceLow – schema changes force programme rewritesHigh – logical & physical independence via views, ALTER
Concurrency controlNone or simple file lockingACID transactions; sophisticated locking & timestamp protocols
SecurityOS file permissions onlyFine‑grained privileges, roles, column/row‑level views
ScalabilityLimited by file size and manual indexingIndexes, optimiser, partitioning, distributed DBMS


6. Entity‑Relationship (E‑R) Modelling

  • Entity – a thing of interest (e.g., Student, Course).
  • Attribute – a property of an entity (e.g., Name, DOB).
  • Relationship – an association between entities (e.g., Enrols linking Student and Course).
  • Cardinality – describes how many instances of one entity relate to another (1:1, 1:M, M:N).

Suggested ER diagram (Student‑Course‑Enrolment):

ER diagram showing Student (PK StudentID), Course (PK CourseCode) and Enrolment (PK StudentID+CourseCode) with relationships


7. Summary Checklist for Exam Revision

  • Know the six main limitations of file‑based systems.
  • Define a relational database, primary key, foreign key, and view.
  • Recall the key rule for 1NF, 2NF and 3NF (see the normal‑form table).
  • Be able to write basic DDL (CREATE TABLE) and DML (INSERT, UPDATE, DELETE, SELECT) statements.
  • Identify and give an example of each integrity constraint.
  • Explain ACID properties and give a simple example of a transaction.
  • State how security is implemented (GRANT, REVOKE, roles, views).
  • Describe why indexes, partitioning and the optimiser improve scalability.