Show understanding of the limitations of using a file-based approach for the storage and retrieval of data

Published by Patrick Mutisya · 14 days ago

Cambridge A-Level Computer Science – 8.1 Database Concepts

8.1 Database Concepts – Limitations of a File‑Based Approach

Why a File‑Based System Is Often Inadequate

In a file‑based system each application stores its data in its own set of files. While this may be simple for very small programs, several fundamental problems arise as the amount of data and the number of applications grow.

Key Limitations

  • Data Redundancy and Inconsistency – The same piece of information is often stored in multiple files. Updating one copy does not automatically update the others, leading to contradictory data.
  • Limited Data Access – Without an index, a record in a flat file is usually located by sequential search. Finding one record may mean reading the whole file, giving a worst‑case time complexity of \$O(n)\$, where \$n\$ is the number of records.
  • Lack of Data Independence – Changes to the file structure (e.g., adding a new field) require changes to every program that reads the file.
  • Difficulty Enforcing Integrity – There is no built‑in mechanism to enforce primary‑key uniqueness, referential integrity, or domain constraints.
  • Concurrency Problems – Simultaneous access by multiple users can cause lost updates or corrupted files because file systems provide only basic locking.
  • Security and Access Control – File permissions are coarse‑grained; it is hard to restrict access to individual records.
  • Scalability and Performance – As the volume of data grows, file‑based searches become slower and backup/restore operations become cumbersome.
  • No Powerful Query Language – Extracting specific subsets of data requires custom code; there is no standardised language such as SQL.
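
The lack of data independence listed above can be illustrated with a minimal Python sketch. The record layout (id, name, address), the sample data, and the helper `read_addresses` are hypothetical and chosen only for illustration. The program assumes the address is always the third field, so it silently returns the wrong data once a new field is inserted into the file.

```python
import csv
import io

# Hypothetical original layout: id,name,address
OLD_FILE = "1,Amina,12 River Road\n2,Brian,4 Hill Street\n"
# The same records after a date-of-birth field is inserted before the address.
NEW_FILE = "1,Amina,2007-03-14,12 River Road\n2,Brian,2006-11-02,4 Hill Street\n"

def read_addresses(text):
    """Return the addresses, assuming the address is always the third field."""
    reader = csv.reader(io.StringIO(text))
    return [row[2] for row in reader]

old_addresses = read_addresses(OLD_FILE)
new_addresses = read_addresses(NEW_FILE)

print(old_addresses)  # ['12 River Road', '4 Hill Street']
print(new_addresses)  # ['2007-03-14', '2006-11-02'] (dates, not addresses)
```

No error is raised in the second case, which is what makes the problem costly: every program that reads the file has to be found and rewritten by hand.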

Illustrative Example

Consider a student management system that stores student details in students.txt and enrolments in enrolments.txt. If each enrolment record also carries the student's address, a change of address must be written to both files. If only one file is updated, the two copies disagree and the system has no way of telling which one is correct.
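
The scenario can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the comma‑separated layout, the sample record, and the helpers `update_address` and `address_in` are hypothetical, and the address is assumed to be the last field in both files.

```python
import tempfile
from pathlib import Path

def update_address(path, student_id, new_address):
    """Rewrite one file, replacing the address on lines for student_id."""
    lines = []
    for line in path.read_text().splitlines():
        fields = line.split(",")
        if fields[0] == student_id:
            fields[-1] = new_address  # address assumed to be the last field
        lines.append(",".join(fields))
    path.write_text("\n".join(lines) + "\n")

def address_in(path, student_id):
    """Return the address stored for student_id in the given file."""
    for line in path.read_text().splitlines():
        fields = line.split(",")
        if fields[0] == student_id:
            return fields[-1]
    return None

with tempfile.TemporaryDirectory() as folder:
    students = Path(folder) / "students.txt"
    enrolments = Path(folder) / "enrolments.txt"
    students.write_text("S1,Amina,12 River Road\n")
    enrolments.write_text("S1,CS101,12 River Road\n")

    # The application updates students.txt but forgets enrolments.txt.
    update_address(students, "S1", "9 Lake Avenue")

    student_copy = address_in(students, "S1")
    enrolment_copy = address_in(enrolments, "S1")

print(student_copy)    # 9 Lake Avenue
print(enrolment_copy)  # 12 River Road
```

Nothing in the file system links the two copies, so keeping them in step is entirely the responsibility of each application.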

Comparison Table

| Aspect | File‑Based Approach | Database Management System (DBMS) |
|---|---|---|
| Data Redundancy | High – duplicate records across files | Low – data stored once, referenced via keys |
| Data Access Speed | Linear search (\$O(n)\$) for most operations | Indexed access (\$O(\log n)\$) and query optimisation |
| Data Integrity | Manual checks required | Built‑in constraints (primary key, foreign key, check) |
| Concurrency Control | Basic file locks; prone to deadlocks | Transaction management (ACID properties) |
| Scalability | Poor – performance degrades with size | Good – supports large volumes and multiple users |
| Query Capability | Custom code for each query | SQL – powerful, declarative queries |
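
The integrity and query rows of the comparison can be tried out with Python's built‑in `sqlite3` module, used here only as a convenient stand‑in for a DBMS. The table names, column names, and sample rows are hypothetical. The sketch stores the address once, lets the database reject a duplicate primary key and a dangling foreign key, and retrieves data with a single declarative query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT NOT NULL, address TEXT)"
)
conn.execute(
    "CREATE TABLE enrolment ("
    " student_id TEXT NOT NULL REFERENCES student(id),"
    " course TEXT NOT NULL,"
    " PRIMARY KEY (student_id, course))"
)
conn.execute("INSERT INTO student VALUES ('S1', 'Amina', '12 River Road')")
conn.execute("INSERT INTO enrolment VALUES ('S1', 'CS101')")

# Both statements below violate a constraint and are refused by the database.
rejected = []
for label, statement in [
    ("duplicate primary key", "INSERT INTO student VALUES ('S1', 'Brian', '4 Hill Street')"),
    ("unknown student", "INSERT INTO enrolment VALUES ('S9', 'CS101')"),
]:
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError:
        rejected.append(label)

# One declarative query replaces hand-written file-parsing code.
rows = conn.execute(
    "SELECT s.name, s.address, e.course"
    " FROM student s JOIN enrolment e ON e.student_id = s.id"
    " WHERE e.course = 'CS101'"
).fetchall()
conn.close()

print(rejected)  # ['duplicate primary key', 'unknown student']
print(rows)      # [('Amina', '12 River Road', 'CS101')]
```

Because the address lives only in the `student` table, a change of address is a single update and cannot leave a stale copy behind.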

Mathematical Perspective on Search Cost

For a file containing \$n\$ records, a sequential search for a record that is present, with every position equally likely, reads \$\frac{n+1}{2}\$ records on average. In contrast, a balanced index (e.g., a B‑tree) locates a record in about \$\log_{b} n\$ node reads, where \$b\$ is the branching factor.
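
To put numbers on the two formulas, the following Python sketch compares them for one million records. The branching factor of 100 is an assumed value, and the helper names are hypothetical. The index cost is modelled simply as the smallest number of levels h for which b to the power h is at least n, which ignores caching and partly filled nodes.

```python
def average_sequential_reads(n):
    """Average reads to find a record that is equally likely to be at any position."""
    return sum(range(1, n + 1)) / n  # equals (n + 1) / 2

def index_levels(n, b):
    """Smallest number of levels h such that b ** h >= n."""
    levels = 0
    capacity = 1
    while capacity < n:
        capacity *= b
        levels += 1
    return levels

n = 1_000_000
b = 100  # assumed branching factor

sequential = average_sequential_reads(n)
indexed = index_levels(n, b)

print(sequential)  # 500000.5
print(indexed)     # 3
```

Under these assumptions the index needs about three node reads where the sequential search needs about half a million record reads.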

Consequences for Real‑World Applications

  1. Inconsistent reports – financial statements derived from duplicated transaction files may not match.
  2. Data loss – with no built‑in logging or recovery mechanism, a corrupted file can destroy the only copy of a data item.
  3. High maintenance cost – every change to file format forces a rewrite of all dependent programs.
  4. Limited decision support – without a query language, generating ad‑hoc analysis is time‑consuming.

Suggested Diagram

Suggested diagram: Comparison of a simple file‑based system (multiple independent files) with a DBMS architecture (single logical database, DBMS layer, and multiple applications).

Summary

A file‑based approach may be acceptable for very small, single‑user applications, but it quickly becomes untenable as data volume, user count, and complexity increase. The limitations—redundancy, poor access speed, lack of integrity enforcement, concurrency issues, and limited query capability—are precisely the problems that a Database Management System is designed to solve.