8.1 Database Concepts – Limitations of a File‑Based Approach
Why a File‑Based System Is Often Inadequate
In a file‑based system each application stores its data in its own set of files. While this may be simple for very small programs, several fundamental problems arise as the amount of data and the number of applications grow.
Key Limitations
Data Redundancy and Inconsistency – The same piece of information is often stored in multiple files. Updating one copy does not automatically update the others, leading to contradictory data.
Limited Data Access – Files are usually accessed sequentially. To find a particular record you may need to read the whole file, giving a time complexity of \$O(n)\$ where \$n\$ is the number of records (see the first sketch after this list).
Lack of Data Independence – Changes to the file structure (e.g., adding a new field) require changes to every program that reads the file.
Difficulty Enforcing Integrity – There is no built‑in mechanism to enforce primary‑key uniqueness, referential integrity, or domain constraints.
Concurrency Problems – Simultaneous access by multiple users can cause lost updates or corrupted files because file systems provide only basic locking (see the second sketch after this list).
Security and Access Control – File permissions are coarse‑grained; it is hard to restrict access to individual records.
Scalability and Performance – As the volume of data grows, file‑based searches become slower and backup/restore operations become cumbersome.
No Powerful Query Language – Extracting specific subsets of data requires custom code; there is no standardised language such as SQL.
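The limited-data-access point can be made concrete with a short sketch. The file name students.txt and its comma-separated (id,name,address) layout are illustrative assumptions; any flat record file behaves the same way.

```python
# A minimal sketch of sequential (linear) search over a flat file, assuming a
# hypothetical students.txt with one comma-separated record per line:
#   id,name,address
def find_student(student_id, path="students.txt"):
    """Return the matching record's fields, or None if no record matches."""
    with open(path, encoding="utf-8") as f:
        for line in f:                          # read record after record
            fields = line.rstrip("\n").split(",")
            if fields[0] == student_id:         # the first field is the id
                return fields
    return None                                 # end of file reached: n reads, no match
```

In the worst case every one of the \$n\$ records is read before the search gives up, which is the \$O(n)\$ behaviour noted above.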
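The lost-update problem can be shown with an equally small sketch. The file balance.txt and the two "users" are hypothetical, and the interleaving is simulated in a single script; in a real system the reads and writes would come from separate concurrent processes.

```python
from pathlib import Path

balance_file = Path("balance.txt")
balance_file.write_text("100")                  # starting balance

# Users A and B both read the balance before either has written it back.
seen_by_a = int(balance_file.read_text())
seen_by_b = int(balance_file.read_text())

balance_file.write_text(str(seen_by_a + 50))    # A deposits 50
balance_file.write_text(str(seen_by_b - 30))    # B withdraws 30 and overwrites A's update

print(balance_file.read_text())                 # 70 -- A's deposit is lost (should be 120)
```

A DBMS would serialise the two transactions (or reject one of them), so neither update is silently lost.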
Illustrative Example
Consider a student management system that stores student details in students.txt and enrolments in enrolments.txt. If a student changes their address, the address must be updated in both files. Failure to do so results in inconsistency.
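A minimal sketch of that scenario, assuming comma-separated records in which the last field is the address (the file names match the example; the record layout is an illustrative assumption):

```python
from pathlib import Path

# Set up the two files from the example; the address appears in both.
Path("students.txt").write_text("S123,Ada Lovelace,10 Old Road\n")
Path("enrolments.txt").write_text("S123,Computer Science,10 Old Road\n")

def update_address(path, student_id, new_address):
    """Rewrite one file, replacing the address field of matching records."""
    rows = [line.split(",") for line in Path(path).read_text().splitlines()]
    for row in rows:
        if row[0] == student_id:
            row[-1] = new_address               # the address is the last field
    Path(path).write_text("".join(",".join(row) + "\n" for row in rows))

update_address("students.txt", "S123", "12 New Street")
# enrolments.txt is never touched, so the two files now disagree about
# S123's address: the data is inconsistent.
```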
Comparison Table
| Aspect | File‑Based Approach | Database Management System (DBMS) |
| --- | --- | --- |
| Data Redundancy | High – duplicate records across files | Low – data stored once, referenced via keys |
| Data Access Speed | Linear search (\$O(n)\$) for most operations | Indexed access (\$O(\log n)\$) and query optimisation |
For a file containing \$n\$ records, the average number of reads required to locate a specific record using sequential search is \$\frac{n+1}{2}\$. In contrast, a balanced index (e.g., B‑tree) reduces the average reads to approximately \$\log_{b} n\$, where \$b\$ is the branching factor.
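To put those two figures side by side, a short calculation; the record count and branching factor below are illustrative choices, not values taken from the text:

```python
import math

n = 1_000_000                        # illustrative number of records
b = 100                              # illustrative B-tree branching factor

sequential_avg = (n + 1) / 2         # average reads for sequential search
indexed_avg = math.log(n, b)         # average reads via a balanced index, log_b(n)

print(f"sequential search: about {sequential_avg:,.0f} reads")   # about 500,000
print(f"B-tree index (b={b}): about {indexed_avg:.0f} reads")    # about 3
```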
Consequences for Real‑World Applications
Inconsistent reports – financial statements derived from duplicated transaction files may not match.
Data loss – a corrupted file can erase all copies of a particular data item.
High maintenance cost – every change to file format forces a rewrite of all dependent programs.
Limited decision support – without a query language, generating ad‑hoc analysis is time‑consuming.
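To illustrate the last point, the sketch below uses Python's built-in sqlite3 module; the table and column names are made up for the example. With a DBMS the ad-hoc question is a single declarative query, whereas a file-based system would need new parsing and filtering code for every such question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")              # throwaway in-memory database
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, town TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ada", "Cambridge"), (2, "Alan", "London"), (3, "Grace", "Cambridge")],
)

# Ad-hoc question: which students live in Cambridge?
for (name,) in conn.execute("SELECT name FROM students WHERE town = ?", ("Cambridge",)):
    print(name)                                 # Ada, Grace
```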
Suggested Diagram
Comparison of a simple file‑based system (multiple independent files) with a DBMS architecture (single logical database, DBMS layer, and multiple applications).
Summary
A file‑based approach may be acceptable for very small, single‑user applications, but it quickly becomes untenable as data volume, user count, and complexity increase. The limitations—redundancy, poor access speed, lack of integrity enforcement, concurrency issues, and limited query capability—are precisely the problems that a Database Management System is designed to solve.