Show understanding of the methods of file organisation and select an appropriate method of file organisation and file access for a given problem

Published by Patrick Mutisya · 14 days ago

Cambridge A-Level Computer Science 9618 – 13.2 File Organisation and Access

13.2 File Organisation and Access

Learning Objective

Show understanding of the methods of file organisation and select an appropriate method of file organisation and file access for a given problem.

Key Concepts

  • File organisation – the way records are stored on secondary storage.
  • File access – the method used to retrieve a record from a file.
  • Sequential (or linear) organisation.
  • Indexed organisation.
  • Direct (or random) organisation.
  • Hashed organisation.
  • Hybrid approaches (e.g., indexed sequential).

Methods of File Organisation

MethodDescriptionAdvantagesDisadvantagesTypical Use Cases
Sequential (Linear)Records are stored one after another in the order they are added.

  • Simple to implement.
  • Efficient for bulk loading and full‑file scans.

  • Slow random access – must read through preceding records.
  • Insertion/deletion costly (requires shifting).

Log files, batch processing, archival data.
IndexedAn auxiliary index file stores key values and pointers to the data records.

  • Fast lookup of individual records.
  • Supports range queries efficiently.

  • Extra storage for index.
  • Index must be maintained on updates.

Student databases, inventory systems.
Direct (Random)Record position is calculated from the key using a formula, e.g., \$p = (k \bmod m)\$.

  • Very fast single‑record access.
  • No separate index required.

  • Collisions require handling (overflow areas).
  • Works best when keys are uniformly distributed.

Fixed‑length records with known keys (e.g., employee ID).
HashedUses a hash function \$h(k)\$ to map a key \$k\$ to a bucket; collisions resolved by chaining or open addressing.

  • Average‑case constant‑time access.
  • Good for large, dynamic data sets.

  • Performance degrades with high load factor.
  • Hash function must be carefully chosen.

Cache tables, symbol tables, password storage.
Indexed Sequential (Hybrid)Combines sequential storage with an index; records are kept in order but an index speeds up searches.

  • Efficient for both range queries and point lookups.
  • Reduces need to scan entire file.

  • More complex to implement.
  • Requires periodic re‑organisation.

Banking transaction files, library catalogues.

File Access Methods

  • Sequential Access: Records are read in order from the beginning of the file. Suitable when processing every record.
  • Direct (Random) Access: Uses a calculated address or an index to jump directly to a record.
  • Indexed Access: Searches the index first, then retrieves the record using the pointer from the index.
  • Hashed Access: Applies a hash function to the key to locate the bucket containing the record.

Choosing an Appropriate Organisation & Access Method

When faced with a problem, consider the following questions:

  1. What is the primary operation? (search, insert, delete, bulk read)
  2. How many records are expected?
  3. Are records of fixed or variable length?
  4. Is the key uniformly distributed?
  5. Will range queries be required?
  6. What are the storage constraints?

Based on the answers, select the method that minimises the dominant cost.

Example Problem – Selecting a File Organisation

Problem statement: A university needs to store student records (StudentID, Name, Course, Year, GPA). The system must support:

  • Fast lookup of a single student by StudentID.
  • Frequent updates (add, delete, modify).
  • Range queries to list all students in a particular Course and Year.
  • Storage of up to 200 000 records, each of variable length (because of Name field).

Analysis

  • Primary key = StudentID (numeric, uniformly distributed).
  • Range queries on Course + Year suggest an index on those fields.
  • Variable‑length records make pure direct access difficult without a pointer table.
  • High update frequency requires an organisation that handles inserts/deletes efficiently.

Recommended Solution

  • Organisation: Indexed Sequential File.

    • Data records stored sequentially in a main file (allows efficient bulk scans).
    • Primary index on StudentID for fast point lookups.
    • Secondary index on (Course, Year) to support range queries.

  • Access Method: Indexed access using the primary index for single‑record retrieval; secondary index for range queries.
  • Handling \cdot ariable Length: Store a fixed‑size pointer (byte offset) in the index entries; the main file can use a simple linked‑list of free spaces for deletions.

Justification

  • Fast \$O(\log n)\$ lookup via primary index meets the requirement for quick StudentID searches.
  • Secondary index enables \$O(\log n + k)\$ retrieval of \$k\$ records matching a Course/Year range.
  • Sequential storage of the data file keeps insert/delete overhead low compared with pure direct organisation.
  • Indexes occupy additional space but are justified by the performance gains for the expected query mix.

Suggested diagram: Block diagram showing the main data file, primary index (StudentID → byte offset), and secondary index (Course+Year → list of offsets).

Summary Checklist

  • Identify the dominant operations.
  • Determine record size characteristics.
  • Choose between sequential, indexed, direct, or hashed based on access patterns.
  • Consider hybrid solutions when multiple query types are required.
  • Validate the choice against storage limits and update frequency.