File organisation describes how records are stored on secondary storage (disk, tape, SSD, etc.).
The chosen organisation determines the speed and method by which a program can retrieve, update or delete a record.
The Cambridge 9618 syllabus expects students to understand four main access methods, the situations in which each is appropriate, and the practical issues such as file type, record layout, error handling and hashing details.
Two fundamental file types are used in programming:
readline() or gets() are used.read(), write() or fseek() allow direct positioning.The choice of text or binary influences the access method that can be used efficiently and the way errors (e.g., end‑of‑file, read‑fail) must be handled.
open call – abort or request a retry if the file cannot be opened.read, test for EOF or a read‑error flag.OPEN file FOR READ
IF NOT SUCCESS THEN
DISPLAY "Cannot open file"
STOP
END IF
WHILE NOT EOF(file) DO
READ record
IF READ_ERROR THEN
DISPLAY "Read error – abort"
STOP
END IF
IF record.id = targetID THEN
DISPLAY record
EXIT LOOP
END IF
END WHILE
CLOSE file
Direct access is possible when every record occupies the same number of bytes (fixed‑length) and the file is opened in binary mode.
\[
\text{byte\address} = \text{base\address} + r \times \text{record\_size}
\]
where r is the zero‑based record number.
seek and read operations.recordSize = 128 // bytes
target = 57 // 58th record (0‑based)
address = baseAddress + target*recordSize
IF NOT SEEK(file, address) THEN
DISPLAY "Seek failed – invalid record number"
STOP
END IF
READ record
IF READ_ERROR THEN
DISPLAY "Read error"
STOP
END IF
DISPLAY record
An index is a separate, ordered structure that stores pairs (key, address).
The index is searched first; the resulting address is then used for a direct read of the actual record.
# index is a sorted array of (key, address) pairs
pos = binarySearch(index, targetKey)
IF pos = NOT_FOUND THEN
DISPLAY "Record not found"
STOP
END IF
address = index[pos].address
IF NOT SEEK(dataFile, address) THEN
DISPLAY "Invalid address in index"
STOP
END IF
READ record
IF READ_ERROR THEN
DISPLAY "Read error"
STOP
END IF
DISPLAY record
Hashing converts a search key into a location using a hash function. The location is usually a bucket in a hash table stored on disk.
0 … (B‑1), where B is the number of buckets.0.5 ≤ α ≤ 0.75.function hash(key):
return (key MOD bucketCount)
bucket = hash(targetKey)
FOR each entry IN bucketList[bucket] DO
IF entry.key = targetKey THEN
SEEK dataFile TO entry.address
READ record
DISPLAY record
EXIT
END IF
END FOR
DISPLAY "Record not found"
function hash(key):
return (key MOD bucketCount)
function probe(key, i):
return (hash(key) + i) MOD bucketCount # linear probing
i = 0
bucket = probe(targetKey, i)
WHILE bucket NOT EMPTY DO
IF bucket.key = targetKey THEN
SEEK dataFile TO bucket.address
READ record
DISPLAY record
EXIT
END IF
i = i + 1
bucket = probe(targetKey, i)
END WHILE
DISPLAY "Record not found"
0 … B‑1.| Method | Typical use‑case | Search complexity | Record‑size requirement | Storage overhead |
|---|---|---|---|---|
| Sequential | Full‑file scans (e.g., payroll, batch reports) | O(n) | Fixed or variable (text or binary) | None |
| Random (Direct) | Look‑ups by record number or offset | O(1) | Fixed length (binary) | None |
| Indexed | Searches by key value (e.g., customer ID, employee number) | O(log m) | Fixed or variable | Index blocks ≈ 5‑10 % of data |
| Hash‑based | Very fast look‑ups where a good hash function exists; frequent inserts | ≈ O(1) average (depends on load factor) | Fixed or variable | Hash table + possible overflow structures |
When a problem statement is given, evaluate the following characteristics and match them to the most suitable method.
| Characteristic | Recommended method(s) |
|---|---|
| All records must be processed (e.g., generate a report) | Sequential access (text or binary) |
| Records are of identical size and are addressed by position (e.g., log entry number) | Random (direct) access |
| Frequent look‑ups by a unique key; records may be variable length | Primary index (or secondary index if non‑unique) |
| Look‑ups by a non‑unique attribute with many matches (e.g., surname) | Secondary index (points to a list) or hash with chaining |
| Very high insert/delete rate; keys are known and a good hash function exists | Hash‑based access (choose open addressing for low‑memory, chaining for simplicity) |
| File must be portable across platforms and human‑readable | Text file + sequential access (or index stored separately as a text file) |
| Storage space is at a premium | Direct access (no extra structures) or carefully sized hash table |
0 … fileSize‑1. If seek fails, re‑initialise the file pointer or rebuild the index/hash.IOError, FileNotFoundException etc., and provide a user‑friendly fallback.13.1 User‑Defined Data Types
Custom types (enumerated, record/structure, class) let programmers group related fields – essential when designing the record layout for any of the access methods above. See the “User‑Defined Data Types” notes for examples in Python, Java and Pseudocode.
13.3 Floating‑Point Representation
Understanding IEEE‑754 binary floating‑point is required for AO2 questions that involve numerical data stored in files. The representation influences rounding error and storage size (single‑ vs double‑precision). Refer to the “Floating‑Point Representation” section for details.
12.3 Program Testing & Maintenance
After choosing an access method, appropriate testing (unit, integration, black‑box) must be planned. A simple test plan for a file‑handling module is provided in the “Testing & Maintenance” notes.

Your generous donation helps us continue providing free Cambridge IGCSE & A-Level resources, past papers, syllabus notes, revision questions, and high-quality online tutoring to students across Kenya.