Understand why data is stored in external files and how programs can access them

File Handling – Cambridge IGCSE 0478

1. Why Store Data Outside a Program?

  • Persistence – data remains after the program stops.
  • Limited main‑memory – large volumes cannot be kept in RAM.
  • Sharing – several programs or users can read/write the same file.
  • Backup & recovery – files can be copied, archived or restored.
  • Security & permissions – the OS controls who can read or write a file.

2. Types of Files

  • Text files – store characters using a readable encoding (ASCII, UTF‑8, …). Opened without the b flag.
  • Binary files – store data exactly as it is used internally (images, audio, compiled objects). Must be opened with a trailing b (e.g. rb, wb) in **all** languages.

3. The Four‑Step Scaffold for Using a File

Cambridge expects candidates to be able to open, read/write, and close a file, handling errors where necessary. The steps below follow the order in which an exam‑style algorithm is written.

  1. Open the file – obtain a file handle and choose the correct access mode.
  2. Read or write data – single items, whole lines, or blocks of data.
  3. Position the file pointer (optional) – use SEEK / TELL for random (direct) access.
  4. Close the file – release the handle and flush any buffered output.

3.1 Opening a File (exam‑style pseudocode)

OPEN "students.txt" AS studentFile MODE "r+"   // r+, w, a, etc.
IF studentFile = NIL THEN
    DISPLAY "Cannot open file – check name and permissions"
    STOP
END IF

The syntax above matches the Cambridge pseudocode format OPEN <filename> AS <handle> MODE <mode>.

3.2 Reading and Writing Data

Single items
READ studentID FROM studentFile          // reads the next item (e.g. an integer)
WRITE newGrade TO studentFile            // writes a single item
Whole lines of text
READLINE line FROM studentFile           // reads up to the next line‑break
WRITELINE line TO studentFile            // writes the line and adds a line‑break
Blocks of data (binary)
READBLOCK buffer[100] FROM imageFile     // reads 100 bytes into buffer
WRITEBLOCK buffer[100] TO imageFile      // writes 100 bytes from buffer

3.3 Positioning the File Pointer

  • SEEK(handle, offset, origin) – moves the pointer to offset bytes from origin (BEGINNING, CURRENT, or END).
  • TELL(handle) – returns the current byte position; useful for calculating offsets in random access.

Example (random access to a fixed‑length record):

offset ← (recordNumber‑1) * recordLength
SEEK studentFile, offset, BEGINNING
READ record FROM studentFile

3.4 Closing a File

CLOSE studentFile

In real languages this is often done with try … finally or a with statement to guarantee the file is closed even if an error occurs.

4. File‑Access Modes – What They Do & When to Use Them

Mode Description Typical Use‑case (most suitable)
r Read‑only. File must already exist. Reading a configuration or data file.
w Write‑only. Creates a new file or truncates an existing one. Saving a final report that replaces any old version.
a Append‑only. Writes are added to the end of the file. Log files where old entries must be preserved.
r+ Read‑write. File must exist; pointer starts at the beginning. Updating a specific record in a data file.
w+ Read‑write. Creates a new file or truncates an existing one. Temporary files that will be rewritten during the program.
a+ Read‑append. Reads from start; writes are always appended. Reading a log while still adding new entries.
rb, wb, ab, … Binary equivalents of the text modes (add b). Working with images, audio, or any non‑text data.

5. Sequential vs Random (Direct) Access

  • Sequential access – records are processed in order, one after another. Simple to code but can be slow when only a few records are needed.
  • Random (direct) access – the file pointer can be moved to any byte position with SEEK(). Allows immediate read/write of a particular record.

Example: A school database contains 10 000 student records, each 32 bytes long. To display the record for student #7 532, compute the offset 7 531 × 32 = 240 992 bytes and SEEK directly to that position.

6. End‑of‑File (EOF) Detection

When reading sequentially, the program must know when the file ends.

  • Pseudocode (Cambridge style): WHILE NOT EOF(handle) DO … END WHILE
  • Python: for line in file: automatically stops at EOF.
  • C: while (fgets(buf, size, fp) != NULL) – the function returns NULL at EOF.

7. Error Detection & Handling

Common errors
  • File not found – occurs when opening in r mode and the file does not exist.
  • Permission denied – the user lacks read or write rights.
  • Disk full – cannot write more data.
  • Read/write error – hardware or OS problem.

Good practice: always close the file, even if an error occurs. In many languages this is done with a try … finally (or with) construct.

Example: Pseudocode with error handling

TRY
    OPEN "students.txt" AS f MODE "r+"
    IF f = NIL THEN
        DISPLAY "Cannot open file – check name and permissions"
        STOP
    END IF

    WHILE NOT EOF(f) DO
        READLINE line FROM f
        fields ← SPLIT(line, ",")
        IF fields[0] = targetID THEN
            fields[3] ← newGrade
            SEEK f, -LENGTH(line), CURRENT   // back to start of this line
            WRITELINE JOIN(fields, ",") TO f
            EXIT WHILE
        END IF
    END WHILE
FINALLY
    IF f ≠ NIL THEN CLOSE f END IF
END TRY

8. Worked Example – Updating a Student Record (Pseudocode)

Step‑by‑step algorithm that follows the four‑step scaffold.

  1. Open the file in read‑write mode.
  2. Loop through the file line by line until the required ID is found.
  3. Modify the required field, move the pointer back to the start of that line, and overwrite it.
  4. Close the file.
OPEN "students.txt" AS stuFile MODE "r+"
IF stuFile = NIL THEN
    DISPLAY "Error opening file"
    STOP
END IF

found ← FALSE
WHILE NOT EOF(stuFile) DO
    READLINE line FROM stuFile
    fields ← SPLIT(line, ",")
    IF fields[0] = targetID THEN
        fields[3] ← newGrade                 // change the grade field
        SEEK stuFile, -LENGTH(line), CURRENT // back to start of this line
        WRITELINE JOIN(fields, ",") TO stuFile
        found ← TRUE
        EXIT WHILE
    END IF
END WHILE

IF NOT found THEN
    DISPLAY "Student ID not in file"
END IF

CLOSE stuFile

9. Worked Example – Simple Python Logging (Text File)

import datetime

# The “with” statement guarantees the file is closed even on error
with open("log.txt", "a", encoding="utf-8") as log_file:
    log_file.write(f"Program started at {datetime.datetime.now()}")

10. Exam‑Style Decision‑Making Exercise (AO3)

For each scenario choose the *most appropriate* file‑access mode and give a brief justification (1‑2 sentences).

  1. Saving a final exam mark‑sheet that must replace any previous version.
    Answer: w. It creates a new file (or truncates an existing one) so the old mark‑sheet is overwritten.
  2. Recording every user action in a system so that old entries are never lost.
    Answer: a. Append mode adds new records to the end without disturbing existing data.
  3. Reading a configuration file at start‑up, then updating a single setting while the program runs.
    Answer: r+. The file must exist; read‑write access lets the program read the config and later rewrite the changed line.
  4. Storing a collection of photographs that will be displayed but never altered.
    Answer: rb. Binary read‑only mode prevents accidental modification and ensures the exact byte pattern of each image is preserved.

11. Key Points to Remember

  • Every file operation follows the four‑step scaffold: open → read/write → (optional) seek → close.
  • Use the correct access mode; w will erase existing data, a will never overwrite, and r+ is needed for updating existing records.
  • Check that OPEN succeeded (handle ≠ NIL) before attempting any read or write.
  • Detect EOF with WHILE NOT EOF(handle) (or language‑specific equivalents) to avoid infinite loops.
  • Binary files always require the b flag in every language.
  • Sequential access is simple; random access with SEEK/TELL is faster when only a few records are needed.
  • When the exam asks for storage requirements, use
    Fixed‑length records: \(S = n \times L\)
    Variable‑length records: \(S = \displaystyle\sum_{i=1}^{n} L_i\)
  • Always close a file, preferably with a try … finally or with construct, to ensure data is flushed and resources are released.

Create an account or Login to take a Quiz

42 views
0 improvement suggestions

Log in to suggest improvements to this note.