Describe and use methods of data validation

6.2 Data Integrity – Methods of Data Validation

Learning Objective

Describe and use a range of data‑validation and data‑verification methods so that data stored, transmitted or processed is accurate, complete and reliable.

What is Data Integrity?

Data integrity is the correctness and consistency of data throughout its entire life‑cycle. It is protected by preventing accidental or intentional alteration of data.

Typical Sources of Data Errors

  • Human entry mistakes – typos, omitted fields, wrong format.
  • Transmission noise – electrical interference, signal loss.
  • Hardware faults – bad sectors, memory errors.
  • Software bugs – incorrect algorithms, overflow, truncation.

When to Validate Data

Validation can be performed at three distinct stages. Each stage has a different focus.

  1. Input validation – before data is accepted into the system.
  2. Processing validation – while data is being transformed or calculated.
  3. Output / storage validation – just before data is written to a file, database or transmitted.

Examples of the three stages

  • Input: When a user registers a new account, the system checks the e‑mail format, password length and that the chosen username is not already in use.
  • Processing: In a banking transaction the program verifies that the running total never exceeds the account’s credit limit and that debits equal credits (double‑entry check).
  • Output: Before a file is saved, a 16‑bit checksum is calculated and stored; the checksum is recomputed on reading to detect corruption.
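The processing-stage example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `process_debits` and `double_entry_balanced` are hypothetical, and amounts are held as whole pence to avoid rounding problems.

```python
def process_debits(amounts: list[int], credit_limit: int) -> int:
    """Add up debits (in pence), stopping as soon as the limit would be passed."""
    running_total = 0
    for amount in amounts:
        if running_total + amount > credit_limit:
            raise ValueError("Processing check failed: credit limit exceeded")
        running_total += amount
    return running_total


def double_entry_balanced(debits: list[int], credits: list[int]) -> bool:
    """Double-entry check: total debits must equal total credits."""
    return sum(debits) == sum(credits)


print(process_debits([100, 200], credit_limit=500))  # 300
print(double_entry_balanced([100, 50], [150]))       # True
```

Checking inside the loop, rather than once at the end, means the error is caught at the transaction that caused it.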

Key Syllabus Terminology

Term | Definition (Cambridge AS/A‑Level)
Presence check | Verify that a required field is not blank (e.g., a student’s ID must be entered).
Existence check | Confirm that a value exists in a reference list or table (e.g., a course code must exist in the master Courses table).
Range check | Ensure a numeric value lies between a lower and upper limit (e.g., age 0–120).
Limit check | Test that a value does not exceed a defined maximum (e.g., total credits ≤ 180).
Check‑digit | A single digit calculated from the other digits of a number to detect transcription errors (e.g., ISBN‑10).
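Each of the first four checks can be written as a one-line test. The sketch below is a minimal Python illustration; the function names and the sample course codes are assumptions made for the example, not part of the syllabus.

```python
def presence_check(value: str) -> bool:
    """True if the field contains something other than whitespace."""
    return value.strip() != ""


def existence_check(value: str, reference: set[str]) -> bool:
    """True if the value appears in the reference list or table."""
    return value in reference


def range_check(value: int, lower: int, upper: int) -> bool:
    """True if lower <= value <= upper (both limits are included)."""
    return lower <= value <= upper


def limit_check(value: int, maximum: int) -> bool:
    """True if the value does not exceed the single upper limit."""
    return value <= maximum


# Hypothetical reference data for the existence check.
COURSES = {"CS101", "MA102", "PH103"}

print(presence_check(""))                 # False
print(existence_check("CS101", COURSES))  # True
print(range_check(121, 0, 120))           # False
print(limit_check(180, 180))              # True
```

Note the difference between the last two: a range check tests two boundaries, a limit check only one.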

Categories of Validation Techniques

1. Syntactic Validation (Structure of the data)

  • Presence check – field must contain a value.
  • Length check – e.g., a UK postcode must contain between 5 and 7 characters, not counting the space.
  • Format / pattern check – regular expressions such as

    ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ for e‑mail addresses.

  • Data‑type check – numeric, alphabetic, valid date, etc.
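The syntactic checks above might look like this in Python. It is a sketch only: the function names are illustrative, the day/month/year date layout is an assumption, and the regular expression is the e‑mail pattern quoted in the list.

```python
import re
from datetime import datetime

EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def length_check(value: str, min_len: int, max_len: int) -> bool:
    """True if the number of characters lies within the allowed limits."""
    return min_len <= len(value) <= max_len


def email_format_check(value: str) -> bool:
    """Format check: the whole string must match the e-mail pattern."""
    return EMAIL_PATTERN.fullmatch(value) is not None


def is_valid_date(value: str) -> bool:
    """Data-type check: True if value is a real day/month/year calendar date."""
    try:
        datetime.strptime(value, "%d/%m/%Y")
        return True
    except ValueError:
        return False


print(length_check("AB1234", 6, 6))               # True
print(email_format_check("student@example.org"))  # True
print(email_format_check("student@example"))      # False
print(is_valid_date("31/02/2024"))                # False
```

A format check only confirms the shape of the data; it cannot tell whether the e‑mail address actually belongs to anyone.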

2. Semantic Validation (Meaning of the data)

  • Range / limit check – age must be between 0 and 120; credits ≤ 180.
  • Cross‑field check – Start Date must be earlier than End Date.
  • Existence / referential integrity – a Course Code must exist in the master Courses table.
  • Check‑digit verification – calculate the check‑digit and compare with the supplied one (e.g., ISBN‑10).
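Two of these semantic checks benefit from a worked example. The Python sketch below shows a cross‑field check and the ISBN‑10 check‑digit calculation, which weights the first nine digits 10 down to 2 and works modulo 11 (a check value of 10 is written as X). The function names are illustrative assumptions.

```python
from datetime import date


def cross_field_check(start: date, end: date) -> bool:
    """True only if the start date is strictly earlier than the end date."""
    return start < end


def isbn10_check_digit(first_nine: str) -> str:
    """Weight the nine digits 10, 9, ..., 2 and pick the digit that makes
    the weighted total (including the check digit) divisible by 11."""
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn10_is_valid(isbn: str) -> bool:
    """Recalculate the check digit and compare it with the supplied one."""
    chars = isbn.replace("-", "")
    if len(chars) != 10 or not chars[:9].isdigit():
        return False
    return isbn10_check_digit(chars[:9]) == chars[9].upper()


print(cross_field_check(date(2024, 9, 1), date(2025, 6, 30)))  # True
print(isbn10_is_valid("0-306-40615-2"))  # True
print(isbn10_is_valid("0-306-04615-2"))  # False (two digits transposed)
```

Because each position has a different weight, swapping two adjacent digits changes the weighted total, which is why check digits catch transposition errors as well as single wrong digits.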

3. Redundancy (Verification) Checks – Detect accidental errors

3.1 Parity Bit (required by the syllabus)

A parity bit is a single bit added to a word so that the total number of 1‑bits, including the parity bit itself, is even (even parity) or odd (odd parity).

Example – 8‑bit word with even parity:

Data: 10110010 (four 1‑bits → even)

Parity P = 0

Transmitted: 10110010 0

If the third bit from the left flips → 10010010 0 → three 1‑bits (an odd count) → parity check fails.
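The same example can be reproduced with a short Python sketch. Bits are held as text strings for readability; the function names `parity_bit` and `parity_ok` are illustrative assumptions.

```python
def parity_bit(data_bits: str, even: bool = True) -> str:
    """Return the bit that makes the total count of 1-bits even (or odd)."""
    bit = data_bits.count("1") % 2   # 1 if the data has an odd number of 1s
    if not even:
        bit ^= 1                     # odd parity uses the opposite bit
    return str(bit)


def parity_ok(received: str, even: bool = True) -> bool:
    """Check a received word: data bits followed by one parity bit."""
    ones = received.count("1")
    return ones % 2 == (0 if even else 1)


data = "10110010"
sent = data + parity_bit(data)   # "101100100"
print(parity_ok(sent))           # True
print(parity_ok("100100100"))    # False (one bit flipped in transit)
```

The receiver does not need to know the original data; it only counts the 1‑bits in what arrived.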

3.2 Additive Checksum (mod‑256) – exam‑style example

Sum all data bytes and take the remainder modulo 256 (8‑bit checksum).

Example – data bytes 0x12, 0xA7, 0x5C:

Sum = 0x12 + 0xA7 + 0x5C = 0x115

Checksum C = 0x115 mod 256 = 0x15

Transmitted block = 12 A7 5C 15

If the second byte is corrupted to 0xA6, new sum = 0x114 → C = 0x14 ≠ 0x15 → error detected.
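A minimal Python sketch of the sender and receiver sides, using the same three data bytes. The function names are hypothetical, and the checksum is assumed to travel as the last byte of the block.

```python
def additive_checksum(data: bytes) -> int:
    """Sum all bytes and keep the remainder modulo 256 (one byte)."""
    return sum(data) % 256


def append_checksum(data: bytes) -> bytes:
    """Sender side: attach the checksum as the final byte."""
    return data + bytes([additive_checksum(data)])


def verify_block(block: bytes) -> bool:
    """Receiver side: recompute over the payload and compare."""
    if not block:
        return False
    payload, received = block[:-1], block[-1]
    return additive_checksum(payload) == received


block = append_checksum(bytes([0x12, 0xA7, 0x5C]))
print(verify_block(block))        # True

corrupted = bytes([0x12, 0xA6, 0x5C, block[-1]])
print(verify_block(corrupted))    # False
```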

3.3 Optional Enrichment – CRC

CRC treats the data as a binary polynomial and appends the remainder after division by a generator polynomial. Useful for deeper study but not required for the exam.
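For enrichment, Python's standard library provides a ready-made CRC‑32 in `zlib.crc32`, so the polynomial division does not have to be coded by hand. The sketch below is illustrative only; the helper name `crc_matches` and the sample messages are assumptions.

```python
import zlib


def crc_matches(data: bytes, expected_crc: int) -> bool:
    """Recompute the CRC-32 of the data and compare with the stored value."""
    return zlib.crc32(data) == expected_crc


message = b"Student AB1234"
stored_crc = zlib.crc32(message)   # unsigned 32-bit integer

print(crc_matches(message, stored_crc))            # True
print(crc_matches(b"Student AB1235", stored_crc))  # False (one character changed)
print(crc_matches(b"Student BA1234", stored_crc))  # False (two bytes swapped)
```

The last line shows an advantage over a simple additive checksum: swapping two bytes leaves a sum unchanged but alters the CRC.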

3.4 Optional Enrichment – Cryptographic Hash (SHA‑256)

Produces a 256‑bit digest that changes dramatically with any alteration of the input. Used to detect deliberate tampering as well as accidental corruption, provided the reference digest itself can be trusted; beyond the syllabus specification.
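As an enrichment sketch, the standard `hashlib` module computes SHA‑256 digests. The helper name `sha256_hex` and the sample data are assumptions made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest as 64 hexadecimal characters (256 bits)."""
    return hashlib.sha256(data).hexdigest()


original = sha256_hex(b"credits=120")
altered = sha256_hex(b"credits=121")

print(len(original) * 4)     # 256 (each hex character represents 4 bits)
print(original == altered)   # False

# Count how many of the 64 hex positions differ after a one-character change.
changed = sum(a != b for a, b in zip(original, altered))
print(changed, "of 64 hex characters differ")
```

Running this shows that a one-character change in the input alters most of the digest, not just one position.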

Comparison of Redundancy Methods

Method | Typical size (bits) | Error‑detection capability | Common uses (exam relevance)
Parity bit | 1 | Detects any odd number of bit errors, including every single‑bit error; misses any even number of bit errors | Memory modules, simple serial links (syllabus focus)
Additive checksum (mod‑256) | 8 | Detects any error confined to a single byte; misses re‑ordered bytes and multiple‑byte errors whose changes cancel out | File‑transfer protocols, exam‑style questions
CRC‑16 / CRC‑32 (optional) | 16–32 | Detects all single‑bit errors and all burst errors no longer than the degree of the generator polynomial; with a well‑chosen generator, also all double‑bit errors within a limited block length | Ethernet, USB, deeper study
SHA‑256 (optional) | 256 | An accidental change goes undetected with probability ≈ 2⁻²⁵⁶ | Software distribution, digital signatures
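The weaknesses of the two syllabus methods are easy to demonstrate. The Python sketch below (function names are illustrative) shows a two‑bit error slipping past even parity, and both re‑ordered bytes and cancelling errors slipping past an additive checksum.

```python
def even_parity_ok(bits: str) -> bool:
    """Even parity: the received word must contain an even number of 1s."""
    return bits.count("1") % 2 == 0


def additive_checksum(data: bytes) -> int:
    return sum(data) % 256


# Parity: flipping TWO bits leaves the count of 1s even, so the error is missed.
sent      = "101100100"
two_flips = "011100100"           # first two bits inverted
print(even_parity_ok(two_flips))  # True (error NOT detected)

# Checksum: the same bytes in a different order give the same sum.
original  = bytes([0x12, 0xA7, 0x5C])
reordered = bytes([0xA7, 0x12, 0x5C])
print(additive_checksum(original) == additive_checksum(reordered))  # True

# Checksum: +1 in one byte and -1 in another cancel out.
cancelling = bytes([0x13, 0xA6, 0x5C])
print(additive_checksum(original) == additive_checksum(cancelling))  # True
```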

Practical Validation Example (Student‑Record Database)

import re

EMAIL_PATTERN = r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'

def validateStudentRecord(record):
    # Assumes every field except totalCredits, the dates and checksum is held as text,
    # and that existsInTable() is supplied by the database layer.

    # ---- 1. Syntactic (presence, format, data type) ----
    if record.id == "":
        return False, "Presence check failed – Student ID required"
    if not re.fullmatch(r'^[A-Z]{2}\d{4}$', record.id):
        return False, "Format check failed – ID must be two letters followed by four digits (e.g., AB1234)"
    if not re.fullmatch(EMAIL_PATTERN, record.email):
        return False, "Email format invalid"
    if not record.age.isdigit():
        return False, "Type check failed – age must be a whole number"

    # ---- 2. Semantic (range, limit, cross‑field, existence, check‑digit) ----
    age = int(record.age)
    if age < 0 or age > 120:
        return False, "Range check failed – age must be 0‑120"
    if record.totalCredits > 180:
        return False, "Limit check failed – credits cannot exceed 180"
    if record.startDate >= record.endDate:
        return False, "Cross‑field check failed – start date must be earlier than end date"
    if not existsInTable('Courses', record.courseCode):
        return False, "Existence check failed – course code not found"

    # ISBN‑10 check digit: weight the first nine digits 1..9 and take the total
    # modulo 11; a result of 10 is written as 'X'.
    calcDigit = sum((i + 1) * int(d) for i, d in enumerate(record.isbn[:-1])) % 11
    expected = 'X' if calcDigit == 10 else str(calcDigit)
    if expected != record.isbn[-1]:
        return False, "Check‑digit verification failed"

    # ---- 3. Redundancy – additive checksum (mod‑256) ----
    dataBytes = (record.id + record.name + record.age + record.email).encode('utf-8')
    checksum = sum(dataBytes) % 256
    if checksum != record.checksum:
        return False, "Checksum mismatch – data may be corrupted"

    return True, "Record valid"

Handling Validation Failures (AO2)

  • Clear, specific messages – tell the user exactly which field and which rule failed.
  • Prompt for re‑entry or editing – allow correction without losing other data.
  • Log the failure – timestamp, field, reason; useful for audit trails and spotting systematic problems.
  • Never rely on client‑side checks alone – repeat validation on the server or back‑end.
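The first three points can be combined in one routine: collect every failure, name the field and rule, and log each one with a timestamp. The Python below is a sketch under assumptions; the form field names, the messages and the logger name are all hypothetical.

```python
import logging

# The %(asctime)s field gives each log entry a timestamp for the audit trail.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s",
                    level=logging.INFO)
log = logging.getLogger("validation")


def validate_form(form: dict[str, str]) -> list[tuple[str, str]]:
    """Return a list of (field, reason) pairs; an empty list means accept."""
    failures = []

    if form.get("student_id", "").strip() == "":
        failures.append(("student_id",
                         "Presence check failed: Student ID is required"))

    credits = form.get("credits", "")
    if not credits.isdigit():
        failures.append(("credits",
                         "Type check failed: credits must be a whole number"))
    elif int(credits) > 180:
        failures.append(("credits",
                         "Limit check failed: credits cannot exceed 180"))

    for field, reason in failures:
        log.warning("validation failure field=%s reason=%s", field, reason)
    return failures


for field, reason in validate_form({"student_id": "", "credits": "200"}):
    print(f"{field}: {reason}")
```

Returning all failures at once, instead of stopping at the first, lets the user correct every field in a single pass without re‑entering the valid ones.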

Testing & Maintenance of Validation Routines (AO3)

  • Write unit tests for each rule (boundary values for range checks, invalid formats for regexes, wrong check‑digit, altered checksum).
  • Keep validation code separate from business logic so it can be updated when rules change.
  • Automate regression testing; a failed test after a change signals a broken validation rule.
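A minimal sketch of such unit tests, using Python's built‑in `unittest` module. The two functions under test are simplified stand‑ins written for this example, not the full validation routine.

```python
import unittest


def age_in_range(age: int) -> bool:
    """Range check under test: 0 to 120 inclusive."""
    return 0 <= age <= 120


def additive_checksum(data: bytes) -> int:
    """Redundancy check under test: sum of bytes modulo 256."""
    return sum(data) % 256


class ValidationRuleTests(unittest.TestCase):
    def test_age_boundaries(self):
        # Test the values on, and just outside, each limit.
        self.assertFalse(age_in_range(-1))
        self.assertTrue(age_in_range(0))
        self.assertTrue(age_in_range(120))
        self.assertFalse(age_in_range(121))

    def test_altered_data_changes_checksum(self):
        stored = additive_checksum(b"AB1234")
        self.assertNotEqual(additive_checksum(b"AB1235"), stored)

    def test_unchanged_data_keeps_checksum(self):
        self.assertEqual(additive_checksum(b"AB1234"),
                         additive_checksum(b"AB1234"))
```

Saved in a file such as `test_validation.py` (a hypothetical name), the tests can be run with `python -m unittest`; re‑running them after every change is what makes the regression testing automatic.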

Assessment‑Focused Tasks

Task 1 – AO2 (Analysis)

Given the data‑entry form below, list three validation checks that must be performed before the form can be accepted. Justify your choice in terms of the syllabus terminology (e.g., presence, range, existence).

Student ID: _ (required, format AA1234)

Name: _ (required)

Date of birth: _/_/_

Course code: _ (must exist in Courses table)

Credits earned: _

Mark‑scheme (excerpt)

Criterion | Marks
Presence check for Student ID and Name | 1
Format check for Student ID (AA1234) | 1
Existence check for Course code | 1
Range/limit check for Credits (0‑180) | 1
Clear justification linking each check to the syllabus term | 1

Task 2 – AO3 (Design & Implementation)

Write pseudocode (or a short program fragment) that calculates an 8‑bit additive checksum for a record consisting of the fields StudentID, Name and Age. The checksum must be stored with the record and later verified.

Mark‑scheme (excerpt)

Criterion | Marks
Correct conversion of fields to bytes | 1
Summation of all bytes | 1
Modulo‑256 operation | 1
Storing the checksum with the record | 1
Verification step (re‑calculate and compare) | 1

Suggested Diagram

Flowchart: Input → Presence/Format checks → Range/Existence checks → Check‑digit / Redundancy check → Store/Transmit

Key Take‑aways

  • Data integrity is essential for reliable computing systems.
  • Syntactic validation uses presence, length, format and type checks.
  • Semantic validation uses range, limit, cross‑field, existence and check‑digit checks.
  • Parity bits and additive checksums are the verification techniques the syllabus requires; CRC and cryptographic hashes are useful extensions.
  • Effective validation combines early input checks, processing‑stage safeguards, output verification, clear error handling, and thorough testing (AO1‑AO3).