Cambridge A-Level Computer Science 9618 – Data Integrity
6.2 Data Integrity
Learning Objective
Describe how data validation and data verification help protect the integrity of data.
What is Data Integrity?
Data integrity refers to the accuracy, consistency, and reliability of data throughout its lifecycle. It ensures that data is not altered in an unauthorized or accidental manner.
Key Concepts
Data Validation – Checks that data entered or received conforms to required formats, ranges, or business rules before it is stored.
Data Verification – Confirms that data stored or transmitted is exactly the same as the original source, often using checksums, hashes, or parity bits.
Data Validation
Validation is performed at the point of data entry (e.g., user input forms, file imports) and can be implemented in both client‑side and server‑side code. Common checks include the following; a short Python sketch follows the list.
Type checking – ensure the data type matches expectations (e.g., integer, string).
Range checking – verify numeric values fall within permitted limits.
Format checking – use regular expressions for patterns such as email addresses or dates.
Mandatory fields – enforce that required fields are not left blank.
Cross‑field validation – ensure logical consistency between related fields (e.g., start date must be before end date).
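To make these checks concrete, here is a minimal server‑side sketch in Python. The form fields (email, guests, start, end) and the limits used are hypothetical, and the email pattern is deliberately simplified rather than a complete standards‑compliant check.

```python
import re
from datetime import date

def validate_booking(form: dict) -> list[str]:
    """Run the checks described above; return a list of error messages (empty = valid)."""
    # Mandatory fields - required values must not be blank.
    missing = [f for f in ("email", "guests", "start", "end") if not form.get(f)]
    if missing:
        return [f"{f} is required" for f in missing]

    errors = []

    # Type and range checking - guests must be a whole number within limits.
    try:
        guests = int(form["guests"])
        if not 1 <= guests <= 10:
            errors.append("guests must be between 1 and 10")
    except (TypeError, ValueError):
        errors.append("guests must be a whole number")

    # Format checking - a simplified email pattern.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", form["email"]):
        errors.append("email is not in a valid format")

    # Cross-field validation - start date must be before end date.
    if form["start"] >= form["end"]:
        errors.append("start date must be before end date")

    return errors

print(validate_booking({"email": "a@example.com", "guests": "3",
                        "start": date(2024, 5, 1), "end": date(2024, 5, 4)}))  # []
```

In practice the same rules should be repeated server‑side even when a client‑side form enforces them, since client‑side checks can be bypassed.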
Data Verification
Verification is typically performed after data has been stored or transmitted, to detect accidental corruption or intentional tampering. Common techniques include the following; a hash‑based sketch follows the list.
Parity bits – simple error‑detecting code for binary data.
Checksums – additive functions that produce a small numeric value from a larger data set.
Cryptographic hashes – produce a fixed‑size digest (e.g., SHA‑256) that changes dramatically with any alteration of the input.
Digital signatures – combine a hash with a private key to provide both integrity and authentication.
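As an illustration of hash‑based verification, the sketch below uses Python's standard hashlib module to record a SHA‑256 digest when data is stored and recompute it later. The record contents are invented for the example.

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected: str) -> bool:
    """Recompute the digest and compare it with the stored value."""
    return sha256_digest(data) == expected

# When the data is stored or sent, its digest is recorded alongside it.
original = b"account=1234;balance=500.00"
stored = sha256_digest(original)

# Later, any alteration - even a single character - yields a different digest.
print(verify(b"account=1234;balance=500.00", stored))  # True:  unchanged
print(verify(b"account=1234;balance=900.00", stored))  # False: tampering detected
```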
Example: Checksum Calculation
A basic 8‑bit checksum can be calculated as the sum of all data bytes modulo 256:
$C = \left( \sum_{i=1}^{n} d_i \right) \bmod 256$
When the data is received, the same calculation is performed and the result is compared with the transmitted checksum $C$. A mismatch indicates corruption.
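The sketch below is a direct Python translation of this formula; the payloads are invented. Note that such a simple checksum detects many single‑byte errors but misses some others (for example, two errors that cancel out).

```python
def checksum8(data: bytes) -> int:
    """8-bit checksum: sum of all data bytes modulo 256, as in the formula above."""
    return sum(data) % 256

payload = b"HELLO"
c = checksum8(payload)                 # (72+69+76+76+79) % 256 = 116

# The receiver recomputes the checksum; a mismatch signals corruption.
print(checksum8(b"HELLO") == c)        # True:  data intact
print(checksum8(b"HELLA") == c)        # False: corrupted byte detected
```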
Comparison of Validation and Verification

| Aspect | Data Validation | Data Verification |
|---|---|---|
| When performed | At data entry or before storage | After storage or transmission |
| Primary purpose | Prevent incorrect or malformed data from entering the system | Detect accidental or malicious alteration of stored/transmitted data |
| Typical contexts | User input forms, file imports, application code | File systems, communication protocols, backup/restore processes |
| Effect on data integrity | Prevents bad data from being stored → maintains logical integrity | Detects corruption → maintains physical integrity |
Why Both Are Needed
Validation and verification complement each other. Validation stops bad data from entering the system, while verification ensures that data that has been stored or transmitted remains unchanged. Using both reduces the risk of:
Incorrect calculations caused by malformed inputs.
Data loss or corruption during backup, network transfer, or hardware failure.
Security breaches where an attacker modifies data without detection.
Practical Implementation Tips
Implement validation on both client and server sides – never rely solely on client‑side checks.
Use built‑in database constraints (e.g., NOT NULL, CHECK, UNIQUE) as a second line of defense – see the sketch after this list.
Choose verification methods appropriate to the data’s sensitivity – simple checksums for routine files, cryptographic hashes for critical records.
Store verification values (checksums, hashes) in a secure, tamper‑evident location.
Regularly audit logs for validation and verification failures to identify patterns of misuse.
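As a sketch of the database‑constraint tip above, the snippet below uses Python's built‑in sqlite3 module with an in‑memory database; the student table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        email TEXT    NOT NULL UNIQUE,                          -- mandatory, no duplicates
        mark  INTEGER NOT NULL CHECK (mark BETWEEN 0 AND 100)   -- range check in the database
    )
""")
conn.execute("INSERT INTO student (email, mark) VALUES (?, ?)", ("a@example.com", 85))

# Each row below violates a constraint, so the database rejects it even if
# application-level validation has been bypassed.
for row in [("a@example.com", 70),   # duplicate email -> UNIQUE
            (None, 70),              # blank email     -> NOT NULL
            ("b@example.com", 150)]: # mark too high   -> CHECK
    try:
        conn.execute("INSERT INTO student (email, mark) VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```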
Suggested Diagram
Flowchart showing where validation occurs (input → application → database) and where verification occurs (storage → backup → transmission → receipt).
Summary
Data integrity is a cornerstone of reliable computing systems. Data validation ensures that only correctly formatted and logically consistent data enters the system, while data verification confirms that stored or transmitted data remains unchanged. Together, they protect both the logical and physical integrity of information, supporting accurate processing, trustworthy reporting, and robust security.