Cambridge A-Level Computer Science 9618 – Data Integrity
6.2 Data Integrity
Learning Objective
Describe how data validation and data verification help protect the integrity of data.
What is Data Integrity?
Data integrity refers to the accuracy, consistency, and reliability of data throughout its lifecycle. It ensures that data is not altered in an unauthorized or accidental manner.
Key Concepts
Data Validation – Checks that data entered or received conforms to required formats, ranges, or business rules before it is stored.
Data Verification – Confirms that data stored or transmitted is exactly the same as the original source, often using checksums, hashes, or parity bits.
Data Validation
Validation is performed at the point of data entry (e.g., user input forms, file imports) and can be implemented in both client‑side and server‑side code. Common checks include the following; a short Python sketch follows the list.
Type checking – ensure the data type matches expectations (e.g., integer, string).
Range checking – verify numeric values fall within permitted limits.
Format checking – use regular expressions for patterns such as email addresses or dates.
Mandatory fields – enforce that required fields are not left blank.
Cross‑field validation – ensure logical consistency between related fields (e.g., start date must be before end date).
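To make these checks concrete, here is a minimal server‑side sketch in Python. The form fields (email, guests, start, end) and the limits used are hypothetical, and the email pattern is deliberately simplified rather than a complete standards‑compliant check.

```python
import re
from datetime import date

def validate_booking(form: dict) -> list[str]:
    """Run the checks described above; return a list of error messages (empty = valid)."""
    # Mandatory fields - required values must not be blank.
    missing = [f for f in ("email", "guests", "start", "end") if not form.get(f)]
    if missing:
        return [f"{f} is required" for f in missing]

    errors = []

    # Type and range checking - guests must be a whole number within limits.
    try:
        guests = int(form["guests"])
        if not 1 <= guests <= 10:
            errors.append("guests must be between 1 and 10")
    except (TypeError, ValueError):
        errors.append("guests must be a whole number")

    # Format checking - a simplified email pattern.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", form["email"]):
        errors.append("email is not in a valid format")

    # Cross-field validation - start date must be before end date.
    if form["start"] >= form["end"]:
        errors.append("start date must be before end date")

    return errors

print(validate_booking({"email": "a@example.com", "guests": "3",
                        "start": date(2024, 5, 1), "end": date(2024, 5, 4)}))  # []
```

In practice the same rules should be repeated server‑side even when a client‑side form enforces them, since client‑side checks can be bypassed.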
Data Verification
Verification is typically performed after data has been stored or transmitted, to detect accidental corruption or intentional tampering. Common techniques include the following; a hash‑based sketch follows the list.
Parity bits – simple error‑detecting code for binary data.
Checksums – additive functions that produce a small numeric value from a larger data set.
Cryptographic hashes – produce a fixed‑size digest (e.g., SHA‑256) that changes dramatically with any alteration of the input.
Digital signatures – combine a hash with a private key to provide both integrity and authentication.
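As an illustration of hash‑based verification, the sketch below uses Python's standard hashlib module to record a SHA‑256 digest when data is stored and recompute it later. The record contents are invented for the example.

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected: str) -> bool:
    """Recompute the digest and compare it with the stored value."""
    return sha256_digest(data) == expected

# When the data is stored or sent, its digest is recorded alongside it.
original = b"account=1234;balance=500.00"
stored = sha256_digest(original)

# Later, any alteration - even a single character - yields a different digest.
print(verify(b"account=1234;balance=500.00", stored))  # True:  unchanged
print(verify(b"account=1234;balance=900.00", stored))  # False: tampering detected
```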
Example: Checksum Calculation
A basic 8‑bit checksum can be calculated as the sum of all data bytes modulo 256:
$C = \left( \sum_{i=1}^{n} d_i \right) \bmod 256$
When the data is received, the same calculation is performed and the result is compared with the transmitted checksum $C$. A mismatch indicates corruption.
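The sketch below is a direct Python translation of this formula; the payloads are invented. Note that such a simple checksum detects many single‑byte errors but misses some others (for example, two errors that cancel out).

```python
def checksum8(data: bytes) -> int:
    """8-bit checksum: sum of all data bytes modulo 256, as in the formula above."""
    return sum(data) % 256

payload = b"HELLO"
c = checksum8(payload)                 # (72+69+76+76+79) % 256 = 116

# The receiver recomputes the checksum; a mismatch signals corruption.
print(checksum8(b"HELLO") == c)        # True:  data intact
print(checksum8(b"HELLA") == c)        # False: corrupted byte detected
```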
Comparison of Validation and Verification

| Aspect | Data Validation | Data Verification |
|---|---|---|
| When performed | At data entry or before storage | After storage or transmission |
| Primary purpose | Prevent incorrect or malformed data from entering the system | Detect accidental or malicious alteration of stored/transmitted data |
| Typical contexts | User input forms, file imports, application code | File systems, communication protocols, backup/restore processes |
| Effect on data integrity | Prevents bad data from being stored → maintains logical integrity | Detects corruption → maintains physical integrity |
Why Both Are Needed
Validation and verification complement each other. Validation stops bad data from entering the system, while verification ensures that data that has been stored or transmitted remains unchanged. Using both reduces the risk of:
Incorrect calculations caused by malformed inputs.
Data loss or corruption during backup, network transfer, or hardware failure.
Security breaches where an attacker modifies data without detection.
Practical Implementation Tips
Implement validation on both client and server sides – never rely solely on client‑side checks.
Use built‑in database constraints (e.g., NOT NULL, CHECK, UNIQUE) as a second line of defense – see the sketch after this list.
Choose verification methods appropriate to the data’s sensitivity – simple checksums for routine files, cryptographic hashes for critical records.
Store verification values (checksums, hashes) in a secure, tamper‑evident location.
Regularly audit logs for validation and verification failures to identify patterns of misuse.
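As a sketch of the database‑constraint tip above, the snippet below uses Python's built‑in sqlite3 module with an in‑memory database; the student table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        email TEXT    NOT NULL UNIQUE,                          -- mandatory, no duplicates
        mark  INTEGER NOT NULL CHECK (mark BETWEEN 0 AND 100)   -- range check in the database
    )
""")
conn.execute("INSERT INTO student (email, mark) VALUES (?, ?)", ("a@example.com", 85))

# Each row below violates a constraint, so the database rejects it even if
# application-level validation has been bypassed.
for row in [("a@example.com", 70),   # duplicate email -> UNIQUE
            (None, 70),              # blank email     -> NOT NULL
            ("b@example.com", 150)]: # mark too high   -> CHECK
    try:
        conn.execute("INSERT INTO student (email, mark) VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```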
Suggested Diagram
Flowchart showing where validation occurs (input → application → database) and where verification occurs (storage → backup → transmission → receipt).
Summary
Data integrity is a cornerstone of reliable computing systems. Data validation ensures that only correctly formatted and logically consistent data enters the system, while data verification confirms that stored or transmitted data remains unchanged. Together, they protect both the logical and physical integrity of information, supporting accurate processing, trustworthy reporting, and robust security.