Published by Patrick Mutisya · 14 days ago
Describe and use methods of data validation to ensure that data stored, transmitted or processed is accurate, complete and reliable.
Data integrity refers to the correctness and consistency of data over its entire lifecycle. It is maintained by preventing, or at least detecting, accidental or intentional alteration of data.
Validation can be performed at three main stages:
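Syntactic checks: check that the data conforms to a required format. For example, an email address can be checked against the pattern ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$.
Semantic checks: ensure that the data makes sense in its real‑world context, for example that an age lies within a realistic range.
Redundancy checks: add extra bits or values that can be recomputed to detect errors introduced during storage or transmission.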
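As a minimal Python sketch of the first two kinds of check, the functions below apply the email pattern quoted above as a syntactic check and an age range as a semantic check. The function names and the 0 to 120 range are illustrative assumptions, not part of any standard.

```python
import re

# The email pattern from the syntactic check. It tests the shape of the
# string only; it cannot tell whether the address actually exists.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def is_valid_email(address: str) -> bool:
    """Syntactic check: does the string have the format of an email address?"""
    return EMAIL_PATTERN.match(address) is not None


def is_plausible_age(age: int) -> bool:
    """Semantic check: is a well-formed age realistic? (0-120 is an assumed range.)"""
    return 0 <= age <= 120


print(is_valid_email("student@example.com"))  # True
print(is_valid_email("student@example"))      # False: no top-level domain
print(is_plausible_age(250))                  # False: correct format, unrealistic value
```

A value can pass the syntactic check and still fail the semantic one, which is why the two are applied in sequence.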
Parity bit: a single bit added to a binary word so that the total number of 1s, including the parity bit, is either even (even parity) or odd (odd parity).
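Even parity condition (the parity bit is the XOR of the \$n\$ data bits):
\$P = b_1 \oplus b_2 \oplus \dots \oplus b_n\$
where \$P\$ is the parity bit and \$\oplus\$ denotes XOR. If the data bits contain an odd number of 1s, \$P = 1\$, which makes the total number of 1s even; otherwise \$P = 0\$.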
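A minimal Python sketch of even parity, computing \$P\$ as the running XOR of the data bits exactly as in the formula. Bits are held in a string of '0' and '1' characters for readability, and the function names are illustrative.

```python
def parity_bit(bits: str, even: bool = True) -> int:
    """Return the parity bit for a string of '0'/'1' data bits."""
    p = 0
    for b in bits:
        p ^= int(b)              # XOR of all bits: 1 if the count of 1s is odd
    return p if even else p ^ 1  # odd parity is simply the inverse


def check_even_parity(word: str) -> bool:
    """Receiver: a word (data + parity bit) is consistent if its count of 1s is even."""
    return word.count("1") % 2 == 0


data = "1011001"                  # four 1s
sent = data + str(parity_bit(data))
print(sent)                       # 10110010 (parity bit 0: count already even)
print(check_even_parity(sent))    # True
one_error = "0" + sent[1:]        # flip the first bit
print(check_even_parity(one_error))   # False: single-bit error detected
two_errors = "01" + sent[2:]      # flip the first two bits
print(check_even_parity(two_errors))  # True: two-bit error goes undetected
```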
Checksum: a simple additive checksum is the sum of all data bytes, usually reduced modulo \$2^8\$ (one byte) or \$2^{16}\$ (two bytes) so that it fits in a fixed-size field.
Example for an 8‑bit checksum, where \$d_i\$ is the \$i\$-th of \$N\$ data bytes:
\$C = \left(\sum_{i=1}^{N} d_i\right) \bmod 256\$
When the data is received, the receiver recomputes \$C\$ and compares it with the transmitted checksum.
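A minimal Python sketch of the 8‑bit checksum formula and the receiver-side comparison; the function names and the sample message are illustrative.

```python
def checksum8(data: bytes) -> int:
    """8-bit additive checksum: C = (sum of all bytes) mod 256."""
    return sum(data) % 256


def verify(data: bytes, received_checksum: int) -> bool:
    """Receiver: recompute C over the received bytes and compare."""
    return checksum8(data) == received_checksum


message = b"HELLO"               # byte values 72, 69, 76, 76, 79 (sum 372)
c = checksum8(message)
print(c)                         # 116, since 372 mod 256 = 116
print(verify(message, c))        # True
print(verify(b"HELLP", c))       # False: one byte changed in transit
```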
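Cyclic redundancy check (CRC): the data is treated as a binary polynomial \$D(x)\$, shifted left by \$k\$ bits (multiplied by \$x^{k}\$) and divided by a generator polynomial \$G(x)\$ of degree \$k\$ using modulo‑2 (XOR) arithmetic. The \$k\$-bit remainder \$R(x) = D(x) \cdot x^{k} \bmod G(x)\$ is appended to the data.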
Transmission packet: \$T(x) = D(x) \cdot x^{k} + R(x)\$ where \$k\$ is the degree of \$G(x)\$.
On receipt, the receiver checks that \$T(x) \bmod G(x) = 0\$. If not, an error is detected.
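The modulo‑2 division can be sketched in Python on strings of bits. This is a teaching sketch, assuming the small generator \$G(x) = x^3 + x + 1\$ (bits 1011, so \$k = 3\$) and an arbitrary 14-bit message; real links use standardised 16- or 32-bit generators and table-driven or hardware implementations. The function names are illustrative.

```python
def mod2_remainder(dividend: str, generator: str) -> str:
    """Remainder of modulo-2 (XOR) long division of two bit strings."""
    k = len(generator) - 1                   # degree of G(x)
    bits = list(dividend)
    for i in range(len(bits) - k):
        if bits[i] == "1":                   # leading term present: subtract (XOR) G(x)
            for j, g in enumerate(generator):
                bits[i + j] = "0" if bits[i + j] == g else "1"
    return "".join(bits[-k:])                # the last k bits hold R(x)


def crc_encode(data: str, generator: str) -> str:
    """Sender: T(x) = D(x)*x^k + R(x), i.e. the data followed by the k-bit remainder."""
    k = len(generator) - 1
    remainder = mod2_remainder(data + "0" * k, generator)  # D(x)*x^k mod G(x)
    return data + remainder


def crc_check(frame: str, generator: str) -> bool:
    """Receiver: accept only if T(x) mod G(x) = 0 (remainder all zeros)."""
    return "1" not in mod2_remainder(frame, generator)


G = "1011"                                   # G(x) = x^3 + x + 1, assumed for the example
frame = crc_encode("11010011101100", G)
print(frame)                                 # 11010011101100100 (remainder 100 appended)
print(crc_check(frame, G))                   # True
corrupted = frame[:5] + ("0" if frame[5] == "1" else "1") + frame[6:]  # flip bit 5
print(crc_check(corrupted, G))               # False: error detected
```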
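Cryptographic hash functions: functions such as SHA‑256 produce a fixed‑length (256‑bit) digest that changes unpredictably after any alteration of the input, however small.
They are used for integrity verification of files, software distribution and database records: the recipient recomputes the digest and compares it with a trusted copy.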
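A minimal sketch of digest comparison using Python's standard hashlib module. The record contents are made up for illustration, and file_sha256 is a hypothetical helper showing how a large file can be hashed in chunks.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest of some bytes as 64 hexadecimal characters (256 bits)."""
    return hashlib.sha256(data).hexdigest()


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Hypothetical record; the trusted digest would normally be stored or published separately.
record = b"AB1234,Alice,17"
trusted_digest = sha256_hex(record)

print(len(trusted_digest) * 4)                           # 256 (bits)
print(sha256_hex(record) == trusted_digest)              # True: unchanged
print(sha256_hex(b"AB1234,Alice,18") == trusted_digest)  # False: one character changed
```

The comparison is only meaningful against the intended data if the trusted digest itself cannot be altered along with the data.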
| Method | Typical Size (bits) | Error Detection Capability | Common Uses |
|---|---|---|---|
| Parity Bit | 1 | Detects any odd number of bit errors (including every single‑bit error); misses any even number of bit errors | Memory modules, simple serial links |
| Checksum | 8–16 | Detects every single‑byte error; misses reordered bytes and errors that cancel out in the sum | File transfer protocols (e.g., XMODEM), IP, TCP and UDP headers (16‑bit checksum) |
| CRC | 16–32 | Detects all single‑bit errors and all burst errors up to \$k\$ bits long (\$k\$ = degree of \$G(x)\$); a well‑chosen \$G(x)\$ also catches all double‑bit errors and all errors affecting an odd number of bits | Ethernet, USB, storage devices |
| Hash (SHA‑256) | 256 | Detects any change unless the altered data produces the same digest (probability about \$2^{-256}\$ for a random change) | Software distribution, digital signatures, database integrity |
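Some of the weaknesses listed in the table can be checked directly. This short Python sketch, using made-up data, shows an additive 8‑bit checksum missing two transposed bytes, the CRC‑32 from Python's standard zlib module catching the same swap, and even parity missing a two-bit error.

```python
import zlib

original = b"AB1234"
swapped = b"AB2134"      # the bytes '1' and '2' transposed

# Additive 8-bit checksum: addition ignores order, so the swap is invisible.
print(sum(original) % 256 == sum(swapped) % 256)     # True: swap undetected
# CRC-32: the check value depends on byte positions, so the swap is caught.
print(zlib.crc32(original) == zlib.crc32(swapped))   # False: swap detected

# Even parity over one byte: flipping two bits leaves the count of 1s even.
byte = 0b10110010
two_bit_error = byte ^ 0b00000011                    # flip the two lowest bits
print(bin(byte).count("1") % 2 == bin(two_bit_error).count("1") % 2)  # True: undetected
```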
import re


def validate_student_record(record, course_codes):
    # record: any object with id, name, age, start_date, end_date,
    # course_code and checksum attributes (dates as datetime.date).
    # course_codes: a set standing in for the Courses table (hypothetical).

    # 1. Syntactic checks
    if not re.fullmatch(r'[A-Z]{2}\d{4}', record.id):
        return False, "Invalid ID format"
    if not re.fullmatch(r'-?\d+', str(record.age)):
        return False, "Age must be numeric"
    # 2. Semantic checks
    age = int(record.age)
    if age < 0 or age > 120:
        return False, "Age out of realistic range"
    # 3. Cross-field check
    if record.start_date > record.end_date:
        return False, "Start date after end date"
    # 4. Referential integrity
    if record.course_code not in course_codes:
        return False, "Course code does not exist"
    # 5. Redundancy check (simple 8-bit checksum over id, name and age)
    data_bytes = (record.id + record.name + str(record.age)).encode("utf-8")
    checksum = sum(data_bytes) % 256
    if checksum != record.checksum:
        return False, "Checksum mismatch"
    return True, "Record valid"