6.2 Data Integrity – Verification Methods for Data Entry and Data Transfer

Objective

Explain and apply the techniques used to verify data when it is entered into a system or transferred between systems, and understand how these techniques support the wider goals of security and privacy.

1. Context – Security, Privacy and Integrity

Security: protects data from unauthorised access or modification (e.g., passwords, firewalls).

Privacy: ensures that only authorised people can view the data (e.g., encryption, access‑control lists).

Integrity: guarantees that data are accurate, complete and unchanged except by authorised actions.

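Validation and verification methods are the main tools for detecting accidental alteration of data. Simple checks (parity, checksums, CRCs) cannot stop deliberate tampering, because an attacker who changes the data can simply recompute the check value; that requires cryptographic techniques such as keyed hashes or digital signatures.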

In the Cambridge syllabus the focus of 6.2 is on integrity, but integrity checks are also a prerequisite for many security mechanisms such as digital signatures and intrusion‑detection alerts.

2. Data‑validation Methods (at the point of entry)

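These checks are performed before data are stored or transmitted. They test whether data are sensible and obey the rules set for the field; they cannot confirm that the data are what the user intended to enter (that requires verification, such as double entry or a visual check against the source document). They correspond to the typical “validation” questions in the exam.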

Validation type	What it checks	Typical exam‑style example
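Format check	String follows a required pattern (e.g., DD/MM/YYYY).	`31/02/2023` has the correct pattern, so it passes a format check; a range check on the day is also needed to reject it (February has no 31st).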
Range check	Numeric value lies within defined limits.	Age must be between 5 and 120 years.
Presence (mandatory) check	Required field is not left blank.	Student name cannot be empty.
Check‑digit algorithm	Mathematical test on a numeric identifier (e.g., Luhn, ISBN).	Validate a credit‑card number using the Luhn algorithm.

Worked examples (pseudo‑code)

2.1 Date format validation (DD/MM/YYYY)


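function isValidDate(dateStr):
    parts = split(dateStr, '/')                 # [day, month, year]
    if length(parts) ≠ 3: return false
    # format check: DD and MM are 2 characters, YYYY is 4, all digits 0-9
    if length(parts[0]) ≠ 2 or length(parts[1]) ≠ 2 or length(parts[2]) ≠ 4: return false
    if not (isDigits(parts[0]) and isDigits(parts[1]) and isDigits(parts[2])): return false
    day   = int(parts[0])
    month = int(parts[1])
    year  = int(parts[2])
    # range checks
    if year < 1900 or year > 2100: return false
    if month < 1 or month > 12:   return false
    daysInMonth = [31,28,31,30,31,30,31,31,30,31,30,31]
    # leap-year adjustment
    if month == 2 and ((year % 4 == 0 and year % 100 ≠ 0) or (year % 400 == 0)):
        maxDay = 29
    else:
        maxDay = daysInMonth[month - 1]
    return (day ≥ 1) and (day ≤ maxDay)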
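A runnable Python sketch of the same checks follows. It separates the format check (a regular expression for the DD/MM/YYYY shape) from the range checks, which shows why `31/02/2023` passes the first but fails the second. The 1900 to 2100 year limits are taken from the pseudocode; the function names are illustrative only.

```python
import re

# Format check only: 2 digits / 2 digits / 4 digits (ASCII 0-9).
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")

def passes_format_check(date_str: str) -> bool:
    """True if the string has the DD/MM/YYYY shape; the values are not examined."""
    return DATE_PATTERN.fullmatch(date_str) is not None

def is_leap_year(year: int) -> bool:
    return (year % 4 == 0 and year % 100 != 0) or year % 400 == 0

def is_valid_date(date_str: str) -> bool:
    """Format check followed by range checks on year, month and day."""
    if not passes_format_check(date_str):
        return False
    day, month, year = (int(part) for part in date_str.split("/"))
    if not (1900 <= year <= 2100) or not (1 <= month <= 12):
        return False
    days_in_month = [31, 29 if is_leap_year(year) else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31]
    return 1 <= day <= days_in_month[month - 1]

print(passes_format_check("31/02/2023"))  # True: correct shape
print(is_valid_date("31/02/2023"))        # False: February has no 31st
print(is_valid_date("29/02/2024"))        # True: 2024 is a leap year
```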

2.2 Range check – age


function isValidAge(age):
    return (age ≥ 5) and (age ≤ 120)
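In a real form the age usually arrives as text, so a presence check and a type check have to run before the range check can be applied. The Python sketch below (the function name and error messages are hypothetical) runs the checks in that order and reports the first one that fails.

```python
import re
from typing import Optional

def validate_age_field(raw: Optional[str]) -> str:
    """Return the first error message, or "" if the field is acceptable."""
    if raw is None or raw.strip() == "":
        return "Age is required."                  # presence check
    text = raw.strip()
    if re.fullmatch(r"[0-9]+", text) is None:
        return "Age must be a whole number."       # type check
    if not (5 <= int(text) <= 120):
        return "Age must be between 5 and 120."    # range check
    return ""

for entry in ["", "ten", "4", "42"]:
    print(repr(entry), "->", validate_age_field(entry) or "accepted")
```

Running the range check last avoids trying to compare a blank or non-numeric entry with 5 and 120.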

2.3 Luhn check‑digit (e.g., student ID)


function luhnCheck(id):
    # id: string of 8 digits; the last digit (id[7]) is the check digit
    if length(id) ≠ 8: return false
    sum = 0
    for i from 0 to 6:                      # process the first 7 digits
        digit = int(id[i])
        # double every second digit, starting with the one just left of the check digit
        if (6 - i) % 2 == 0:
            digit = digit * 2
            if digit > 9: digit = digit - 9
        sum = sum + digit
    check = (10 - (sum % 10)) % 10
    return check == int(id[7])
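The same algorithm in Python, generalised (as an assumption) to identifiers of any length: the check digit is computed from the payload digits and compared with the last digit. `12345674` is a valid 8-digit ID under this scheme, and `79927398713` is a widely used Luhn test number.

```python
def luhn_check_digit(payload: str) -> int:
    """Check digit for a string of decimal digits (the digits before the check digit)."""
    total = 0
    # Walk right to left. The rightmost payload digit will sit next to the check
    # digit, so it is doubled first, then every second digit after it.
    for position, ch in enumerate(reversed(payload)):
        digit = int(ch)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10

def luhn_valid(number: str) -> bool:
    """True if the final digit of `number` is the correct Luhn check digit."""
    if len(number) < 2 or any(c not in "0123456789" for c in number):
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])

print(luhn_check_digit("1234567"))   # 4, so the full ID is 12345674
print(luhn_valid("12345674"))        # True
print(luhn_valid("12345684"))        # False: one digit mistyped
```

Changing any single digit always makes the check fail, and most swaps of two adjacent digits are also caught (the exception is swapping 0 and 9).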

3. Data‑verification Methods (during transfer)

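When data move between devices they are sent in blocks (or frames). The sender computes extra check bits from each block and transmits them with it; the receiver recomputes the check bits from the data it actually received and compares the two values to confirm integrity.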

Method	Typical errors detected	Common use	Why choose it? (error type / overhead)
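Parity bit (even/odd)	Any odd number of flipped bits in a byte (1, 3, 5 …); an even number of flipped bits goes unnoticed.	Keyboard interfaces, simple serial links (RS‑232).	Very low overhead (1 extra bit per byte); cannot detect an even number of bit errors.
Simple additive checksum (8‑bit)	All single‑bit errors and most multi‑bit errors; cannot detect reordered bytes or errors that cancel out in the sum.	Early file‑transfer utilities, tape storage.	1 byte overhead; very easy to implement.
Internet checksum (one’s‑complement 16‑bit sum)	All single‑bit errors and most multi‑bit errors; cannot detect reordered 16‑bit words.	IPv4 header; TCP and UDP segments.	16 bits overhead; already built into the protocol stack.
Longitudinal Redundancy Check (LRC), usually combined with per‑byte parity to give 2‑D parity	All 1‑, 2‑ and 3‑bit errors in a block; a single‑bit error can be located (and so corrected) from the failing row and column.	Older character‑oriented serial and modem protocols.	Modest overhead (one row‑parity bit per byte plus one column‑parity byte per block).
Cyclic Redundancy Check (CRC‑8/16/32)	All single‑bit errors, all burst errors no longer than the CRC width (8, 16 or 32 bits), and nearly all other error patterns.	Ethernet, USB, storage media, ZIP files.	Very high detection rate; 8–32 bits overhead; slightly more CPU work (often done in hardware).
Cryptographic hash (MD5, SHA‑1, SHA‑256)	Any change to the data, even a single bit, produces a different hash (with overwhelming probability).	Software distribution, version control, secure backups.	Strong integrity check; resists deliberate tampering only if the hash itself is protected (e.g., HMAC or a digital signature). MD5 and SHA‑1 are no longer considered collision‑resistant. Larger overhead (128–256 bits) and higher computational cost.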

3.1 How the methods work (concise pseudocode)

Parity bit (even parity)


function addParity(byte):
    parity = 0
    for i = 0 to 7:
        parity = parity XOR ((byte >> i) & 1)
    return (byte << 1) | parity          # 9-bit word (stored in 2 bytes)

function checkParity(word):
    data   = word >> 1
    parity = word & 1
    computed = 0
    for i = 0 to 7:
        computed = computed XOR ((data >> i) & 1)
    return computed == parity
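A Python sketch of the even-parity functions above, with three deliberately corrupted words. It shows the key limitation: flipping an odd number of bits is detected, but flipping two bits leaves the parity unchanged. The data byte is arbitrary.

```python
def parity_of(byte: int) -> int:
    """Even-parity bit: 1 if the byte contains an odd number of 1s, else 0."""
    return bin(byte & 0xFF).count("1") % 2

def add_parity(byte: int) -> int:
    return (byte << 1) | parity_of(byte)        # 9-bit word: 8 data bits + parity bit

def check_parity(word: int) -> bool:
    return parity_of(word >> 1) == (word & 1)

sent = add_parity(0b10110010)
one_flip = sent ^ (1 << 3)                      # one data bit corrupted
two_flips = one_flip ^ (1 << 5)                 # two data bits corrupted
three_flips = two_flips ^ (1 << 7)              # three data bits corrupted

print(check_parity(sent))         # True
print(check_parity(one_flip))     # False: detected
print(check_parity(two_flips))    # True: NOT detected
print(check_parity(three_flips))  # False: detected
```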

Simple additive checksum (8‑bit)


function checksum(block):
    sum = 0
    for each byte in block:
        sum = (sum + byte) & 0xFF          # keep low 8 bits
    return sum
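The weakness of an additive checksum is easy to demonstrate: swapping the order of bytes leaves the sum unchanged. In this Python sketch the message text is invented for illustration, and the standard library's `zlib.crc32` stands in for a CRC-32 check.

```python
import zlib

def additive_checksum(block: bytes) -> int:
    """8-bit additive checksum: sum of the bytes, keeping only the low 8 bits."""
    return sum(block) & 0xFF

original = b"PAY 100 TO ANNA"
reordered = b"PAY 001 TO ANNA"   # same bytes in a different order

print(additive_checksum(original) == additive_checksum(reordered))  # True: error missed
print(zlib.crc32(original) == zlib.crc32(reordered))                # False: error caught
```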

Internet checksum (one’s‑complement 16‑bit)


function internetChecksum(data):
    if len(data) is odd: data = data + [0]   # pad to a whole number of 16-bit words
    sum = 0
    for i = 0 to len(data)-1 step 2:
        word = (data[i] << 8) + data[i+1]    # 16-bit word
        sum = sum + word
    # add carries back into the low 16 bits
    while sum > 0xFFFF:
        sum = (sum & 0xFFFF) + (sum >> 16)
    return ~sum & 0xFFFF
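A Python version of the pseudocode, following the method described in RFC 1071. The sample bytes are arbitrary. The last line shows how a receiver verifies the data: running the same calculation over the data plus the transmitted checksum gives 0 if nothing has changed.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"                          # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # big-endian 16-bit word
    while total > 0xFFFF:                        # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

segment = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
csum = internet_checksum(segment)
print(hex(csum))                                             # 0x220d
print(internet_checksum(segment + csum.to_bytes(2, "big")))  # 0: data unchanged
```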

Longitudinal Redundancy Check (LRC)


function addLRC(block):                    # block = list of 8 bytes
    rowParity = 0                          # bit r = even-parity bit of block[r]
    lrcByte   = 0                          # bit c = even-parity bit of column c
    for r = 0 to 7:
        p = 0
        for c = 0 to 7:
            p = p XOR ((block[r] >> c) & 1)
        rowParity = rowParity | (p << r)
        lrcByte = lrcByte XOR block[r]     # XOR of all bytes gives every column parity
    return block + [rowParity, lrcByte]    # two extra bytes
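This Python sketch uses the same layout (a row-parity byte followed by the column-parity byte) to show the extra power of 2-D parity: a single flipped bit can be located from the failing row and column, and two flips in one byte, which per-byte parity alone would miss, are still detected. The block contents and function names are illustrative assumptions.

```python
from typing import List, Optional, Tuple

def byte_parity(b: int) -> int:
    return bin(b).count("1") % 2

def add_lrc(block: List[int]) -> List[int]:
    """Append a row-parity byte (bit r = parity of block[r]) and a column-parity byte."""
    row_parity = 0
    lrc_byte = 0
    for r, b in enumerate(block):
        row_parity |= byte_parity(b) << r
        lrc_byte ^= b          # XOR of all rows = even parity of each bit column
    return block + [row_parity, lrc_byte]

def locate_single_bit_error(frame: List[int]) -> Optional[Tuple[int, int]]:
    """(row, column) of a single flipped data bit, or None if all parities agree."""
    *block, row_parity, lrc_byte = frame
    recomputed = add_lrc(block)
    bad_rows = recomputed[-2] ^ row_parity
    bad_cols = recomputed[-1] ^ lrc_byte
    if bad_rows == 0 and bad_cols == 0:
        return None
    if bin(bad_rows).count("1") == 1 and bin(bad_cols).count("1") == 1:
        return bad_rows.bit_length() - 1, bad_cols.bit_length() - 1
    raise ValueError("error detected, but it is not a single data-bit error")

frame = add_lrc([0x48, 0x45, 0x4C, 0x4C, 0x4F, 0x21, 0x0A, 0x7E])
corrupt = frame.copy()
corrupt[3] ^= 1 << 5              # flip bit 5 of row 3

print(locate_single_bit_error(frame))    # None
print(locate_single_bit_error(corrupt))  # (3, 5)
```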

Cyclic Redundancy Check – CRC‑32 (simplified)


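function crc32(data):
    # MSB-first form using the CRC-32 generator 0x04C11DB7. Ethernet and ZIP use the
    # same generator but process bits LSB-first, so their values differ from this one.
    crc = 0xFFFFFFFF
    for each byte in data:
        crc = crc XOR (byte << 24)
        for i = 0 to 7:
            if (crc & 0x80000000) ≠ 0:
                crc = (crc << 1) XOR 0x04C11DB7
            else:
                crc = crc << 1
            crc = crc & 0xFFFFFFFF          # keep 32 bits
    return crc XOR 0xFFFFFFFF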
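The pseudocode above is the MSB-first form. For comparison, the sketch below is the bit-reversed (LSB-first) form used by Ethernet and ZIP, where `0xEDB88320` is `0x04C11DB7` with its 32 bits reversed; its output matches Python's standard `zlib.crc32`. Real implementations usually replace the inner loop with a 256-entry lookup table or dedicated hardware.

```python
import zlib

def crc32_reflected(data: bytes) -> int:
    """Bitwise CRC-32 in the LSB-first form used by Ethernet and ZIP."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte                      # bring the next byte into the low bits
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

msg = b"123456789"
print(hex(crc32_reflected(msg)))                # 0xcbf43926
print(crc32_reflected(msg) == zlib.crc32(msg))  # True
```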

3.2 Error handling (common to all methods)

If the computed value does not match the transmitted value, the receiver rejects the block.

Typical response: send a NAK (negative acknowledgement) to request retransmission and log the incident; after repeated failures, raise an alert (e.g., a network-monitoring warning that a line is noisy or faulty).
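A minimal stop-and-wait sketch of this behaviour in Python. The frame format (payload followed by a 4-byte CRC-32 from `zlib.crc32`), the three-attempt limit and the simulated noisy channel are all assumptions made for illustration.

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    """Sender: append a 4-byte CRC-32 (big-endian) to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def receive(frame: bytes) -> str:
    """Receiver: recompute the CRC and reply ACK if it matches, otherwise NAK."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return "ACK" if zlib.crc32(payload) == received else "NAK"

def make_noisy_channel(bad_frames: int):
    """Hypothetical channel that flips one bit in each of the first `bad_frames` frames."""
    sent = 0
    def channel(frame: bytes) -> bytes:
        nonlocal sent
        sent += 1
        if sent <= bad_frames:
            return bytes([frame[0] ^ 0x01]) + frame[1:]
        return frame
    return channel

def send_with_retries(payload: bytes, channel, max_attempts: int = 3) -> int:
    """Resend after each NAK; raise an alert after max_attempts failures."""
    for attempt in range(1, max_attempts + 1):
        reply = receive(channel(make_frame(payload)))
        print(f"attempt {attempt}: {reply}")   # stands in for the incident log
        if reply == "ACK":
            return attempt
    raise RuntimeError("repeated NAKs: possible faulty line, alert the operator")

print(send_with_retries(b"student record 42", make_noisy_channel(2)))  # 3
```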

4. Linking Verification to Security & Privacy

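Integrity checks underpin authentication: a digital signature is created by signing a hash of the data with the sender’s private key (for RSA this is often described as “encrypting the hash”). The receiver verifies it with the sender’s public key, which confirms both who sent the data and that it has not changed.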

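Before decrypting, a system often verifies a message authentication code (MAC), a hash computed with a secret key, to confirm that the ciphertext has not been tampered with, so that altered ciphertext is rejected before it is processed. A plain hash is not enough here, because an attacker who alters the data can simply recompute it.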
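A minimal Python sketch of a MAC check using the standard `hmac` and `hashlib` modules (HMAC with SHA-256). The key and the "ciphertext" bytes are placeholders; in practice the key would be shared securely in advance and the tag sent alongside the ciphertext.

```python
import hashlib
import hmac

SHARED_KEY = b"placeholder-shared-secret"   # hypothetical; agreed securely in advance

def make_tag(message: bytes) -> bytes:
    """HMAC-SHA-256 tag: only someone holding the key can compute it."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()

def tag_is_valid(message: bytes, received_tag: bytes) -> bool:
    # compare_digest is designed to resist timing attacks
    return hmac.compare_digest(make_tag(message), received_tag)

ciphertext = bytes.fromhex("8f12a055")   # placeholder for real ciphertext
tag = make_tag(ciphertext)
tampered = bytes.fromhex("8f12a054")     # one bit changed in transit

print(tag_is_valid(ciphertext, tag))     # True: safe to decrypt
print(tag_is_valid(tampered, tag))       # False: reject without decrypting
```

An attacker who changes the ciphertext can recompute a plain SHA-256 hash of it, but cannot produce a matching HMAC tag without the key.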

Command-line utilities such as md5sum and sha256sum compute cryptographic hashes of files; comparing a file’s hash with a securely stored reference value detects unauthorised modification, supporting integrity policies.
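The same idea in Python using `hashlib`: the sketch hashes a file in chunks (so large files need not fit in memory), records a reference digest, then detects a later change to the file. The temporary file and its contents are for demonstration only; the hex digest has the same format that `sha256sum` prints.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demonstration on a temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
path = tmp.name

reference = sha256_of_file(path)          # record this somewhere secure
print(reference)  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

with open(path, "ab") as f:
    f.write(b"!")                         # simulate an unauthorised change
still_unchanged = sha256_of_file(path) == reference
print(still_unchanged)                    # False
os.remove(path)
```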

5. Choosing the Appropriate Verification Method (Analysis Task)

Scenario: A school wants to transfer student records (≈ 2 MB per file) over a wired Ethernet LAN to a backup server. The network is reliable, but occasional burst errors have been observed.

Task (AO2): Choose the most suitable verification method and justify your choice.

Recommended method: CRC‑32.

Justification: CRC‑32 detects all burst errors of up to 32 bits, which matches the observed error pattern. The overhead (4 bytes per Ethernet frame) is negligible for a 2 MB file, and modern NICs compute CRC‑32 in hardware, so the performance impact is minimal.
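The "negligible overhead" claim can be checked with a quick calculation. The sketch assumes that 2 MB means 2 × 1024 × 1024 bytes and that each frame carries 1,500 bytes of file data (the standard Ethernet payload limit), ignoring IP and TCP headers, which would slightly increase the number of frames.

```python
FILE_BYTES = 2 * 1024 * 1024     # assumption: 2 MB = 2 MiB
PAYLOAD_PER_FRAME = 1500         # assumption: full 1,500-byte payloads, headers ignored
CRC_BYTES_PER_FRAME = 4          # CRC-32 frame check sequence

frames = -(-FILE_BYTES // PAYLOAD_PER_FRAME)     # ceiling division
crc_overhead = frames * CRC_BYTES_PER_FRAME
print(frames)                                     # 1399
print(crc_overhead)                               # 5596
print(f"{100 * crc_overhead / FILE_BYTES:.2f}%")  # 0.27%
```

That is under 6 KB of CRC data for the whole file, roughly a quarter of one per cent.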

6. Practical Implementation Checklist

Identify every point where data are entered or transmitted (forms, file I/O, network sockets).

For each point decide which validation/verification technique is required:
- Parity – very low cost; detects an odd number of flipped bits (e.g., any single‑bit error) but misses an even number.
- Checksum / Internet checksum – simple, suitable for small files or control messages.
- LRC – useful when a 2‑D block layout already exists.
- CRC – preferred for Ethernet, USB, or any link prone to burst errors.
- Cryptographic hash – required when strong integrity is needed; use a keyed hash (HMAC) or a digital signature when authentication is also required.

Implement the chosen method consistently at both the sender and the receiver (or at entry and storage).

Define clear error‑handling behaviour (re‑request, abort, log, raise an alert).

Test the implementation with deliberately corrupted data to confirm detection rates.
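One way to carry out this test, sketched in Python: corrupt randomly generated 8-byte blocks and count how often per-byte parity and CRC-32 (`zlib.crc32`) notice the change. Flipping two bits inside the same byte is chosen deliberately as the worst case for parity. The block size, trial count and seed are arbitrary assumptions.

```python
import random
import zlib

def byte_parities(block: bytes) -> list:
    """Even-parity bit for each byte (the per-byte parity scheme)."""
    return [bin(b).count("1") % 2 for b in block]

def flip_bits(block: bytes, positions) -> bytes:
    """Return a copy of block with the given bit positions inverted."""
    out = bytearray(block)
    for pos in positions:
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)

def detection_rates(trials: int = 1000, flips: int = 2, seed: int = 1):
    rng = random.Random(seed)          # fixed seed so the test is repeatable
    parity_hits = crc_hits = 0
    for _ in range(trials):
        block = bytes(rng.randrange(256) for _ in range(8))
        byte_index = rng.randrange(8)
        # flip `flips` distinct bits inside ONE byte
        bits = rng.sample(range(8), flips)
        bad = flip_bits(block, [byte_index * 8 + b for b in bits])
        parity_hits += byte_parities(bad) != byte_parities(block)
        crc_hits += zlib.crc32(bad) != zlib.crc32(block)
    return parity_hits / trials, crc_hits / trials

print(detection_rates(flips=1))  # (1.0, 1.0)
print(detection_rates(flips=2))  # (0.0, 1.0)
```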

7. Exam‑style Practice

Question (AO1 + AO2):

Explain why CRC‑32 is preferred over a simple additive checksum for Ethernet frames. Include in your answer the types of errors each method can detect and the impact on network performance.

Answer outline:

CRC‑32 treats the frame as a polynomial and divides it by a fixed generator polynomial; it detects all single‑bit errors, all double‑bit errors in frames of Ethernet size, all burst errors of 32 bits or fewer, and nearly all longer error patterns.

An additive checksum only detects errors that change the total sum; it cannot detect reordered bytes or many burst errors.

Ethernet frames are vulnerable to burst errors caused by electrical noise; CRC‑32 therefore provides far higher reliability.

Although CRC‑32 adds 4 bytes of overhead, modern NICs compute it in hardware, so the impact on throughput is negligible compared with the benefit of reduced frame retransmissions.

8. Suggested Diagram (for teacher’s slide)

Flow diagram (typical flow of data with verification steps and error handling): Source → Entry‑time validation → Transmission (with verification bits) → Transfer‑time verification → Destination → Error handling.
