Data processing and information
1. Data, information and sources
- Data – raw facts, figures or symbols that have not yet been interpreted.
- Information – data that have been processed, organised or presented so that they are useful for decision‑making.
Sources of data
| Source type | Examples | When to use |
| Direct | Questionnaire, interview, sensor, on‑line form | When the most up‑to‑date, specific data are required. |
| Indirect | Census tables, published statistics, third‑party databases | When cost or time constraints make primary collection impractical. |
2. Quality of information
| Criterion | What it means | Effect on verification |
| Accuracy | Correctness of the data | Needs rigorous checks (double entry, checksums, control totals). |
| Relevance | Fit for the intended purpose | Irrelevant fields can be omitted early. |
| Age | How up‑to‑date the data are | Older data may require re‑validation or updating. |
| Detail | Level of granularity required | More fields → more validation rules. |
| Completeness | All required items are present | Control totals, mandatory‑field checks. |
3. Encryption (brief)
- Symmetric – same key encrypts and decrypts (e.g., AES). Fast, good for bulk data.
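- Asymmetric – a linked key pair: data encrypted with the public key can be decrypted only with the matching private key (e.g., RSA). Slower than symmetric encryption, so mainly used for key exchange and, with the private key signing and the public key verifying, for digital signatures.
Encryption is usually combined with an integrity check because encryption alone does not detect corruption: a checksum or plain hash detects accidental changes, while a MAC (a keyed hash) also detects deliberate tampering.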
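The integrity check described above can be sketched with Python's standard hmac and hashlib modules. This is a minimal illustration, not a complete secure channel: the key and messages are made‑up examples, and the encryption step itself is left out because the standard library has no AES implementation.

```python
import hashlib
import hmac

# Illustrative shared key only: a real key would be random and exchanged securely.
KEY = b"example-shared-key"


def make_tag(message: bytes, key: bytes = KEY) -> bytes:
    """Compute an HMAC-SHA-256 tag (a keyed hash) for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_authentic(message: bytes, tag: bytes, key: bytes = KEY) -> bool:
    """Recompute the tag and compare it with the received one in constant time."""
    return hmac.compare_digest(make_tag(message, key), tag)


original = b"Pay 100 GBP to account 12345678"
tag = make_tag(original)  # sent along with the (encrypted) message

tampered = b"Pay 900 GBP to account 12345678"
print(is_authentic(original, tag))  # True
print(is_authentic(tampered, tag))  # False
```

Without the key, an attacker cannot produce a matching tag for the altered message, which is exactly what a plain checksum cannot guarantee.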
4. Checking the accuracy of data – validation & verification
4.1 Validation checks (required by the syllabus)
| Check type | Purpose | Simple example |
| Presence | Field must not be blank | Customer name entered. |
| Range | Value must lie between two limits | Age 0 – 120. |
| Type | Correct data kind | Numeric field cannot contain letters. |
| Length | Exact number of characters/bytes | Postcode exactly 6 characters. |
| Format | Specific pattern | DD/MM/YYYY for dates. |
| Check‑digit | Mathematical digit that validates a number | ISBN‑13 uses modulo‑10. |
| Lookup | Value must exist in a reference list | Country code found in ISO‑3166. |
| Consistency | Related fields must agree | Start‑date earlier than end‑date. |
| Limit | Value must not pass a single boundary (upper or lower) | No more than 500 items per order. |
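| Control total / Hash total | Sum of a field, recalculated and compared with a pre‑calculated value to detect missing or extra records. A control total is a meaningful sum; a hash total is a sum with no meaning of its own (e.g., of account numbers). | Total of invoice amounts = £12 345.67 (control total). |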
| Checksum | Simple arithmetic total of a data block | 8‑bit sum modulo 256 for a packet. |
| Parity check | Detects single‑bit errors in binary data | Even parity added to each transmitted byte. |
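Several of the checks in the table can be written as simple rules applied to one record. The sketch below is a minimal Python illustration, assuming each record is a dictionary with hypothetical field names (name, age, postcode, start, end, country) and a small made‑up reference list for the lookup check; the limits follow the examples in the table.

```python
from __future__ import annotations

import re
from datetime import date, datetime

# Hypothetical reference list for the lookup check (a few ISO 3166 alpha-2 codes).
COUNTRY_CODES = {"GB", "FR", "DE", "IN", "US"}

# Format check pattern: DD/MM/YYYY (two digits, slash, two digits, slash, four digits).
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def parse_date(text: str) -> date | None:
    """Return a date if text has the DD/MM/YYYY format and is a real calendar date, else None."""
    if not DATE_PATTERN.fullmatch(text):
        return None
    try:
        return datetime.strptime(text, "%d/%m/%Y").date()
    except ValueError:  # e.g. 31/02/2024 has the right pattern but no such date exists
        return None


def validate(record: dict) -> list[str]:
    """Apply several validation checks to one record; an empty list means it passed."""
    errors = []

    # Presence check: the name must not be blank.
    if not str(record.get("name", "")).strip():
        errors.append("presence: name is blank")

    # Type check, then range check: age must be a whole number from 0 to 120.
    age = record.get("age")
    if not isinstance(age, int) or isinstance(age, bool):  # bool is a subclass of int
        errors.append("type: age is not a whole number")
    elif not 0 <= age <= 120:
        errors.append("range: age outside 0-120")

    # Length check: postcode must be exactly 6 characters.
    if len(str(record.get("postcode", ""))) != 6:
        errors.append("length: postcode is not 6 characters")

    # Format checks on both dates.
    start = parse_date(str(record.get("start", "")))
    end = parse_date(str(record.get("end", "")))
    if start is None:
        errors.append("format: start date is not DD/MM/YYYY")
    if end is None:
        errors.append("format: end date is not DD/MM/YYYY")

    # Lookup check: country code must be in the reference list.
    if record.get("country") not in COUNTRY_CODES:
        errors.append("lookup: unknown country code")

    # Consistency check: start date must be earlier than end date.
    if start is not None and end is not None and start >= end:
        errors.append("consistency: start date is not before end date")

    return errors


good = {"name": "A. Khan", "age": 34, "postcode": "AB12CD",
        "start": "01/03/2024", "end": "15/03/2024", "country": "GB"}
bad = {"name": " ", "age": 130, "postcode": "AB1",
       "start": "15/03/2024", "end": "01/03/2024", "country": "XX"}

print(validate(good))  # []
for message in validate(bad):
    print(message)
```

Collecting every failure in a list, rather than stopping at the first, lets an operator correct all the problems in one pass.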
4.2 Verification methods (expanded)
Verification is the process of confirming that data entered into (or transferred within) a system exactly match the original source. The Cambridge syllabus expects students to be able to describe the following three methods in detail.
- Visual checking – the operator manually compares each entered item with the original document. Cheap, but only suitable for small, low‑risk data sets.
- Double entry – two independent operators input the same source data. The system automatically flags any mismatched fields. Provides a very high level of confidence for high‑value data (e.g., financial statements, census).
- Parity check – a single parity bit is added to a group of *n* bits so that the total number of 1’s is even (even parity) or odd (odd parity).
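For even parity: $p = \left( \sum_{i=1}^{n} b_i \right) \bmod 2$, where $b_i$ is the $i$‑th data bit (for odd parity the bit is $1 - p$).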
Example (even parity, n = 8): data = 10110010 (four 1’s) → p = 0 → transmitted block = 10110010 0.
Detects any single‑bit error; cannot correct the error.
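Double entry and the parity bit can both be sketched in a few lines of Python. This is a minimal illustration, assuming records held as dictionaries and bits held as strings of "0" and "1"; the field names and values are made up, apart from the 10110010 example taken from the notes.

```python
def double_entry_mismatches(first: dict, second: dict) -> list:
    """Double entry: return the names of fields where the two operators' entries differ."""
    fields = sorted(set(first) | set(second))
    return [f for f in fields if first.get(f) != second.get(f)]


def even_parity_bit(bits: str) -> int:
    """p = (sum of the data bits) mod 2, so data + p always holds an even number of 1s."""
    return sum(int(b) for b in bits) % 2


def parity_ok(block: str) -> bool:
    """Receiver side: a block (data bits + parity bit) passes if its count of 1s is even."""
    return block.count("1") % 2 == 0


# Double entry: the second operator swapped two digits of the amount.
op1 = {"invoice": "INV-001", "amount": "125.50"}
op2 = {"invoice": "INV-001", "amount": "152.50"}
print(double_entry_mismatches(op1, op2))  # ['amount']

# Parity example from the notes: 10110010 has four 1s, so p = 0.
data = "10110010"
block = data + str(even_parity_bit(data))
print(block)                  # 101100100
print(parity_ok(block))       # True

# One flipped bit: the count of 1s becomes odd, so the error is detected.
one_error = "0" + block[1:]
print(parity_ok(one_error))   # False

# Two flipped bits: the count of 1s stays even, so the error is NOT detected.
two_errors = "01" + block[2:]
print(parity_ok(two_errors))  # True
```

The last case shows why parity alone is not enough for noisy links where several bits can change at once.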
4.3 Quick checklist for the exam
- Visual checking – manual, low cost, limited scalability.
- Double entry – two operators, automatic mismatch detection, high reliability, time‑consuming.
- Parity check – usually implemented in hardware; detects any odd number of flipped bits (e.g., a single‑bit error) but misses an even number (e.g., a two‑bit error), and cannot correct errors.
- Checksum – adds all bytes (or words) and sends the total; catches many error patterns.
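- Hash total – sum of a field that has no meaning as a total (e.g., all customer account numbers), compared with a pre‑calculated value to detect missing, extra or altered records. Not the same as a cryptographic hash such as SHA‑256.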
- Control total – sum of a numeric field (e.g., total quantity) compared with a pre‑calculated value.
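- Check‑digit – an extra digit calculated from the other digits (e.g., weighted modulo‑10 for ISBN‑13, the Luhn algorithm for payment‑card numbers) and recalculated on entry to catch most mistyped or transposed digits.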
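The arithmetic behind the checksum, control total, hash total and check‑digit items can be sketched as follows. This is a minimal Python illustration: the batch of records and the "HELLO" packet are made‑up values, and the ISBN‑13 and Luhn numbers are examples checked by hand to pass.

```python
from __future__ import annotations

from decimal import Decimal


def checksum_8bit(data: bytes) -> int:
    """Checksum: add every byte and keep only the lowest 8 bits (sum modulo 256)."""
    return sum(data) % 256


def control_total(records: list[dict], field: str) -> Decimal:
    """Control total: a meaningful sum, e.g. the total value of a batch of invoices."""
    return sum((r[field] for r in records), Decimal("0"))


def hash_total(records: list[dict], field: str) -> int:
    """Hash total: a sum with no real-world meaning, e.g. of account numbers."""
    return sum(int(r[field]) for r in records)


def isbn13_valid(isbn: str) -> bool:
    """ISBN-13: weight the digits 1, 3, 1, 3, ...; the weighted sum must be divisible by 10."""
    digits = [int(c) for c in isbn if c in "0123456789"]
    if len(digits) != 13:
        return False
    return sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits)) % 10 == 0


def luhn_valid(number: str) -> bool:
    """Luhn: double every second digit from the right (subtract 9 if the result exceeds 9);
    the total, including the check digit, must be divisible by 10."""
    digits = [int(c) for c in number if c in "0123456789"]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 1 and total % 10 == 0


batch = [
    {"account": "10234", "amount": Decimal("120.50")},
    {"account": "20871", "amount": Decimal("75.25")},
    {"account": "33010", "amount": Decimal("4.25")},
]

print(checksum_8bit(b"HELLO"))            # 116
print(control_total(batch, "amount"))     # 200.00
print(hash_total(batch, "account"))       # 64115
print(isbn13_valid("978-0-306-40615-7"))  # True
print(luhn_valid("79927398713"))          # True
print(luhn_valid("79927398714"))          # False
```

In each case the receiver (or the program) recalculates the figure and compares it with the one supplied; a mismatch means something was lost, added or mistyped. Decimal is used for money so that totals are exact.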
5. Data processing modes
| Mode | Description | Typical examples | Pros / Cons |
| Batch processing | Data are collected, stored and processed together at a later time. | Payroll, end‑of‑day sales reports. | Efficient for large volumes; results are not immediate. |
| Online (transaction‑oriented) processing | Data are processed immediately as they are entered. | Online banking, ticket booking. | Fast feedback; requires continuous system availability. |
| Real‑time processing | Data must be processed within a strict time limit. | Air‑traffic control, industrial control systems. | Critical for safety; higher hardware/software cost. |
Topic 2 – Hardware & Software (Cambridge 9626)
2.1 Types of hardware
| Hardware type | Examples | Pros | Cons |
| Mainframe | IBM Z series, UNIVAC | Very high processing power, massive storage, reliable. | Expensive, specialised staff required. |
| Mini‑computer / Mid‑range | DEC VAX, HP 3000 | Cheaper than mainframes, still multi‑user. | Less scalable than mainframes. |
| Micro‑computer (PC) | Desktop, laptop, tablet | Low cost, widely available, easy to upgrade. | Limited processing for very large data sets. |
2.2 Types of software
| Software category | Examples | Pros | Cons |
| System software | Operating systems (Windows, Linux), device drivers | Controls hardware, provides platform for applications. | Complex; bugs can affect the whole system. |
| Utility software | Antivirus, backup tools, disk defragmenter | Supports system maintenance, improves security. | May consume resources; occasional false positives. |
| Custom (in‑house) software | Company‑specific inventory system | Tailored exactly to business needs. | Higher development cost; requires ongoing support. |
| Off‑the‑shelf software | Microsoft Office, QuickBooks | Ready to use, lower cost, often well‑documented. | May not fit all requirements; licensing restrictions. |
2.3 User‑interface types
| UI type | Characteristics | Typical use |
| Command‑line | Text‑based commands; fast for expert users. | System administration, programming. |
| Menu‑driven | Hierarchical menus; reduces need to remember commands. | Retail POS, ATMs. |
| Form‑based | Fields for data entry; good for structured input. | Online applications, surveys. |
| Graphical (GUI) | Icons, windows, mouse interaction. | Desktop applications, spreadsheets. |
Topic 3 – Monitoring & Control
3.1 Sensors and transducers
- Temperature (thermocouple, RTD)
- Pressure (piezo‑electric, strain‑gauge)
- Proximity (inductive, ultrasonic)
- Light (photodiode, LDR)
- Motion (accelerometer, PIR)
3.2 Control technologies
| Technology | How it works | Typical application |
| On‑off control | Device is either fully on or fully off. | Heaters, simple alarms. |
| Proportional control | Output varies in proportion to the error signal. | Motor speed regulation. |
| PID control | Combines Proportional, Integral and Derivative actions for precise regulation. | Industrial temperature control. |
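The three techniques can be compared as functions of the error (setpoint minus measured value). This is a minimal Python sketch: the gains, time step and readings are arbitrary illustrative numbers, and practical controllers add features omitted here, such as output limits, hysteresis for on‑off control and protection against integral wind‑up.

```python
def on_off(measured: float, setpoint: float) -> float:
    """On-off control: output fully on (1.0) below the setpoint, fully off (0.0) otherwise."""
    return 1.0 if measured < setpoint else 0.0


def proportional(measured: float, setpoint: float, kp: float) -> float:
    """Proportional control: output = Kp * error."""
    return kp * (setpoint - measured)


class PID:
    """Discrete PID: output = Kp*error + Ki*(accumulated error) + Kd*(rate of change of error)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term until a previous error exists

    def update(self, measured: float, setpoint: float) -> float:
        error = setpoint - measured
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


print(on_off(18.0, 20.0))                  # 1.0
print(proportional(18.0, 20.0, kp=2.0))    # 4.0

controller = PID(kp=2.0, ki=0.5, kd=1.0, dt=1.0)
print(controller.update(measured=18.0, setpoint=20.0))  # 5.0
print(controller.update(measured=19.0, setpoint=20.0))  # 2.5
```

In the second PID step the error is falling, so the derivative term is negative and eases the output back, which is how PID reduces overshoot compared with proportional control alone.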
3.3 Calibration methods
| Method | Procedure | When to use |
| One‑point calibration | Compare the sensor with a single known reference value and adjust the offset so that it reads correctly. | When the sensor’s response is linear with the correct slope and the operating range is narrow. |
| Two‑point calibration | Record the sensor output at two known reference values (low & high) and adjust both slope and offset. | Standard for most temperature, pressure and voltage sensors. |
| Multi‑point calibration | Measure several points across the full range and fit a curve. | Required for non‑linear devices or when high accuracy over a wide range is needed. |
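The arithmetic of one‑point and two‑point calibration is short enough to sketch. This is a minimal Python illustration assuming a linear sensor; the thermometer readings are hypothetical, and multi‑point curve fitting is not shown.

```python
from __future__ import annotations


def one_point_offset(reading: float, reference: float) -> float:
    """One-point calibration: the offset to add so this reading matches the reference."""
    return reference - reading


def two_point(raw_low: float, raw_high: float,
              ref_low: float, ref_high: float) -> tuple[float, float]:
    """Two-point calibration: slope and offset such that true = slope * raw + offset."""
    slope = (ref_high - ref_low) / (raw_high - raw_low)
    offset = ref_low - slope * raw_low
    return slope, offset


def corrected(raw: float, slope: float, offset: float) -> float:
    """Apply a linear calibration to a raw sensor reading."""
    return slope * raw + offset


# Hypothetical thermometer: reads 4.0 in ice water (0 °C) and 84.0 in boiling water (100 °C).
slope, offset = two_point(raw_low=4.0, raw_high=84.0, ref_low=0.0, ref_high=100.0)
print(slope, offset)                   # 1.25 -5.0
print(corrected(44.0, slope, offset))  # 50.0

# One-point: a sensor reads 21.5 when the reference thermometer says 20.0.
print(one_point_offset(21.5, 20.0))    # -1.5
```

A one‑point correction only shifts readings up or down, so it would leave this thermometer's slope error in place; that is why two points are needed when the gain is also wrong.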
Topic 4 – Algorithms & Flowcharts
4.1 Required flowchart symbols (Cambridge approved)
| Symbol | Name | Purpose |
| ⬭ | Terminator (Start/End), drawn as an oval or rounded rectangle | Marks the beginning and end of the process. |
| ▭ | Process | Indicates an operation or instruction. |
| ◆ | Decision | Shows a yes/no (true/false) test. |
| ○ | Connector, drawn as a small circle | Links separate parts of a large flowchart. |
| ▱ | Input/Output (Parallelogram) | Data entry or display. |
4.2 Common flowchart errors (quick checklist)
- Missing terminator symbols.
- Decision diamonds without two clearly labelled arrows.
- Arrows that cross without a connector.
- Using the wrong shape for an operation (e.g., a rectangle for input).
- Unclear start‑point – the flow must begin with a single “Start”.
4.3 Example verification routine (pseudo‑code)
/* expected_total is the pre‑calculated control total for the batch */
total_amount ← 0
FOR each record R in input_file
    error_flag ← FALSE
    /* 1. Presence checks */
    IF R.name = "" OR R.id = "" THEN
        error_flag ← TRUE
    END IF
    /* 2. Range checks */
    IF R.age < 0 OR R.age > 120 THEN
        error_flag ← TRUE
    END IF
    /* 3. Check‑digit (Luhn) */
    IF NOT valid_check_digit(R.account_number) THEN
        error_flag ← TRUE
    END IF
    /* 4. Parity check for binary field */
    IF parity(R.binary_field) ≠ EXPECTED_PARITY THEN
        error_flag ← TRUE
    END IF
    /* 5. Update control total */
    total_amount ← total_amount + R.amount
    IF error_flag = TRUE THEN
        WRITE R TO error_log
    ELSE
        WRITE R TO good_file
    END IF
END FOR
/* Final control‑total comparison */
IF total_amount ≠ expected_total THEN
    WRITE "Control total mismatch" TO error_log
END IF
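The routine above can be translated into Python roughly as follows. This is a sketch under stated assumptions: records are dictionaries keyed by the pseudocode's field names, valid_check_digit is taken to be the Luhn check named in the comment, parity returns the count of 1s modulo 2, binary fields are assumed to be stored with even parity (so EXPECTED_PARITY is 0), and the error log holds simple text messages. The sample records are hypothetical.

```python
from __future__ import annotations

from decimal import Decimal


def valid_check_digit(number: str) -> bool:
    """Luhn check, as named in the pseudocode's comment."""
    digits = [int(c) for c in number if c in "0123456789"]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 1 and total % 10 == 0


def parity(bits: str) -> int:
    """Parity of a bit string: 0 if it holds an even number of 1s, else 1."""
    return bits.count("1") % 2


EXPECTED_PARITY = 0  # assumption: binary fields are stored with even parity


def verify_batch(records: list[dict], expected_total: Decimal):
    """Return (good_file, error_log) lists, following the steps of the pseudocode."""
    good_file, error_log = [], []
    total_amount = Decimal("0")  # control total starts at zero
    for r in records:
        error_flag = False
        if r["name"] == "" or r["id"] == "":              # 1. presence checks
            error_flag = True
        if r["age"] < 0 or r["age"] > 120:                # 2. range check
            error_flag = True
        if not valid_check_digit(r["account_number"]):    # 3. check digit (Luhn)
            error_flag = True
        if parity(r["binary_field"]) != EXPECTED_PARITY:  # 4. parity check
            error_flag = True
        total_amount += r["amount"]                        # 5. update control total
        if error_flag:
            error_log.append(f"record {r['id'] or '?'} failed validation")
        else:
            good_file.append(r)
    if total_amount != expected_total:                     # final control-total comparison
        error_log.append("Control total mismatch")
    return good_file, error_log


records = [
    {"id": "C1", "name": "Ann", "age": 34, "account_number": "79927398713",
     "binary_field": "101100100", "amount": Decimal("120.50")},
    {"id": "C2", "name": "", "age": 51, "account_number": "79927398713",
     "binary_field": "101100100", "amount": Decimal("79.50")},
]
good, log = verify_batch(records, Decimal("200.00"))
print(len(good))  # 1
print(log)        # ['record C2 failed validation']
```

As in the pseudocode, every record's amount goes into the control total, including rejected ones, so the total checks that the whole batch arrived rather than that it was valid.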
4.4 Flowchart for the routine (textual description)
- Start (terminator).
- Input: Read next record (parallelogram).
- Decision: Presence check passed? – No → Write to error log.
- Decision: Range check passed? – No → Write to error log.
- Decision: Check‑digit valid? – No → Write to error log.
- Decision: Parity correct? – No → Write to error log.
- Process: Add amount to control total.
- Output: Write record to good file (if all checks passed).
- Decision: End of file? – No → connector back to “Read next record”.
- Decision: Does calculated total = expected total?
- Output: Write “Control total mismatch” if the totals differ.
- End (terminator).
Topic 5 – eSecurity
5.1 Personal data & malware
- Personal data – any information that can identify a living individual (name, DOB, ID number, etc.). Must be stored securely and processed lawfully.
- Malware – software designed to damage, disrupt or gain unauthorised access (viruses, worms, trojans, ransomware).
5.2 Prevention methods – software vs. physical
| Prevention method | How it works | Advantages | Disadvantages |
| Antivirus / anti‑malware software | Scans files, monitors behaviour, updates signatures. | Automatic, updates regularly, protects many devices. | May miss zero‑day threats; consumes resources. |
| Firewalls (software or hardware) | Filters incoming/outgoing network traffic based on rules. | Blocks unauthorised connections, can be centrally managed. | Improper configuration can block legitimate traffic. |
| Air‑gapped systems | Physical separation from any network. | Virtually eliminates remote malware infection. | Inconvenient for data sharing; costly to maintain. |
| Hardware security tokens (e.g., smart cards) | Store cryptographic keys; required for login. | Strong two‑factor authentication. | Lost or damaged tokens can lock users out. |
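Rule‑based filtering, as used by a firewall, can be sketched as a list of rules checked from the top, where the first match decides and anything unmatched is blocked. This is a simplified Python illustration: real firewalls also match addresses, protocols and connection state, and the rules, ports and default‑deny policy here are assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "block"
    direction: str              # "in" (incoming) or "out" (outgoing)
    port: Optional[int] = None  # None means "any port"


# Hypothetical rule set, checked from top to bottom; the first matching rule wins.
RULES = [
    Rule("allow", "in", 443),  # incoming HTTPS to a web server
    Rule("block", "in", 23),   # incoming Telnet
    Rule("allow", "out"),      # all outgoing traffic
]


def decide(direction: str, port: int, rules: list[Rule] = RULES, default: str = "block") -> str:
    """Return the action of the first rule matching the traffic, else the default (deny)."""
    for rule in rules:
        if rule.direction == direction and rule.port in (None, port):
            return rule.action
    return default


print(decide("in", 443))   # allow
print(decide("in", 8080))  # block (no rule matches, so the default applies)
print(decide("out", 25))   # allow
```

Because the first match wins, rule order matters: a broad "allow" placed too high can override a more specific "block" below it, which is one way the improper configuration mentioned in the table arises.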
Topic 6 – Digital Divide
6.1 Definition, causes and effects
- Definition: The gap between individuals, households or regions that have access to modern information and communication technologies (ICT) and those that do not.
- Causes: Cost of devices, lack of infrastructure, low digital literacy, geographic isolation, socioeconomic factors.
- Effects: Unequal educational and employment opportunities, reduced civic participation, widening economic disparity.
6.2 Groups most affected
- Rural communities
- Low‑income families
- Older adults
- People with disabilities
6.3 Mitigation strategies (exam‑level bullet list)
- Government‑funded broadband expansion programmes.
- Community ICT centres and public‑access computers.
- Subsidised devices or “bring‑your‑own‑device” schemes for schools.
- Digital‑literacy training courses for adults and seniors.
- Accessible design standards for software and websites.
Topic 7 – Expert Systems
7.1 Core components
- Knowledge base – collection of facts and rules (IF‑THEN statements).
- Inference engine – applies rules to the facts to draw conclusions.
- User interface – allows the user to input data and receive advice.
- Explanation facility – tells the user why a conclusion was reached.
7.2 Reasoning styles
| Style | Direction of reasoning | Typical use |
| Forward chaining (data‑driven) | Starts with known facts, applies rules, moves forward to a conclusion. | Diagnostic systems – e.g., “symptom → disease”. |
| Backward chaining (goal‑driven) | Starts with a goal, works backwards to see if required facts exist. | Troubleshooting – e.g., “Is the printer jammed? → check paper feed”. |
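Both reasoning styles can be sketched over the same knowledge base of IF‑THEN rules. This is a minimal Python illustration: the printer facts and rule names are hypothetical, and a real inference engine would also record which rules fired so that the explanation facility can report them.

```python
from __future__ import annotations

# Hypothetical knowledge base: each rule is (conditions, conclusion),
# meaning IF all conditions hold THEN the conclusion holds.
RULES = [
    ({"no_printout", "paper_light_on"}, "tray_empty"),
    ({"no_printout", "jam_light_on"}, "paper_jam"),
    ({"tray_empty"}, "advise_refill_tray"),
    ({"paper_jam"}, "advise_clear_jam"),
]


def forward_chain(facts: set[str], rules) -> set[str]:
    """Data-driven: keep firing rules whose conditions are all known until nothing new is added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal: str, facts: set[str], rules,
                   visited: frozenset[str] = frozenset()) -> bool:
    """Goal-driven: the goal holds if it is a known fact, or if some rule concludes it
    and every condition of that rule can itself be proved."""
    if goal in facts:
        return True
    if goal in visited:  # guard against circular rules
        return False
    visited = visited | {goal}
    for conditions, conclusion in rules:
        if conclusion == goal and all(backward_chain(c, facts, rules, visited) for c in conditions):
            return True
    return False


facts = {"no_printout", "jam_light_on"}
print(sorted(forward_chain(facts, RULES) - facts))         # ['advise_clear_jam', 'paper_jam']
print(backward_chain("advise_clear_jam", facts, RULES))    # True
print(backward_chain("advise_refill_tray", facts, RULES))  # False
```

Forward chaining derives everything that follows from the symptoms; backward chaining only explores the rules needed to test one chosen hypothesis.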
7.3 Advantages and disadvantages
| Advantage | Disadvantage |
| Provides consistent, expert‑level advice. | Knowledge acquisition can be time‑consuming. |
| Can handle large rule sets quickly. | May not cope well with ambiguous or incomplete data. |
| Explanation facility aids learning. | Maintenance required when domain knowledge changes. |
Topic 8 – Spreadsheets
8.1 Creating a spreadsheet
- Plan the layout – decide on rows (records) and columns (fields).
- Enter data – use appropriate data types (text, numbers, dates).
- Apply formulas – e.g., =SUM(B2:B20), =AVERAGE(C2:C20).
- Use absolute references ($A$1) when a formula must keep pointing at the same cell after it is copied, e.g., a cell holding a constant such as a tax rate.
- Format cells – number formats, conditional formatting for alerts.
8.2 Testing and validation
- Check for missing or duplicate entries (use COUNTIF).
- Validate ranges with Data → Data Validation (e.g., 0–100%).
- Use IFERROR to trap calculation errors.
8.3 Using charts
- Column chart – compare quantities (e.g., sales per month).
- Line chart – show trends over time.
- Pie chart – illustrate parts of a whole (budget percentages).
- Scatter plot – display relationships between two variables.
8.4 Case brief – AO2 (design a solution)
Brief: A school wants a budgeting spreadsheet for its annual sports day. The spreadsheet must record each expense (item, quantity, unit cost), calculate total cost, compare it with a budget limit (£2 500) and highlight any overspend.
Required features:
- Input table with columns: Item, Quantity, Unit Cost, Sub‑total (=B2*C2).
- Grand total in D31 using =SUM(D2:D30).
- Cell showing “Within budget” or “Over budget” using =IF(D31<=2500,"Within budget","Over budget").
- Conditional formatting to colour the total cell red when over budget.
- A simple bar chart showing expense categories.
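Before building the sheet, the formula logic in the brief can be checked with a small model. This is a minimal Python sketch that mirrors the Sub‑total, grand‑total and IF calculations; the expense lines are hypothetical and only the £2 500 limit comes from the brief.

```python
from __future__ import annotations

from decimal import Decimal

BUDGET_LIMIT = Decimal("2500")  # budget limit given in the brief


def budget_report(expenses: list[tuple[str, int, Decimal]]):
    """Mirror the sheet: sub-totals, grand total and budget status."""
    subtotals = [(item, qty * unit_cost) for item, qty, unit_cost in expenses]  # Quantity * Unit Cost
    grand_total = sum((sub for _, sub in subtotals), Decimal("0"))            # like the SUM formula
    status = "Within budget" if grand_total <= BUDGET_LIMIT else "Over budget"  # like the IF formula
    return subtotals, grand_total, status


expenses = [
    ("Medals", 120, Decimal("4.50")),
    ("Marquee hire", 1, Decimal("950.00")),
    ("Refreshments", 300, Decimal("2.75")),
]
_, total, status = budget_report(expenses)
print(total, status)  # 2315.00 Within budget

# Adding one more expense line pushes the total over the limit.
_, total, status = budget_report(expenses + [("Trophies", 10, Decimal("25.00"))])
print(total, status)  # 2565.00 Over budget
```

Test data like this, with one case just under and one just over the limit, is also a sensible way to test the finished spreadsheet.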
8.5 Limitations of spreadsheet models
- Scalability – performance degrades with very large data sets.
- Risk of hidden errors – formula mistakes can propagate unnoticed.
- Version control – multiple copies can lead to inconsistent data.
- Limited data‑validation compared with dedicated database systems.
Topic 9 – Modelling with Spreadsheets
9.1 What‑if analysis
- Change a single input (e.g., price) and observe the effect on profit.
- Use Data → What‑If Analysis → Scenario Manager for multiple scenarios.
9.2 Goal‑Seek
Goal‑Seek finds the input value that produces a desired result. Example: “What sales figure is needed to achieve a profit of £5 000?” – set the profit cell as the “Set cell”, £5 000 as the “To value”, and the sales figure cell as the “By changing cell”.
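The idea behind Goal‑Seek, repeatedly adjusting one input until the output reaches the target, can be sketched with a bisection search. This is a swapped‑in technique for illustration (Excel's own iterative method may differ), and the profit model (£20 selling price, £12 variable cost per unit, £3 000 fixed costs) is hypothetical.

```python
from typing import Callable


def profit(units_sold: float) -> float:
    """Hypothetical model: £20 selling price, £12 variable cost per unit, £3 000 fixed costs."""
    return units_sold * (20 - 12) - 3000


def goal_seek(f: Callable[[float], float], target: float,
              low: float, high: float, tol: float = 1e-6, max_iter: int = 200) -> float:
    """Find x in [low, high] with f(x) close to target, using bisection.

    Assumes f is continuous and that f(low) - target and f(high) - target
    have opposite signs (or one of them is zero).
    """
    g_low = f(low) - target
    if g_low == 0:
        return low
    if g_low * (f(high) - target) > 0:
        raise ValueError("target is not bracketed by low and high")
    mid = (low + high) / 2
    for _ in range(max_iter):
        mid = (low + high) / 2
        g_mid = f(mid) - target
        if abs(g_mid) < tol or (high - low) / 2 < tol:
            break
        if (g_mid < 0) == (g_low < 0):  # same sign as the low end: answer is in the upper half
            low, g_low = mid, g_mid
        else:                           # sign changed: answer is in the lower half
            high = mid
    return mid


units = goal_seek(profit, target=5000, low=0, high=100_000)
print(round(units, 2))  # 1000.0
```

Here profit is the "Set cell", 5 000 the "To value" and units_sold the "By changing cell"; the answer can be checked by hand, since (5 000 + 3 000) / 8 = 1 000 units.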
9.3 Simulation (Monte Carlo)
- Generate random inputs (e.g., demand) using =RANDBETWEEN or =NORMINV(RAND(),mean,sd).
- Run the model many times (via Data Table) to obtain a distribution of outcomes.
- Analyse results with histograms or summary statistics.
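The same three steps can be sketched in Python. This is a minimal illustration with hypothetical inputs: demand drawn from a normal distribution with mean 1 000 and standard deviation 150 (the role played by =NORMINV(RAND(),1000,150)), a whole‑number price from £18 to £22 (like =RANDBETWEEN(18,22)), a variable cost of £12 per unit and fixed costs of £3 000.

```python
from __future__ import annotations

import random
import statistics


def simulate_profit(runs: int = 10_000, seed: int = 42) -> list[float]:
    """Run a hypothetical profit model many times with random demand and price."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    results = []
    for _ in range(runs):
        demand = max(0.0, rng.gauss(1000, 150))  # normal demand, floored at zero
        price = rng.randint(18, 22)              # whole pounds, 18 to 22 inclusive
        results.append(demand * (price - 12) - 3000)
    return results


profits = simulate_profit()
print(f"Mean profit: £{statistics.mean(profits):,.0f}")
print(f"Standard deviation: £{statistics.stdev(profits):,.0f}")
print(f"Runs making a loss: {sum(p < 0 for p in profits) / len(profits):.1%}")
```

With these assumptions the mean should come out close to £5 000 (average demand 1 000 times an average margin of £8, less £3 000); the spread of the results is what a histogram of the outcomes would show.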
9.4 Model validation and limitations (recap)
- Validate by checking against known data or using control totals.
- Document assumptions (e.g., linear cost behaviour).
- Recognise that models simplify reality – they cannot predict unexpected events or complex interactions.
9.5 Key points to remember (overall)
- Data → information conversion must be supported by both validation (type, range, format, check digit, etc.) and verification (visual checking, double entry, parity, checksum, control and hash totals).
- Select verification methods that match data volume, criticality and available resources.
- Document any verification strategy in an algorithm and illustrate it with a correctly‑symbolised flowchart.
- Hardware, software and UI choices each have distinct pros and cons – justify the most appropriate for the task.
- Monitoring & control systems require suitable sensors, calibration and a clear control technique (on‑off, proportional, PID).
- Expert systems use forward or backward chaining; understand when each is appropriate.
- Spreadsheets are powerful for calculation, visualisation and simple modelling, but be aware of their limitations and test thoroughly.
- Address the digital divide by promoting access, infrastructure and digital‑literacy programmes.