Describe batch processing applications (utility bills, payroll)

Data, Information and Processing Methods

1.1 Data & Information

Data are raw, unprocessed facts or measurements. Information is data that have been organised, processed or interpreted so that they become meaningful to a user.

Example: A smart meter records the consumption figures 578, 582 and 590 kWh. These are data. When the figures are summed and presented as “Your electricity consumption this month is 1 750 kWh”, the result is information.
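
As a minimal Python illustration (the variable names are invented for this sketch), the processing step is simply a calculation followed by presentation:

# Raw data: consumption figures recorded by the smart meter (kWh).
readings_kwh = [578, 582, 590]

# Processing: summarise the raw figures so they become meaningful.
total_kwh = sum(readings_kwh)

# Information: a statement the customer can act on.
print(f"Your electricity consumption this month is {total_kwh} kWh")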

Sources of data

Source type | Direct examples | Indirect examples
------------|-----------------|-------------------
Surveys / questionnaires | Customer-satisfaction forms completed in person | National census data published by the statistics office
Sensors / instruments | Temperature-sensor readings from a factory line | Web-scraped price lists from competitor websites
Transactions | Point-of-sale receipts captured at checkout | Historical financial statements downloaded from a public register

Direct vs. indirect data (Cambridge requirement)

  • Direct data are collected first‑hand for the specific purpose of the system (e.g., meter readings uploaded by the utility).
    Advantages: usually more accurate, timely and under the control of the organisation.
    Disadvantages: may be costly to obtain and can raise privacy concerns.
  • Indirect data are obtained from secondary sources that were created for another purpose (e.g., census data, web‑scraped information).
    Advantages: inexpensive, readily available, often large‑scale.
    Disadvantages: may be outdated, less relevant, and can contain bias.

Ethical / legal considerations

  • Consent – individuals must agree to the collection of personal data.
  • Data protection legislation – GDPR, Data Protection Act, etc.
  • Confidentiality – sensitive data must be stored and transmitted securely.

1.2 Quality of Information

Information must meet five quality criteria. The table shows each criterion, how it can be assessed and a brief illustration.

Criterion | How to assess | Illustration
----------|---------------|--------------
Accuracy | Cross-check with a trusted source; use validation rules. | A mis-read meter gives a bill £20 too high.
Relevance | Ask whether the information answers the business question. | Using last year’s weather data to predict today’s flood risk is irrelevant.
Age (timeliness) | Check timestamps or version numbers; define acceptable age limits. | Out-of-date stock levels cause a shop to sell items it no longer has.
Completeness | Verify that all mandatory fields are present; use “presence” validation. | Missing overtime entries lead to under-payment of staff.
Consistency | Ensure uniform formats, units and coding across the whole data set. | One record uses “USD”, another “$”; this can cause calculation errors.

Data provenance – recording where each data item originated, who collected it and when – helps to judge all five quality criteria.
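
A small Python sketch of the consistency criterion (the alias table is an assumption invented for this example): equivalent currency notations are normalised to one code before any totals are calculated.

# Hypothetical mapping used to enforce consistent currency coding.
CURRENCY_ALIASES = {"$": "USD", "US$": "USD", "usd": "USD"}

def normalise_currency(code):
    """Return one consistent currency code for equivalent notations."""
    return CURRENCY_ALIASES.get(code.strip(), code.strip().upper())

records = [{"amount": 120.00, "currency": "USD"},
           {"amount": 75.50, "currency": "$"}]

# With consistent coding the two records can safely be totalled together.
total = sum(r["amount"] for r in records
            if normalise_currency(r["currency"]) == "USD")
print(total)   # 195.5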

1.3 Encryption – need, methods and protocols

Encryption protects data from unauthorised access by converting it into a coded form. It is essential when data are stored (at rest) or transmitted (in transit) over insecure networks.

Methods

  • Symmetric encryption – one secret key is used for both encryption and decryption (e.g., AES). Fast, but the key must be shared securely.
  • Asymmetric encryption – a public key encrypts, a private key decrypts (e.g., RSA). Solves the key‑distribution problem but is slower.
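
A minimal sketch of symmetric encryption, assuming the third-party cryptography package is installed (pip install cryptography); its Fernet class uses AES internally, and the same secret key both encrypts and decrypts:

from cryptography.fernet import Fernet

# One secret key is used for both encryption and decryption (symmetric).
key = Fernet.generate_key()     # must be shared securely with the receiver
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"Meter reading: 1750 kWh")
plaintext = cipher.decrypt(ciphertext)

print(ciphertext)   # unreadable without the key
print(plaintext)    # b'Meter reading: 1750 kWh'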

Key‑management practices (syllabus point)

  • Key generation using a trusted random‑number source.
  • Secure storage (hardware security modules, encrypted key vaults).
  • Regular key rotation and revocation procedures.
  • Backup of keys in an offline, tamper‑proof location.
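
Key rotation can be sketched with the same assumed cryptography package: MultiFernet decrypts with any of the listed keys but always encrypts with the newest one, so old data remain readable while new data use the new key.

from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet(Fernet.generate_key())
token = old_key.encrypt(b"sensitive customer record")

# Rotation: put the new key first, keep the old key so existing tokens
# still decrypt, then re-encrypt stored tokens under the new key.
new_key = Fernet(Fernet.generate_key())
keys = MultiFernet([new_key, old_key])
rotated_token = keys.rotate(token)

print(keys.decrypt(rotated_token))   # b'sensitive customer record'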

Common protocols

Protocol | Purpose | Typical use
---------|---------|-------------
TLS / SSL | Secure web traffic (HTTPS) | Online banking, e-commerce
IPsec | Secure IP-level communication | Virtual private networks (VPNs)
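
As a small illustration of TLS in practice (example.com is only a placeholder host, and the snippet needs an internet connection), Python's standard ssl module can open a secure connection and report the negotiated protocol version:

import socket
import ssl

host = "example.com"                     # placeholder host for illustration
context = ssl.create_default_context()   # verifies the server's certificate

with socket.create_connection((host, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls_sock:
        print(tls_sock.version())                  # e.g. 'TLSv1.3'
        print(tls_sock.getpeercert()["subject"])   # certificate owner details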

Advantages & disadvantages

Advantage | Disadvantage
----------|--------------
Confidentiality – data cannot be read without the key | Performance overhead – encryption/decryption consumes CPU cycles
Integrity – tampering can be detected (e.g., with MACs) | Key-management complexity, especially for symmetric keys

Figure 1 – Simplified TLS handshake (placeholder for diagram)

1.4 Validation & Verification – methods and purpose

Both techniques help to ensure that data are correct before they are used, but they check different things: validation confirms that input data are sensible and obey defined rules, while verification confirms that data have been entered or transferred accurately and that results match what is expected.

Validation (checking input data)

Check | Description
------|-------------
Presence | No mandatory field left blank.
Range | Values fall within acceptable limits (e.g., 0 ≤ hours ≤ 200).
Type | Value is of the correct data type (numeric, text, date, etc.).
Format | Correct number of digits, separators or pattern (e.g., “A-123”).
Consistency | Related fields agree (e.g., start date ≤ end date).
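
These checks translate directly into code; the Python sketch below applies all five to a hypothetical order record (the field names, pattern and limits are invented for the example):

import re
from datetime import date

order = {"customer_id": "C1042", "quantity": "3",
         "order_date": date(2024, 5, 1), "delivery_date": date(2024, 5, 7)}
errors = []

# Presence – no mandatory field left blank.
if not order.get("customer_id"):
    errors.append("customer_id is missing")

# Type – quantity must be a whole number; Range – within acceptable limits.
if not order.get("quantity", "").isdigit():
    errors.append("quantity is not a whole number")
elif not 1 <= int(order["quantity"]) <= 100:
    errors.append("quantity outside range 1-100")

# Format – customer ID must be 'C' followed by four digits.
if not re.fullmatch(r"C\d{4}", order.get("customer_id", "")):
    errors.append("customer_id does not match pattern 'C9999'")

# Consistency – related fields must agree.
if order["order_date"] > order["delivery_date"]:
    errors.append("order date is after delivery date")

print(errors)   # an empty list means the record passes every check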

Verification (checking that data match the source or the expected result)

Check | Description
------|-------------
Double entry | Two independent entries of the same data give the same result.
Checksum / hash | Totals or hashes match the expected values (e.g., sum of gross pay).
Re-calculation | Sample records are processed manually and compared with the system’s output.
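
A short Python sketch of the first two verification checks (payroll.csv is the file from the exercise below; recording the hash when the file is created is an assumption made for illustration):

import hashlib

# Double entry – two independent entries of the same value must agree.
first_entry = "AB1234567C"
second_entry = "AB1234567C"
print(first_entry == second_entry)   # True means the entry is verified

# Checksum / hash – a recorded hash reveals whether a file has been altered.
def file_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# expected = file_hash("payroll.csv")   # recorded when the file was created
# assert file_hash("payroll.csv") == expected, "file has been changed"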

Mini‑exercise (payroll CSV validation)

  1. Open the supplied payroll.csv file.
  2. Validate each row for:
    • Employee ID – numeric and present.
    • Hours worked – numeric, 0 ≤ value.
    • Pay‑rate – positive number.
    • Tax code – matches pattern “A‑123”.
  3. Verify by manually calculating the total gross pay for the first five rows and comparing with the program’s output.
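
One possible starting point for the exercise, assuming payroll.csv has the columns EmployeeID, Hours, PayRate and TaxCode (adjust the names to match the supplied file):

import csv
import re

def row_errors(row):
    """Validate one payroll row and return any error messages."""
    errors = []
    if not row["EmployeeID"].isdigit():
        errors.append("EmployeeID must be numeric and present")
    try:
        if float(row["Hours"]) < 0:
            errors.append("Hours must be 0 or more")
    except ValueError:
        errors.append("Hours must be numeric")
    try:
        if float(row["PayRate"]) <= 0:
            errors.append("PayRate must be a positive number")
    except ValueError:
        errors.append("PayRate must be numeric")
    if not re.fullmatch(r"[A-Z]-\d{3}", row["TaxCode"]):
        errors.append("TaxCode must match the pattern 'A-123'")
    return errors

with open("payroll.csv", newline="") as f:
    rows = list(csv.DictReader(f))

for number, row in enumerate(rows, start=1):
    for error in row_errors(row):
        print(f"Row {number}: {error}")

# Verification: compare this figure with a manual calculation of gross pay
# (hours x pay rate) for the first five rows (assumes they passed validation).
gross_first_five = sum(float(r["Hours"]) * float(r["PayRate"]) for r in rows[:5])
print(f"Gross pay for first five rows: {gross_first_five:.2f}")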

Sample test‑plan template (useful for the practical task)

Test ID | Description                     | Input data            | Expected result
--------|---------------------------------|-----------------------|--------------------
V01     | Check mandatory fields present  | Row 12 of payroll.csv | No blank fields
V02     | Check numeric range for hours   | Hours = -5            | Validation error
V03     | Confirm tax-code format         | TaxCode = “B-987”     | Pass
V04     | Verify checksum of gross totals | Whole file            | Sum = 1 254 320.00

1.5 Processing Methods – batch, online and real‑time

Definitions

  • Batch processing – jobs are collected and run together without user interaction during execution.
  • Online (interactive) processing – data are entered and processed immediately while the user is present.
  • Real‑time processing – data are processed within a strict time limit (often milliseconds) as they arrive.

Advantages & disadvantages (summary)

Method | Typical response time | Key advantage | Key limitation
-------|-----------------------|---------------|----------------
Batch | Minutes to hours (after run starts) | High throughput, low per-transaction cost | No immediate feedback to user
Online | Seconds | Instant confirmation to user | Higher resource demand per transaction
Real-time | Milliseconds or less | Deterministic, time-critical response | Complex, costly hardware/software

Algorithm for selecting a processing method

The Cambridge syllabus requires an algorithm (or flowchart) that decides which method to use based on business requirements.

Algorithm ChooseProcessingMethod
Input: dataVolume, requiredResponse, resourceBudget
Output: processingMethod

1. IF requiredResponse ≤ 0.1 seconds THEN
       processingMethod ← “Real‑time”
   ELSE IF requiredResponse ≤ 5 seconds THEN
       processingMethod ← “Online”
   ELSE
       // response time is not critical
       IF dataVolume > 10 000 records AND resourceBudget is limited THEN
            processingMethod ← “Batch”
       ELSE
            processingMethod ← “Online”   // small volume, interactive preferred
       END IF
   END IF
2. RETURN processingMethod
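
The pseudocode maps directly onto Python; the sketch below keeps the same thresholds (the 10 000-record and 5-second figures are the illustrative values used above, not fixed rules):

def choose_processing_method(data_volume, required_response_s, budget_limited):
    """Select a processing method from response time, volume and budget."""
    if required_response_s <= 0.1:
        return "Real-time"
    if required_response_s <= 5:
        return "Online"
    # Response time is not critical beyond this point.
    if data_volume > 10_000 and budget_limited:
        return "Batch"
    return "Online"   # small volume, interactive preferred

print(choose_processing_method(data_volume=2_000_000,
                               required_response_s=3600,
                               budget_limited=True))       # Batch
print(choose_processing_method(500, 0.05, False))          # Real-time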

Link to system life‑cycle stages (syllabus point)

  • Analysis & design – decide which processing method best meets the functional and non‑functional requirements.
  • Implementation – develop the chosen method (e.g., write batch scripts, build web services, or program interrupt‑driven real‑time modules).
  • Testing – validate that the method meets response‑time and throughput criteria.
  • Deployment & maintenance – monitor performance; switch methods if business needs change.

Batch‑Processing Applications

What is batch processing?

A batch is a collection of similar jobs that are processed together as a single unit, without manual intervention during execution. The typical batch cycle consists of four stages:

  1. Data collection – gather all required inputs.
  2. Processing – apply rules, calculations or transformations.
  3. Output generation – produce reports, files, statements, etc.
  4. Distribution / archiving – send results to users or other systems and store for future reference.

Typical batch‑processing applications

  • Utility billing (electricity, water, gas)
  • Payroll for organisations
  • Bank statement generation
  • Stock‑taking and inventory updates
  • End‑of‑day financial reports

Utility Bills

Utility companies must produce millions of bills each month. Batch processing guarantees consistency, speed and cost‑effectiveness.

  1. Data collection – read consumption data from smart meters, manual uploads or legacy databases.
  2. Rate application – apply tariff tables, discounts, subsidies and statutory taxes.
  3. Bill generation – create itemised statements (PDF, XML, or printed form) including consumption, charges and total due.
  4. Distribution – email, portal upload, or postal dispatch; update the customer‑account database with payment status.
  5. Archiving & audit – store each bill for the legally required retention period and generate audit logs.

Why batch? The data set is huge, the calculation rules are fixed, and the output is needed at a set time each month, making a single, automated run the most efficient solution.
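
A much simplified Python sketch of steps 1–3 of the billing run (the tariff values and customer readings are invented for this example):

# Step 1 – data collection: consumption gathered for the billing run (kWh).
meter_readings = {"CUST-001": 1750, "CUST-002": 980}

# Step 2 – rate application: the same tariff rules applied to every customer.
UNIT_RATE = 0.28        # price per kWh
STANDING_CHARGE = 9.50  # fixed monthly charge
TAX_RATE = 0.05         # statutory tax on the net amount

# Step 3 – bill generation: one automated pass over the whole batch.
for customer, kwh in meter_readings.items():
    net = kwh * UNIT_RATE + STANDING_CHARGE
    total = net * (1 + TAX_RATE)
    print(f"{customer}: {kwh} kWh, total due {total:.2f}")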

Payroll

Payroll systems calculate earnings, deductions and taxes for every employee in an organisation.

  1. Gather input data – attendance logs, overtime sheets, leave records, bonuses, commission entries.
  2. Calculate earnings – apply salary scales, overtime rates, shift differentials, commission formulas.
  3. Apply deductions – tax, national insurance, pension contributions, union fees, garnishments.
  4. Produce outputs – payslips (digital or printed), journal entries for the finance ledger, and a batch file for bank transfers (e.g., BACS, ACH).
  5. Archive & audit – store payroll records for statutory retention periods and generate compliance reports.

Why batch? Payroll must be processed for all staff at the same cut‑off date, guaranteeing uniformity, legal compliance and a clear audit trail.
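
A simplified Python sketch of the earnings-and-deductions stages run over a whole batch of employees (the rates and records are invented for this example):

employees = [
    {"id": 1, "hours": 160, "overtime": 8, "rate": 15.00},
    {"id": 2, "hours": 152, "overtime": 0, "rate": 18.50},
]

TAX_RATE = 0.20            # illustrative flat deduction rates
PENSION_RATE = 0.05
OVERTIME_MULTIPLIER = 1.5

bank_file = []   # one payment instruction per employee, exported at the end

for emp in employees:
    gross = (emp["hours"] * emp["rate"]
             + emp["overtime"] * emp["rate"] * OVERTIME_MULTIPLIER)
    deductions = gross * (TAX_RATE + PENSION_RATE)
    net = gross - deductions
    bank_file.append((emp["id"], round(net, 2)))
    print(f"Employee {emp['id']}: gross {gross:.2f}, net {net:.2f}")

print(bank_file)   # would become the bank-transfer batch file (e.g., BACS)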

Comparison of Batch‑Processing Applications

Application | Processing frequency | Typical data volume | Common software | Key processing steps
------------|----------------------|---------------------|-----------------|----------------------
Utility bills | Monthly | Hundreds of thousands to millions of records | Billing suites (e.g., SAP IS-U, Oracle Utilities, IBM InfoSphere) | Data collection → Rate application → Bill generation → Distribution → Archiving
Payroll | Weekly, bi-weekly or monthly | Dozens to tens of thousands of employee records | Payroll packages (e.g., ADP, Sage Payroll, SAP HR, QuickBooks Payroll) | Attendance capture → Earnings calculation → Deductions → Payslip & bank file → Archive

Figure 2 – Flowchart of the batch‑processing cycle (Data collection → Processing → Output generation → Distribution/archiving). Use this as a template for exam diagrams.
