Understand differences between data and information

Data Processing and Information (Cambridge International AS & A Level Information Technology 9626)

1.1 Data and Information

  • Data: Raw, unprocessed facts, figures or symbols. By themselves they have no meaning until placed in a context.
  • Information: Data that has been processed, organised or interpreted so that it becomes meaningful and useful for a specific purpose (e.g., decision‑making).

Why context matters

Data become information only when a clear purpose is defined. The same set of temperature readings, for example, can be used to:

  • adjust a classroom’s heating system (HVAC control);
  • produce a weather forecast for the local newspaper;
  • compare the school’s energy use with the previous year.

The intended purpose determines which processing steps are required and which data are relevant.
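
As a minimal sketch of this idea, the Python below processes one list of temperature readings in two different ways, depending on the purpose. The readings, the 21 °C set point and the function names are illustrative assumptions, not part of the syllabus.

```python
from statistics import mean

# Hypothetical classroom readings in degrees Celsius (illustrative data only).
readings = [19.5, 20.0, 20.5, 21.5, 22.0]

def heating_needed(readings, set_point=21.0):
    """Purpose 1: heating control only needs the most recent reading."""
    return readings[-1] < set_point

def average_temperature(readings):
    """Purpose 2: an energy report needs a summary of all the readings."""
    return round(mean(readings), 1)

print(heating_needed(readings))       # latest reading (22.0) is not below the set point
print(average_temperature(readings))  # one summary figure for the report
```

The data are identical in both calls; only the purpose, and therefore the processing, changes.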

Uses of direct and indirect data

  • Direct (primary) data – collected first‑hand for the problem at hand (e.g., questionnaire completed by pupils, sensor log from a laboratory thermometer, interview with a teacher). Used when the exam question asks for data that are specific, up‑to‑date and tailored to the investigation.
  • Indirect (secondary) data – obtained from existing sources (e.g., national census, published exam results, weather‑service API). Used when the question requires large‑scale or historical information that would be impractical to collect yourself.

1.2 Sources of Data

| Source type | School‑level examples | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Direct (primary) | Questionnaire, laboratory sensor log, teacher interview | Tailored to the exact need; usually up‑to‑date | Time‑consuming; may be costly |
| Indirect (secondary) | National census, published exam results, weather‑service API | Readily available; often covers large populations | May be outdated, not specific to the problem, or require licences |

1.3 Quality of Information

The syllabus requires evaluation of five quality dimensions. In any exam answer you should comment on **all five** for the information you are discussing.

| Dimension | What it means | Illustrative issue |
| --- | --- | --- |
| Accuracy | Correctness of the data and the resulting information | Mis‑reading a sensor gives 99 °C instead of 19 °C. |
| Relevance | Fit for the specific purpose | Using national average rainfall to predict a single school’s garden watering needs. |
| Age (timeliness) | How up‑to‑date the data are | Relying on a 2011 census for current student‑population forecasts. |
| Detail (granularity) | Level of detail required for the decision | Only knowing “pass/fail” when individual marks are needed for remediation. |
| Completeness | All required data are present | Missing questionnaire responses leading to biased conclusions. |

1.4 Encryption and Security

Data often need protection before they are turned into information, especially when they are personal or confidential.

  • Why encrypt? To prevent unauthorised parties from reading the data while it is stored or transmitted.
  • Symmetric encryption – same key for encrypting and decrypting (e.g., AES). Fast, but key distribution can be a problem.
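  • Asymmetric encryption – a public/private key pair (e.g., RSA): data encrypted with the public key can be decrypted only with the matching private key. Solves the key‑distribution problem but is slower.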
  • TLS/SSL – protocols that provide encrypted client‑server communication over the Internet (e.g., a web‑based results portal).
  • IPsec – suite of protocols that encrypts IP packets, used for secure VPN connections between school sites.

| Method | Typical use in a school | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Symmetric (AES) | Encrypting a CSV file of grades before uploading to the cloud | Very fast; low processing overhead | Both sender and receiver must share the secret key securely |
| Asymmetric (RSA) | Sending encrypted email of personal student data to parents | Only the recipient’s private key can decrypt; easy key distribution | Slower; larger ciphertext |
| TLS/SSL | HTTPS login to the school’s online portal | Provides confidentiality, integrity and authentication for web traffic | Requires a valid digital certificate and proper configuration |
| IPsec | Secure VPN linking the main campus with a satellite campus | Encrypts all IP traffic; transparent to applications | More complex to set up; may need specialised hardware |
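
To make the TLS/SSL entry more concrete, here is a minimal sketch using Python's standard `ssl` module. It only builds a client-side configuration and does not open a connection. Setting TLS 1.2 as the minimum version is an assumption chosen for illustration, not a syllabus requirement.

```python
import ssl

# Default client context: the server's certificate and host name are checked.
context = ssl.create_default_context()

# Assumed policy for this sketch: refuse protocol versions older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate must be valid
print(context.check_hostname)                    # certificate must match the host name
```

A context like this would typically be passed to an HTTPS client, so that a portal with an invalid or mismatched certificate is rejected rather than silently trusted.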

1.5 Checking the Accuracy of Data – Validation & Verification

  • Validation – checks data **before** it is stored, using predefined rules:
    • Presence (no missing fields)
    • Range (e.g., age must be 5–18 for pupils)
    • Type (numeric vs. text)
    • Format (date as DD/MM/YYYY)
    Benefit: prevents impossible or out‑of‑range entries at the point of capture. Limitation: cannot tell whether a plausible value is actually correct.
  • Verification – checks data **after** entry to confirm it has been recorded correctly:
    • Double entry (two operators input the same data)
    • Parity or checksum for digital files
    • Visual cross‑check with original paper form
    Benefit: catches transcription errors that validation cannot. Limitation: adds time and cost.
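
The checks above can be sketched in Python as follows. This is an illustrative example only: the field names, the error messages and the sample pupil are assumptions. It applies the presence, type, range and format rules, then uses double entry to catch a mistake that validation alone would miss.

```python
import re
from datetime import datetime

REQUIRED_FIELDS = ("name", "age", "date_of_birth")  # hypothetical field names

def validate_pupil(record):
    """Validation: return a list of rule violations (an empty list means valid)."""
    errors = []
    # Presence check: every required field must hold a non-blank value.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            errors.append(f"{field}: missing")
    age = record.get("age")
    if age is not None and str(age).strip() != "":
        if not isinstance(age, int):
            # Type check: age must be a whole number, not text.
            errors.append("age: must be a whole number")
        elif not 5 <= age <= 18:
            # Range check: pupils are aged 5 to 18 inclusive.
            errors.append("age: out of range 5-18")
    dob = record.get("date_of_birth")
    if dob is not None and str(dob).strip() != "":
        # Format check: pattern DD/MM/YYYY, and the date must really exist.
        try:
            if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", str(dob)):
                raise ValueError("wrong pattern")
            datetime.strptime(str(dob), "%d/%m/%Y")
        except ValueError:
            errors.append("date_of_birth: not DD/MM/YYYY")
    return errors

def double_entry_differences(first, second):
    """Verification: list the fields where two independent entries disagree."""
    return sorted(k for k in first.keys() | second.keys() if first.get(k) != second.get(k))

original = {"name": "Amira", "age": 14, "date_of_birth": "03/09/2010"}
retyped = {"name": "Amira", "age": 15, "date_of_birth": "03/09/2010"}

print(validate_pupil(retyped))                      # passes every validation rule
print(double_entry_differences(original, retyped))  # but the two entries disagree on age
```

An age of 15 is valid but wrong, so only the verification step reveals the transcription error.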

1.6 Data Processing Methods

Information systems may handle data in three main ways. Knowing the method helps you discuss advantages and drawbacks in exam answers.

| Method | Typical school example | Key advantage | Key drawback |
| --- | --- | --- | --- |
| Batch processing | Monthly calculation of school fees from a spreadsheet | Efficient for large volumes | Results are not available until the batch finishes |
| Online (transaction) processing | Student registers for a club via the school portal | Immediate feedback to the user | Requires continuous system availability |
| Real‑time processing | Smart‑thermostat adjusts heating based on live temperature sensor data | Decisions made instantly | Higher hardware and software complexity |

1.7 Algorithmic Representation of Processing Methods

For each processing type the syllabus expects a simple algorithm (pseudocode) that shows the main steps.

Batch processing – payroll run

FOR each employee IN employeeList
    READ hoursWorked, hourlyRate
    IF hoursWorked >= 0 AND hoursWorked <= 200 AND hourlyRate > 0 THEN
        CALCULATE pay = hoursWorked * hourlyRate
        STORE employeeID, pay in payrollFile
    ELSE
        WRITE employeeID to errorReport
    END IF
END FOR
SORT payrollFile by employeeID
OUTPUT payrollFile to accounting system
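
For comparison, the same batch run could look like this in Python. This is a sketch under stated assumptions: the dictionary keys and the sample timesheets are invented for illustration. Setting invalid records aside in a `rejected` list is one assumed way of handling validation failures.

```python
def run_payroll_batch(employees):
    """Process every timesheet in one run; return (payroll, rejected)."""
    payroll, rejected = [], []
    for emp in employees:
        hours, rate = emp["hours_worked"], emp["hourly_rate"]
        # Range checks: 0-200 hours and a positive hourly rate.
        if 0 <= hours <= 200 and rate > 0:
            payroll.append({"employee_id": emp["employee_id"],
                            "pay": round(hours * rate, 2)})
        else:
            rejected.append(emp["employee_id"])
    # Sort once, after the whole batch has been processed.
    payroll.sort(key=lambda row: row["employee_id"])
    return payroll, rejected

# Hypothetical timesheets collected during the month.
timesheets = [
    {"employee_id": 3, "hours_worked": 40, "hourly_rate": 12.5},
    {"employee_id": 1, "hours_worked": 250, "hourly_rate": 10.0},  # fails range check
    {"employee_id": 2, "hours_worked": 10, "hourly_rate": 11.0},
]
payroll, rejected = run_payroll_batch(timesheets)
print(payroll)
print(rejected)
```

Nothing is output until every record has been handled, which is the defining feature of batch processing.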

Online (transaction) processing – club registration

WHILE user is logged in
    DISPLAY registration form
    IF user submits form THEN
        VALIDATE all fields (non‑empty, correct format)
        IF validation passes THEN
            INSERT record into ClubRegistrations table
            DISPLAY "Registration successful"
        ELSE
            DISPLAY error messages
        END IF
    END IF
END WHILE
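
A minimal Python sketch of a single transaction is shown below. The function name, the form fields and the use of an in-memory list in place of the ClubRegistrations table are assumptions made for illustration. Form values are assumed to arrive as text.

```python
def register_for_club(form, registrations):
    """Handle one submitted form immediately and return the message for the user."""
    # Presence check on each required field.
    errors = [f"{field} is required"
              for field in ("pupil_id", "club")
              if not form.get(field, "").strip()]
    if errors:
        return "; ".join(errors)
    # Stand-in for inserting a record into the ClubRegistrations table.
    registrations.append({"pupil_id": form["pupil_id"].strip(),
                          "club": form["club"].strip()})
    return "Registration successful"

registrations = []
print(register_for_club({"pupil_id": "P123", "club": "Chess"}, registrations))
print(register_for_club({"pupil_id": "", "club": "Chess"}, registrations))
```

Each call is processed and answered as soon as it arrives, unlike the batch run, where results wait until the whole file has been handled.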

Real‑time processing – smart‑thermostat control loop

LOOP forever
    READ currentTemp FROM temperatureSensor
    IF currentTemp < 10 OR currentTemp > 35 THEN
        LOG "sensor reading out of range"
    ELSE IF currentTemp < setPoint - 0.5 THEN
        TURN heating ON
    ELSE IF currentTemp > setPoint + 0.5 THEN
        TURN heating OFF
    END IF
    WAIT 5 seconds
END LOOP
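
The decision made on each pass of the loop can be sketched as a small Python function. The function name and the policy of leaving the heating unchanged on an implausible reading are assumptions for illustration. The endless loop and the 5-second wait are left out so that the example terminates.

```python
def next_heating_state(current_temp, set_point, heating_on, band=0.5):
    """Return the heating state (True = on) after one pass of the control loop."""
    if not 10 <= current_temp <= 35:
        # Implausible sensor reading: keep the current state (assumed policy).
        return heating_on
    if current_temp < set_point - band:
        return True    # too cold: switch heating on
    if current_temp > set_point + band:
        return False   # too warm: switch heating off
    return heating_on  # inside the band around the set point: no change

state = False
for reading in [19.0, 20.8, 21.2, 21.8, 99.0]:  # hypothetical sensor values
    state = next_heating_state(reading, set_point=21.0, heating_on=state)
    print(reading, state)
```

The ±0.5 °C band stops the heating from switching on and off rapidly when the temperature hovers around the set point.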

1.8 Transformation Process – From Data to Information

The conversion follows a systematic series of steps. Keep the purpose identified in 1.1 in mind at every step, because it determines which checks and calculations are appropriate.

  1. Collection – Gather raw data from the chosen source(s).
  2. Validation – Apply the validation rules listed in 1.5 (pre‑storage).
  3. Verification – Perform verification checks (post‑entry) to ensure accuracy.
  4. Processing – Apply calculations, sorting, filtering or aggregation. Choose the appropriate method (batch, online or real‑time) as discussed in 1.6.
  5. Interpretation / Contextualisation – Add meaning, compare with standards or benchmarks, and decide how the results will be used.
  6. Presentation – Output in a user‑friendly format (tables, charts, reports, dashboards).

Mathematical illustration

Raw temperature readings (data) from a classroom sensor:

$$\{23.4,\;24.1,\;22.8,\;23.9,\;24.0\}$$

  • Validate each reading is within the plausible range 15 °C–30 °C (all pass).
  • Calculate the average (processing):

$$\bar{T} = \frac{1}{5}\sum_{i=1}^{5} T_i = \frac{23.4 + 24.1 + 22.8 + 23.9 + 24.0}{5}=23.64^\circ\text{C}$$

Interpretation: The classroom is comfortably warm for the current season.

Presentation: Display the average on the school’s HVAC control panel and in a weekly climate‑report chart.
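
The worked example can be reproduced with a few lines of Python. The variable names and the wording of the printed message are illustrative assumptions. The readings and the 15 °C–30 °C range come from the example above.

```python
readings = [23.4, 24.1, 22.8, 23.9, 24.0]  # raw data from the worked example

# Validation: keep only readings inside the plausible range 15-30 degrees Celsius.
valid = [t for t in readings if 15 <= t <= 30]

# Processing: mean of the valid readings, rounded to 2 decimal places.
average = round(sum(valid) / len(valid), 2)

# Presentation: a short, labelled message.
print(f"Average classroom temperature: {average} °C")
```

Rounding to two decimal places matches the precision used in the calculation above and hides floating-point noise in the sum.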

1.9 Practical Implications for Exam Answers

  • Always link data to a clear purpose – this demonstrates understanding of “context”.
  • When discussing a source, state whether it is direct or indirect and weigh its advantages/disadvantages.
  • Evaluate information against **all five** quality dimensions; give a short example of a failure (e.g., outdated census data scores poorly on “age”).
  • Identify the appropriate processing method (batch, online, real‑time) and justify its use.
  • Include at least one validation or verification technique when describing a data‑handling scenario, and note the timing (validation = before storage; verification = after entry).
  • Mention security where relevant, naming symmetric/asymmetric encryption, TLS/SSL or IPsec, and comment on their pros/cons.
  • Where the question asks for an algorithm, provide a concise pseudocode that shows the main steps for the chosen processing method.

Suggested diagram: Flowchart – Data → Validation → Verification → Processing (batch/online/real‑time) → Interpretation (context) → Information → Decision‑making.
