Data are raw facts, figures or symbols that have no meaning on their own. Information is the result of giving data context, meaning and purpose – i.e. data become information when they are processed, interpreted and used to support decision‑making. Both direct and indirect data sources can be used to produce information, but the choice of source influences the quality, cost and timeliness of the final output.
Direct data sources: data are collected straight from the original event, object or person without any intermediate transformation.
| Source | Typical Examples | Advantages | Disadvantages |
|---|---|---|---|
| Questionnaires / Surveys | Online forms, paper‑based surveys | High control over questions; can target specific groups | Respondent bias; time‑consuming to design and analyse |
| Interviews & Focus Groups | Face‑to‑face or video interviews, group discussions | Rich, qualitative insight; ability to probe deeper | Requires skilled interviewer; limited sample size |
| Data‑logging (Sensors) | Temperature probes, motion detectors, GPS trackers | Real‑time, objective measurements; minimal human error | Equipment cost; maintenance and calibration needed |
| Observation | Manual counts of footfall, behavioural observation | Direct view of actual behaviour; no reliance on self‑report | Observer bias; may be intrusive |
| Weather Data | On‑site weather stations, portable anemometers | Accurate, location‑specific information | Limited to physical conditions; equipment dependent |
| Transaction Records (POS) | Sales receipts, checkout logs | Exact, time‑stamped data; useful for trend analysis | May contain errors if system fails; privacy concerns |
| Manual Measurements | Ruler, stopwatch, tape measure | Simple, low‑tech; immediate results | Human error; limited precision |
Indirect data sources: data are derived, aggregated or transformed from existing records or publications.
| Source | Typical Examples | Advantages | Disadvantages |
|---|---|---|---|
| Census & Electoral Register | National population counts, voter lists | Comprehensive coverage; highly authoritative | Often outdated; limited to demographic variables |
| Commercial Data Sets | Market‑research databases, credit‑card transaction aggregates | Large volume; ready‑made for analysis | Costly licences; may not match specific research needs |
| Statistical Reports (Government) | Labour‑market statistics, health‑service reports | Standardised methodology; trustworthy source | Published at set intervals; limited timeliness |
| Research Publications & Articles | Peer‑reviewed journals, conference papers | Validated findings; detailed methodology | May be behind paywalls; context may differ |
| Web‑Analytics Summaries | Average session duration, bounce rate from Google Analytics | Instant access; useful for digital performance | Aggregated – loss of raw detail; dependent on tracking setup |
| Historical Archives | Old survey data, previous research datasets | Provides longitudinal perspective | Potential incompatibility with current formats; may be incomplete |
When evaluating any data source, consider the five criteria below. A single data set can fail on one or more of these, affecting its usefulness.
Example: A 2020 national census provides very accurate population counts (high accuracy) but may be unsuitable for a 2025 market‑trend study because the data are five years old (poor on age).
| Aspect | Direct Source | Indirect Source |
|---|---|---|
| Origin of data | Collected at the point of occurrence | Derived from existing records or reports |
| Control over collection | High – researcher designs the method | Low – depends on how the original data were gathered |
| Timeliness | Often real‑time or current | May be outdated or historical |
| Cost | Usually higher (fieldwork, equipment) | Generally lower (reuse of existing data) |
| Potential for bias | Depends on design and execution | May inherit biases from the original source |
Encryption protects data from unauthorised access, especially when data travel over insecure networks or are stored on shared media.
| Topic | Key Points |
|---|---|
| Why encrypt? | Confidentiality, integrity, compliance with data‑protection laws (e.g., GDPR). |
| Symmetric encryption | Same secret key for encryption and decryption (e.g., AES). Fast, but key distribution is a challenge. |
| Asymmetric encryption | Public‑key pair (e.g., RSA). Enables secure key exchange; slower than symmetric. |
| Encryption protocols | TLS/SSL – encrypted channel for web traffic (HTTPS). Uses asymmetric exchange to agree on a symmetric session key. IPsec – encrypts IP packets at the network layer; used for VPNs. |
| Key management | Generation, safe storage, rotation and revocation of keys are essential; poor key management nullifies the benefit of encryption. |
| Pros / Cons | Pros: Data remain confidential, tamper‑evident, meet legal requirements. Cons: Processing overhead, key‑management complexity, possible interoperability issues. |
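
As a minimal sketch of the TLS row above, Python's standard `ssl` module can build a client-side context that insists on a valid server certificate before any session key is agreed. The function name `make_client_context` and the choice of TLS 1.2 as the minimum version are assumptions made for illustration, not part of the note.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context with certificate and hostname checks enabled."""
    context = ssl.create_default_context()  # loads the system's trusted CA certificates
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # server certificate must validate
print(context.check_hostname)                    # hostname must match the certificate
```

A context like this would typically be passed to a socket or HTTP client, which then carries out the asymmetric handshake and switches to a symmetric session key.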
| Aspect | Validation | Verification |
|---|---|---|
| Purpose | Ensures relevance and logical consistency of the data for the intended use. | Ensures the data have been entered or transferred correctly. |
| Typical Methods | Range checks, type checks, length checks, format checks, check‑digit, lookup tables, consistency checks, business‑rule limits. | Checksums, audit trails, double‑entry comparison, hash verification. |
| Advantages | Reduces inappropriate or nonsensical data before analysis. | Detects transcription or transmission errors. |
| Disadvantages | May reject legitimate outliers if rules are too strict. | Can be time‑consuming; may require extra resources. |
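
Two of the verification methods in the table, hash verification and double-entry comparison, can be sketched in a few lines of Python. The function names and the sample values are hypothetical and chosen only for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def transfer_is_intact(received: bytes, expected_digest: str) -> bool:
    """Hash verification: recompute the digest and compare it with the sender's."""
    return sha256_hex(received) == expected_digest


def double_entry_matches(first_entry: str, second_entry: str) -> bool:
    """Double-entry verification: two independent keyings must agree."""
    return first_entry.strip() == second_entry.strip()


original = b"order_id=1042,qty=3"
digest = sha256_hex(original)
print(transfer_is_intact(original, digest))                # unchanged copy
print(transfer_is_intact(b"order_id=1042,qty=8", digest))  # copy altered in transit
print(double_entry_matches("jane@example.com", "jane@exmaple.com"))  # mistyped second entry
```

Note that neither check says anything about whether the data are sensible; that remains the job of validation.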
| Check | What it ensures |
|---|---|
| Presence (mandatory) | Field is not left blank. |
| Range | Value lies between defined minimum and maximum. |
| Type | Data are of the correct kind (numeric, text, date). |
| Length | Number of characters/digits is within limits. |
| Format | Structure matches a pattern (e.g., postcode, email). |
| Check‑digit | Mathematical test (e.g., ISBN) confirms integrity. |
| Lookup | Value exists in a predefined list (e.g., country codes). |
| Consistency | Related fields agree (e.g., start date ≤ end date). |
| Limit (business rule) | Data obey organisational constraints (e.g., stock cannot be negative). |
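
Several of the checks in the table above can be sketched in Python as follows. The record layout, the field names, the 1 to 100 quantity range and the short country list are all assumptions made for this example; the email pattern is deliberately loose.

```python
import re
from datetime import date

COUNTRY_CODES = {"KE", "GB", "US"}  # lookup list (illustrative subset)
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def isbn10_is_valid(isbn: str) -> bool:
    """Check-digit test: the weighted sum of the 10 characters must divide by 11."""
    chars = isbn.replace("-", "")
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "Xx" and position == 9:
            value = 10                      # 'X' stands for 10 in the final position
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value    # weights run from 10 down to 1
    return total % 11 == 0


def validate_order(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passed."""
    errors = []
    if not record.get("email"):                               # presence
        errors.append("email is required")
    elif not EMAIL_PATTERN.match(record["email"]):            # format
        errors.append("email format is invalid")
    quantity = record.get("quantity")
    if not isinstance(quantity, int):                         # type
        errors.append("quantity must be an integer")
    elif not 1 <= quantity <= 100:                            # range
        errors.append("quantity must be between 1 and 100")
    if record.get("country") not in COUNTRY_CODES:            # lookup
        errors.append("country code not recognised")
    start, end = record.get("start"), record.get("end")
    if start and end and start > end:                         # consistency
        errors.append("start date must not be after end date")
    return errors


good = {"email": "a@example.com", "quantity": 3, "country": "KE",
        "start": date(2025, 1, 1), "end": date(2025, 1, 31)}
print(validate_order(good))
print(isbn10_is_valid("0-306-40615-2"))
```

Collecting every error in a list, rather than stopping at the first failure, lets the user correct all problems in one pass.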
| Criterion | Batch | Online | Real‑time |
|---|---|---|---|
| Typical Use‑case | Large, non‑time‑critical jobs (payroll, end‑of‑day reports) | Interactive transactions (banking, retail POS) | Safety‑critical or time‑sensitive systems (control, monitoring) |
| Data latency | Hours or days | Seconds | Milliseconds / microseconds |
| System load | High during scheduled run, low otherwise | Steady moderate load | Continuous high load; requires robust hardware |
| Advantages | Efficient for huge data sets; simple error handling. | Immediate feedback; supports concurrent users. | Meets strict timing requirements; enables automatic control. |
| Disadvantages | Delayed results; errors discovered late. | Higher resource consumption; more complex concurrency control. | Complex design; expensive hardware; stringent testing. |
Batch processing

    FOR each input file IN scheduled_folder
        READ all records
        VALIDATE each record
        TRANSFORM as required
        APPEND to master_output
    END FOR
    SAVE master_output

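
The batch pseudocode could be realised in Python roughly as follows. This is a sketch under stated assumptions: the input files are CSVs with `id` and `amount` columns, validation means "the amount is present and numeric", and the transformation simply rounds to two decimal places. None of these details come from the note.

```python
import csv
from pathlib import Path


def run_batch(scheduled_folder: Path, master_output: Path) -> int:
    """Process every CSV file in the folder in one run; return the rows written."""
    master_rows = []
    for input_file in sorted(scheduled_folder.glob("*.csv")):
        with input_file.open(newline="") as handle:
            for record in csv.DictReader(handle):             # READ all records
                try:
                    amount = float(record.get("amount") or "")
                except ValueError:
                    continue   # VALIDATE: skip missing or non-numeric amounts
                record["amount"] = f"{amount:.2f}"            # TRANSFORM
                master_rows.append(record)                    # APPEND
    with master_output.open("w", newline="") as handle:       # SAVE
        writer = csv.DictWriter(handle, fieldnames=["id", "amount"],
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(master_rows)
    return len(master_rows)
```

Nothing is written until every file has been read, which reflects the batch trade-off: efficient bulk handling, but results and errors only appear at the end of the run.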
Online (transaction‑oriented) processing

    WHILE system is running
        WAIT for transaction request
        IF request received THEN
            VALIDATE request
            UPDATE database
            RETURN confirmation to user
        END IF
    END WHILE

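
A minimal Python sketch of the online loop is shown below. The in-memory `balances` dictionary stands in for the database, and the account name, request format and 0.1 s timeout are hypothetical choices made so the example can terminate; a real system would keep waiting.

```python
from queue import Queue, Empty

balances = {"ACC-1": 100.0}   # stands in for the database (hypothetical data)


def handle_transaction(request: dict) -> str:
    """Validate one request, update the store and return a confirmation."""
    account, amount = request.get("account"), request.get("amount")
    if account not in balances or not isinstance(amount, (int, float)):
        return "REJECTED: invalid request"                # VALIDATE request
    if balances[account] + amount < 0:
        return "REJECTED: insufficient funds"
    balances[account] += amount                           # UPDATE database
    return f"OK: new balance {balances[account]:.2f}"     # RETURN confirmation


def serve(requests: Queue) -> list[str]:
    """Handle requests one at a time as they arrive."""
    confirmations = []
    while True:
        try:
            request = requests.get(timeout=0.1)   # WAIT for transaction request
        except Empty:
            break   # demo only: stop once no more requests arrive
        confirmations.append(handle_transaction(request))
    return confirmations
```

Each request is answered as soon as it is handled, which is what gives online processing its immediate feedback.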
Real‑time processing

    LOOP every 10 ms
        CAPTURE sensor input
        IF input violates safety limit THEN
            TRIGGER alarm / corrective action
        END IF
        LOG event
    END LOOP

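
The structure of the real-time loop can be illustrated in Python, with the caveat that a general-purpose operating system cannot guarantee a 10 ms deadline, so this shows the control flow rather than a true real-time implementation. The temperature limit and the callback names `read_sensor` and `trigger_alarm` are assumptions.

```python
import time

SAFETY_LIMIT_C = 80.0   # assumed safety threshold for this sketch
PERIOD_S = 0.010        # 10 ms cycle, as in the pseudocode


def control_loop(read_sensor, trigger_alarm, cycles: int) -> list[float]:
    """Run a fixed number of 10 ms cycles and return the logged readings."""
    log = []
    next_deadline = time.monotonic()
    for _ in range(cycles):
        reading = read_sensor()                # CAPTURE sensor input
        if reading > SAFETY_LIMIT_C:           # input violates safety limit
            trigger_alarm(reading)             # TRIGGER alarm / corrective action
        log.append(reading)                    # LOG event
        next_deadline += PERIOD_S
        # Sleep only for the time left in this cycle so delays do not accumulate.
        time.sleep(max(0.0, next_deadline - time.monotonic()))
    return log


samples = iter([72.0, 79.5, 83.1])
alarms = []
log = control_loop(lambda: next(samples), alarms.append, cycles=3)
print(log, alarms)
```

Scheduling against a running deadline, rather than sleeping a fixed 10 ms each time, keeps the cycle period steady even when one iteration takes longer than usual.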
A retail chain wants to understand weekly sales trends.
For precise internal analysis, the direct POS data are preferred. For benchmarking against competitors, the indirect market report adds valuable context.
The transformation of raw data D into useful information I can be expressed as:
$$ I = f(D) $$
where f represents the set of processing operations (sorting, filtering, aggregation, validation, encryption, etc.).
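
As an illustrative sketch of I = f(D), the function below turns raw sales records into weekly totals by validating, aggregating and sorting. The sample records are invented, and treating negative amounts as invalid is an assumption made for this example.

```python
from collections import defaultdict
from datetime import date

# Raw data D: (sale date, amount) pairs as a POS system might record them.
raw_sales = [
    (date(2025, 3, 3), 120.0),
    (date(2025, 3, 5), 80.0),
    (date(2025, 3, 10), 200.0),
    (date(2025, 3, 12), -15.0),   # treated as invalid in this sketch
]


def weekly_totals(sales):
    """f(D): drop invalid rows, then total the amounts for each ISO week."""
    totals = defaultdict(float)
    for sale_date, amount in sales:
        if amount < 0:                        # validation step
            continue
        year, week, _ = sale_date.isocalendar()
        totals[(year, week)] += amount        # aggregation step
    return dict(sorted(totals.items()))       # sorting step


information = weekly_totals(raw_sales)
print(information)   # {(2025, 10): 200.0, (2025, 11): 200.0}
```

The individual receipts mean little on their own; the weekly totals are the information a manager can act on.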