Identify direct and indirect data sources

Data Processing and Information

1. Data and Information

Data are raw facts, figures or symbols that have no meaning on their own. Information is the result of giving data context, meaning and purpose – i.e. data become information when they are processed, interpreted and used to support decision‑making. Both direct and indirect data sources can be used to produce information, but the choice of source influences the quality, cost and timeliness of the final output.

2. Sources of Data

2.1 Direct Data Sources

Data are collected straight from the original event, object or person without any intermediate transformation.

Source | Typical Examples | Advantages | Disadvantages
Questionnaires / Surveys | Online forms, paper-based surveys | High control over questions; can target specific groups | Respondent bias; time-consuming to design and analyse
Interviews & Focus Groups | Face-to-face or video interviews, group discussions | Rich, qualitative insight; ability to probe deeper | Requires skilled interviewer; limited sample size
Data-logging (Sensors) | Temperature probes, motion detectors, GPS trackers | Real-time, objective measurements; minimal human error | Equipment cost; maintenance and calibration needed
Observation | Manual counts of footfall, behavioural observation | Direct view of actual behaviour; no reliance on self-report | Observer bias; may be intrusive
Weather Data | On-site weather stations, portable anemometers | Accurate, location-specific information | Limited to physical conditions; equipment dependent
Transaction Records (POS) | Sales receipts, checkout logs | Exact, time-stamped data; useful for trend analysis | May contain errors if system fails; privacy concerns
Manual Measurements | Ruler, stopwatch, tape measure | Simple, low-tech; immediate results | Human error; limited precision

2.2 Indirect Data Sources

Data are derived, aggregated or transformed from existing records or publications.

Source | Typical Examples | Advantages | Disadvantages
Census & Electoral Register | National population counts, voter lists | Comprehensive coverage; highly authoritative | Often outdated; limited to demographic variables
Commercial Data Sets | Market-research databases, credit-card transaction aggregates | Large volume; ready-made for analysis | Costly licences; may not match specific research needs
Statistical Reports (Government) | Labour-market statistics, health-service reports | Standardised methodology; trustworthy source | Published at set intervals; limited timeliness
Research Publications & Articles | Peer-reviewed journals, conference papers | Validated findings; detailed methodology | May be behind paywalls; context may differ
Web-Analytics Summaries | Average session duration, bounce rate from Google Analytics | Instant access; useful for digital performance | Aggregated, so raw detail is lost; dependent on tracking setup
Historical Archives | Old survey data, previous research datasets | Provides longitudinal perspective | Potential incompatibility with current formats; may be incomplete

3. Quality of Information

When evaluating any data source, consider the five criteria below. A single data set can fail on one or more of these, affecting its usefulness.

  • Accuracy – How close the data are to the true value.
  • Relevance – Whether the data answer the specific question.
  • Age (Timeliness) – How up‑to‑date the data are.
  • Level of Detail – The granularity required for the analysis.
  • Completeness – Extent to which all needed data are present.

Example: A 2020 national census provides very accurate population counts (high accuracy) but may be unsuitable for a 2025 market-trend study because the data are out of date (poor timeliness).
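
To make the completeness and timeliness criteria concrete, the minimal Python sketch below scores a small set of records against them; the field names and the five-year age threshold are illustrative assumptions, not fixed rules.

from datetime import date

# Illustrative records: Store B is missing its sales figure,
# and Store A's data were collected some years ago.
records = [
    {"name": "Store A", "sales": 1200, "collected": date(2020, 3, 1)},
    {"name": "Store B", "sales": None, "collected": date(2024, 6, 1)},
]

required_fields = ["name", "sales", "collected"]

def completeness(record):
    # Fraction of required fields that are actually present (not None)
    present = sum(1 for f in required_fields if record.get(f) is not None)
    return present / len(required_fields)

def is_current(record, max_age_years=5):
    # True if the record was collected within the assumed age limit
    age_days = (date.today() - record["collected"]).days
    return age_days <= max_age_years * 365

for r in records:
    print(r["name"], f"completeness={completeness(r):.2f}", f"current={is_current(r)}")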

4. Choosing the Appropriate Data Source

  1. Define the information requirement clearly (what, why, for whom).
  2. Decide whether real‑time, recent or historical data are needed.
  3. Check the feasibility of obtaining a direct source (e.g., can you run a questionnaire or install a sensor?).
  4. Assess the reliability, relevance and quality of any indirect source.
  5. Consider ethical, legal and cost implications (e.g., consent, GDPR, licence fees).
  6. Select the source that best balances accuracy, timeliness, cost and practicality.

Comparison of Direct and Indirect Data Sources

Aspect | Direct Source | Indirect Source
Origin of data | Collected at the point of occurrence | Derived from existing records or reports
Control over collection | High: researcher designs the method | Low: depends on how the original data were gathered
Timeliness | Often real-time or current | May be outdated or historical
Cost | Usually higher (fieldwork, equipment) | Generally lower (reuse of existing data)
Potential for bias | Depends on design and execution | May inherit biases from the original source

5. Encryption of Data

Encryption protects data from unauthorised access, especially when data travel over insecure networks or are stored on shared media.

Topic | Key Points
Why encrypt? | Confidentiality, integrity, compliance with data-protection laws (e.g., GDPR).
Symmetric encryption | Same secret key for encryption and decryption (e.g., AES). Fast, but key distribution is a challenge.
Asymmetric encryption | Public-key pair (e.g., RSA). Enables secure key exchange; slower than symmetric.
Encryption protocols | TLS/SSL: encrypted channel for web traffic (HTTPS); uses asymmetric exchange to agree on a symmetric session key. IPsec: encrypts IP packets at the network layer; used for VPNs.
Key management | Generation, safe storage, rotation and revocation of keys are essential; poor key management nullifies the benefit of encryption.
Pros / Cons | Pros: data remain confidential, tamper-evident, and meet legal requirements. Cons: processing overhead, key-management complexity, possible interoperability issues.
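
As an illustration of the symmetric case, the minimal Python sketch below uses the third-party cryptography package (Fernet, which is built on AES). The message text is illustrative, and in practice the key itself must be generated, stored and rotated securely, as noted under key management.

from cryptography.fernet import Fernet

key = Fernet.generate_key()        # same secret key encrypts and decrypts
cipher = Fernet(key)

plaintext = b"Customer record: card ending 1234"   # illustrative data
token = cipher.encrypt(plaintext)  # ciphertext, safe to store or transmit
recovered = cipher.decrypt(token)  # only possible with the same key

print(recovered == plaintext)      # True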

6. Checking the Accuracy of Data

6.1 Validation vs. Verification

Aspect | Validation | Verification
Purpose | Ensures the data are relevant and logically consistent for the intended use. | Ensures the data have been entered or transferred correctly.
Typical Methods | Range checks, type checks, length checks, format checks, check-digit, lookup tables, consistency checks, business-rule limits. | Checksums, audit trails, double-entry comparison, hash verification.
Advantages | Reduces inappropriate or nonsensical data before analysis. | Detects transcription or transmission errors.
Disadvantages | May reject legitimate outliers if rules are too strict. | Can be time-consuming; may require extra resources.
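
A minimal sketch of verification by hash comparison: the sender and the receiver each compute a SHA-256 digest of the same data, and matching digests indicate the data arrived unaltered. The data values are illustrative.

import hashlib

original = b"2025-01-07,Store A,1200.50"      # data as sent
received = b"2025-01-07,Store A,1200.50"      # data as received

def digest(data: bytes) -> str:
    # A SHA-256 digest acts as a fingerprint of the data
    return hashlib.sha256(data).hexdigest()

print(digest(original) == digest(received))   # True means verification passed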

6.2 Validation Checklist (Syllabus terminology)

Check | What it ensures
Presence (mandatory) | Field is not left blank.
Range | Value lies between defined minimum and maximum.
Type | Data are of the correct kind (numeric, text, date).
Length | Number of characters/digits is within limits.
Format | Structure matches a pattern (e.g., postcode, email).
Check-digit | Mathematical test (e.g., ISBN) confirms integrity.
Lookup | Value exists in a predefined list (e.g., country codes).
Consistency | Related fields agree (e.g., start date ≤ end date).
Limit (business rule) | Data obey organisational constraints (e.g., stock cannot be negative).
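
The Python sketch below implements a few of these checks against a hypothetical order record; the field names, limits and email pattern are illustrative assumptions rather than syllabus requirements.

import re
from datetime import date

record = {"customer_id": "C1042", "quantity": 12, "email": "sales@example.com",
          "start": date(2025, 1, 1), "end": date(2025, 1, 31)}

errors = []

# Presence check: mandatory field must not be blank
if not record.get("customer_id"):
    errors.append("customer_id missing")

# Range / limit check: quantity must lie within assumed business limits
if not (1 <= record["quantity"] <= 1000):
    errors.append("quantity out of range")

# Type check: quantity must be numeric
if not isinstance(record["quantity"], int):
    errors.append("quantity not numeric")

# Format check: email must match a simple pattern
if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record["email"]):
    errors.append("email format invalid")

# Consistency check: start date must not be after end date
if record["start"] > record["end"]:
    errors.append("start date after end date")

print("valid" if not errors else errors)

# Check-digit check (ISBN-10): weighted sum of the digits must be divisible by 11
def isbn10_valid(isbn: str) -> bool:
    digits = [10 if c in "Xx" else int(c) for c in isbn if c.isalnum()]
    return len(digits) == 10 and sum(d * w for d, w in zip(digits, range(10, 0, -1))) % 11 == 0

print(isbn10_valid("0306406152"))   # True for this well-formed ISBN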

7. Data‑Processing Methods

7.1 Overview

  • Batch processing – large volumes handled at scheduled times (e.g., nightly payroll).
  • Online (transaction‑oriented) processing – each transaction is processed immediately (e.g., ATM withdrawals).
  • Real‑time processing – results must be produced within a strict time limit (e.g., air‑traffic control).

7.2 Comparative Matrix

Criterion | Batch | Online | Real-time
Typical Use-case | Large, non-time-critical jobs (payroll, end-of-day reports) | Interactive transactions (banking, retail POS) | Safety-critical or time-sensitive systems (control, monitoring)
Data latency | Hours or days | Seconds | Milliseconds / microseconds
System load | High during scheduled run, low otherwise | Steady moderate load | Continuous high load; requires robust hardware
Advantages | Efficient for huge data sets; simple error handling. | Immediate feedback; supports concurrent users. | Meets strict timing requirements; enables automatic control.
Disadvantages | Delayed results; errors discovered late. | Higher resource consumption; more complex concurrency control. | Complex design; expensive hardware; stringent testing.

7.3 Simple Algorithms (pseudocode)

Batch processing

FOR each input file IN scheduled_folder
    READ all records
    VALIDATE each record
    TRANSFORM as required
    APPEND to master_output
END FOR
SAVE master_output

Online (transaction‑oriented) processing

WHILE system is running
    WAIT for transaction request
    IF request received THEN
        VALIDATE request
        UPDATE database
        RETURN confirmation to user
    END IF
END WHILE

Real‑time processing

LOOP every 10 ms
    CAPTURE sensor input
    IF input violates safety limit THEN
        TRIGGER alarm / corrective action
    END IF
    LOG event
END LOOP
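
The pseudocode above is language-neutral. As one possible concrete form, the short Python sketch below applies the batch pattern to CSV files in a hypothetical scheduled_folder with assumed date and amount columns; the validation and transformation steps are deliberately trivial.

import csv
from pathlib import Path

scheduled_folder = Path("scheduled_folder")   # assumed input location
master_output = []

for input_file in sorted(scheduled_folder.glob("*.csv")):
    with input_file.open(newline="") as f:
        for record in csv.DictReader(f):
            # VALIDATE: skip records with a missing or non-numeric amount
            try:
                amount = float(record["amount"])
            except (KeyError, ValueError):
                continue
            # TRANSFORM: keep only the fields needed downstream
            master_output.append({"date": record["date"], "amount": amount})

# SAVE master_output
with open("master_output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["date", "amount"])
    writer.writeheader()
    writer.writerows(master_output)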

8. Illustrative Example

A retail chain wants to understand weekly sales trends.

  • Direct source: Export daily sales figures from each store’s POS system and aggregate them in a spreadsheet.
  • Indirect source: Use a published market‑research report that summarises industry‑wide sales patterns.

For precise internal analysis, the direct POS data are preferred. For benchmarking against competitors, the indirect market report adds valuable context.
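
As a minimal sketch of the direct-source option, the Python fragment below aggregates illustrative daily POS totals into weekly figures using ISO week numbers; the dates and sales values are made up for the example.

from collections import defaultdict
from datetime import date

daily_sales = {date(2025, 1, 6): 1200.0, date(2025, 1, 7): 950.5,
               date(2025, 1, 13): 1100.0}

weekly = defaultdict(float)
for day, amount in daily_sales.items():
    year, week, _ = day.isocalendar()          # group by ISO week number
    weekly[(year, week)] += amount

for (year, week), total in sorted(weekly.items()):
    print(f"{year} week {week}: {total:.2f}")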

9. Key Points to Remember

  • Direct data give you maximum control but can be costly and time‑consuming.
  • Indirect data are useful for background, context and comparison, but you must check relevance, age and original quality.
  • Assess the five quality criteria: accuracy, relevance, age, level of detail and completeness.
  • Validate data (presence, range, type, etc.) to ensure suitability; verify data to confirm they have been recorded correctly.
  • Encrypt sensitive data – choose the appropriate method (symmetric, asymmetric, TLS/SSL, IPsec) and manage keys securely.
  • Match the processing method (batch, online, real‑time) to the business need and resource constraints.
  • Combine direct and indirect sources where it enriches the analysis.

10. Suggested Diagram

Decision flowchart: Choose Direct or Indirect source → Assess quality criteria → Validation → Verification → (optional) Encryption → Select processing method (Batch / Online / Real‑time).

11. Mathematical Representation of Data Processing

The transformation of raw data D into useful information I can be expressed as:

$$ I = f(D) $$

where f represents the set of processing operations (sorting, filtering, aggregation, validation, encryption, etc.).
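
A tiny Python illustration of this idea, where f is a composition of filtering, sorting and aggregation applied to illustrative raw values:

D = [23, 19, None, 31, 27]                        # raw data (with a gap)

def f(data):
    cleaned = [x for x in data if x is not None]  # filtering (validation)
    ordered = sorted(cleaned)                     # sorting
    return {"min": ordered[0], "max": ordered[-1],
            "mean": sum(ordered) / len(ordered)}  # aggregation

I = f(D)   # information: a summary with context and meaning
print(I)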
