Describe batch processing applications (utility bills, payroll)

Data, Information and Processing Methods

1.1 Data & Information

Data are raw, unprocessed facts or measurements. Information is data that have been organised, processed or interpreted so that they become meaningful to a user.

Example: A smart meter records the consumption figures 578, 582 and 590 kWh. These are data. When the figures are summed and presented as “Your electricity consumption this month is 1 750 kWh”, the result is information.
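
As a minimal Python illustration (the variable names are invented for this sketch), the processing step is simply a calculation followed by presentation:

# Raw data: consumption figures recorded by the smart meter (kWh).
readings_kwh = [578, 582, 590]

# Processing: summarise the raw figures so they become meaningful.
total_kwh = sum(readings_kwh)

# Information: a statement the customer can act on.
print(f"Your electricity consumption this month is {total_kwh} kWh")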

Sources of data

Source type | Direct examples | Indirect examples
------------|-----------------|-------------------
Surveys / questionnaires | Customer-satisfaction forms completed in person | National census data published by the statistics office
Sensors / instruments | Temperature-sensor readings from a factory line | Web-scraped price lists from competitor websites
Transactions | Point-of-sale receipts captured at checkout | Historical financial statements downloaded from a public register

Direct vs. indirect data (Cambridge requirement)

  • Direct data are collected first‑hand for the specific purpose of the system (e.g., meter readings uploaded by the utility).
    Advantages: usually more accurate, timely and under the control of the organisation.
    Disadvantages: may be costly to obtain and can raise privacy concerns.
  • Indirect data are obtained from secondary sources that were created for another purpose (e.g., census data, web‑scraped information).
    Advantages: inexpensive, readily available, often large‑scale.
    Disadvantages: may be outdated, less relevant, and can contain bias.

Ethical / legal considerations

  • Consent – individuals must agree to the collection of personal data.
  • Data protection legislation – GDPR, Data Protection Act, etc.
  • Confidentiality – sensitive data must be stored and transmitted securely.

1.2 Quality of Information

Information must meet five quality criteria. The table shows each criterion, how it can be assessed and a brief illustration.

Criterion | How to assess | Illustration
----------|---------------|--------------
Accuracy | Cross-check with a trusted source; use validation rules. | A mis-read meter gives a bill £20 too high.
Relevance | Ask whether the information answers the business question. | Using last year’s weather data to predict today’s flood risk is irrelevant.
Age (timeliness) | Check timestamps or version numbers; define acceptable age limits. | Out-of-date stock levels cause a shop to sell items it no longer has.
Completeness | Verify that all mandatory fields are present; use “presence” validation. | Missing overtime entries lead to under-payment of staff.
Consistency | Ensure uniform formats, units and coding across the whole data set. | One record uses “USD”, another “$”; this can cause calculation errors.

Data provenance – recording where each data item originated, who collected it and when – helps to judge all five quality criteria.
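
A small Python sketch of the consistency criterion (the alias table is an assumption invented for this example): equivalent currency notations are normalised to one code before any totals are calculated.

# Hypothetical mapping used to enforce consistent currency coding.
CURRENCY_ALIASES = {"$": "USD", "US$": "USD", "usd": "USD"}

def normalise_currency(code):
    """Return one consistent currency code for equivalent notations."""
    return CURRENCY_ALIASES.get(code.strip(), code.strip().upper())

records = [{"amount": 120.00, "currency": "USD"},
           {"amount": 75.50, "currency": "$"}]

# With consistent coding the two records can safely be totalled together.
total = sum(r["amount"] for r in records
            if normalise_currency(r["currency"]) == "USD")
print(total)   # 195.5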

1.3 Encryption – need, methods and protocols

Encryption protects data from unauthorised access by converting it into a coded form. It is essential when data are stored (at rest) or transmitted (in transit) over insecure networks.

Methods

  • Symmetric encryption – one secret key is used for both encryption and decryption (e.g., AES). Fast, but the key must be shared securely.
  • Asymmetric encryption – a public key encrypts, a private key decrypts (e.g., RSA). Solves the key‑distribution problem but is slower.
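
A minimal sketch of symmetric encryption, assuming the third-party cryptography package is installed (pip install cryptography); its Fernet class uses AES internally, and the same secret key both encrypts and decrypts:

from cryptography.fernet import Fernet

# One secret key is used for both encryption and decryption (symmetric).
key = Fernet.generate_key()     # must be shared securely with the receiver
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"Meter reading: 1750 kWh")
plaintext = cipher.decrypt(ciphertext)

print(ciphertext)   # unreadable without the key
print(plaintext)    # b'Meter reading: 1750 kWh'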

Key‑management practices (syllabus point)

  • Key generation using a trusted random‑number source.
  • Secure storage (hardware security modules, encrypted key vaults).
  • Regular key rotation and revocation procedures.
  • Backup of keys in an offline, tamper‑proof location.
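
Key rotation can be sketched with the same assumed cryptography package: MultiFernet decrypts with any of the listed keys but always encrypts with the newest one, so old data remain readable while new data use the new key.

from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet(Fernet.generate_key())
token = old_key.encrypt(b"sensitive customer record")

# Rotation: put the new key first, keep the old key so existing tokens
# still decrypt, then re-encrypt stored tokens under the new key.
new_key = Fernet(Fernet.generate_key())
keys = MultiFernet([new_key, old_key])
rotated_token = keys.rotate(token)

print(keys.decrypt(rotated_token))   # b'sensitive customer record'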

Common protocols

Protocol | Purpose | Typical use
---------|---------|-------------
TLS / SSL | Secure web traffic (HTTPS) | Online banking, e-commerce
IPsec | Secure IP-level communication | Virtual private networks (VPNs)
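
As a small illustration of TLS in practice (example.com is only a placeholder host, and the snippet needs an internet connection), Python's standard ssl module can open a secure connection and report the negotiated protocol version:

import socket
import ssl

host = "example.com"                     # placeholder host for illustration
context = ssl.create_default_context()   # verifies the server's certificate

with socket.create_connection((host, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls_sock:
        print(tls_sock.version())                  # e.g. 'TLSv1.3'
        print(tls_sock.getpeercert()["subject"])   # certificate owner details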

Advantages & disadvantages

Advantage | Disadvantage
----------|--------------
Confidentiality – data cannot be read without the key | Performance overhead – encryption/decryption consumes CPU cycles
Integrity – tampering can be detected (e.g., with MACs) | Key-management complexity, especially for symmetric keys

Figure 1 – Simplified TLS handshake (placeholder for diagram)

1.4 Validation & Verification – methods and purpose

Both techniques help to ensure that data are correct before they are used, but they check different things: validation confirms that input data are sensible and obey defined rules, while verification confirms that data have been entered or transferred accurately and that results match what is expected.

Validation (checking input data)

Check | Description
------|-------------
Presence | No mandatory field left blank.
Range | Values fall within acceptable limits (e.g., 0 ≤ hours ≤ 200).
Type | Value is of the correct data type (numeric, text, date, etc.).
Format | Correct number of digits, separators or pattern (e.g., “A-123”).
Consistency | Related fields agree (e.g., start date ≤ end date).
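
These checks translate directly into code; the Python sketch below applies all five to a hypothetical order record (the field names, pattern and limits are invented for the example):

import re
from datetime import date

order = {"customer_id": "C1042", "quantity": "3",
         "order_date": date(2024, 5, 1), "delivery_date": date(2024, 5, 7)}
errors = []

# Presence – no mandatory field left blank.
if not order.get("customer_id"):
    errors.append("customer_id is missing")

# Type – quantity must be a whole number; Range – within acceptable limits.
if not order.get("quantity", "").isdigit():
    errors.append("quantity is not a whole number")
elif not 1 <= int(order["quantity"]) <= 100:
    errors.append("quantity outside range 1-100")

# Format – customer ID must be 'C' followed by four digits.
if not re.fullmatch(r"C\d{4}", order.get("customer_id", "")):
    errors.append("customer_id does not match pattern 'C9999'")

# Consistency – related fields must agree.
if order["order_date"] > order["delivery_date"]:
    errors.append("order date is after delivery date")

print(errors)   # an empty list means the record passes every check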

Verification (checking that data match the source or the expected result)

Check | Description
------|-------------
Double entry | Two independent entries of the same data give the same result.
Checksum / hash | Totals or hashes match the expected values (e.g., sum of gross pay).
Re-calculation | Sample records are processed manually and compared with the system’s output.
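
A short Python sketch of the first two verification checks (payroll.csv is the file from the exercise below; recording the hash when the file is created is an assumption made for illustration):

import hashlib

# Double entry – two independent entries of the same value must agree.
first_entry = "AB1234567C"
second_entry = "AB1234567C"
print(first_entry == second_entry)   # True means the entry is verified

# Checksum / hash – a recorded hash reveals whether a file has been altered.
def file_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# expected = file_hash("payroll.csv")   # recorded when the file was created
# assert file_hash("payroll.csv") == expected, "file has been changed"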

Mini‑exercise (payroll CSV validation)

  1. Open the supplied payroll.csv file.
  2. Validate each row for:
    • Employee ID – numeric and present.
    • Hours worked – numeric, 0 ≤ value.
    • Pay‑rate – positive number.
    • Tax code – matches pattern “A‑123”.
  3. Verify by manually calculating the total gross pay for the first five rows and comparing with the program’s output.
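
One possible starting point for the exercise, assuming payroll.csv has the columns EmployeeID, Hours, PayRate and TaxCode (adjust the names to match the supplied file):

import csv
import re

def row_errors(row):
    """Validate one payroll row and return any error messages."""
    errors = []
    if not row["EmployeeID"].isdigit():
        errors.append("EmployeeID must be numeric and present")
    try:
        if float(row["Hours"]) < 0:
            errors.append("Hours must be 0 or more")
    except ValueError:
        errors.append("Hours must be numeric")
    try:
        if float(row["PayRate"]) <= 0:
            errors.append("PayRate must be a positive number")
    except ValueError:
        errors.append("PayRate must be numeric")
    if not re.fullmatch(r"[A-Z]-\d{3}", row["TaxCode"]):
        errors.append("TaxCode must match the pattern 'A-123'")
    return errors

with open("payroll.csv", newline="") as f:
    rows = list(csv.DictReader(f))

for number, row in enumerate(rows, start=1):
    for error in row_errors(row):
        print(f"Row {number}: {error}")

# Verification: compare this figure with a manual calculation of gross pay
# (hours x pay rate) for the first five rows (assumes they passed validation).
gross_first_five = sum(float(r["Hours"]) * float(r["PayRate"]) for r in rows[:5])
print(f"Gross pay for first five rows: {gross_first_five:.2f}")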

Sample test‑plan template (useful for the practical task)

Test ID | Description                     | Input data            | Expected result
--------|---------------------------------|-----------------------|--------------------
V01     | Check mandatory fields present  | Row 12 of payroll.csv | No blank fields
V02     | Check numeric range for hours   | Hours = -5            | Validation error
V03     | Confirm tax-code format         | TaxCode = “B-987”     | Pass
V04     | Verify checksum of gross totals | Whole file            | Sum = 1 254 320.00

1.5 Processing Methods – batch, online and real‑time

Definitions

  • Batch processing – jobs are collected and run together without user interaction during execution.
  • Online (interactive) processing – data are entered and processed immediately while the user is present.
  • Real‑time processing – data are processed within a strict time limit (often milliseconds) as they arrive.

Advantages & disadvantages (summary)

Method | Typical response time | Key advantage | Key limitation
-------|-----------------------|---------------|----------------
Batch | Minutes to hours (after run starts) | High throughput, low per-transaction cost | No immediate feedback to user
Online | Seconds | Instant confirmation to user | Higher resource demand per transaction
Real-time | Milliseconds or less | Deterministic, time-critical response | Complex, costly hardware/software

Algorithm for selecting a processing method

The Cambridge syllabus requires an algorithm (or flowchart) that decides which method to use based on business requirements.

Algorithm ChooseProcessingMethod
Input: dataVolume, requiredResponse, resourceBudget
Output: processingMethod

1. IF requiredResponse ≤ 0.1 seconds THEN
       processingMethod ← “Real‑time”
   ELSE IF requiredResponse ≤ 5 seconds THEN
       processingMethod ← “Online”
   ELSE
       // response time is not critical
       IF dataVolume > 10 000 records AND resourceBudget is limited THEN
            processingMethod ← “Batch”
       ELSE
            processingMethod ← “Online”   // small volume, interactive preferred
       END IF
   END IF
2. RETURN processingMethod
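
The pseudocode maps directly onto Python; the sketch below keeps the same thresholds (the 10 000-record and 5-second figures are the illustrative values used above, not fixed rules):

def choose_processing_method(data_volume, required_response_s, budget_limited):
    """Select a processing method from response time, volume and budget."""
    if required_response_s <= 0.1:
        return "Real-time"
    if required_response_s <= 5:
        return "Online"
    # Response time is not critical beyond this point.
    if data_volume > 10_000 and budget_limited:
        return "Batch"
    return "Online"   # small volume, interactive preferred

print(choose_processing_method(data_volume=2_000_000,
                               required_response_s=3600,
                               budget_limited=True))       # Batch
print(choose_processing_method(500, 0.05, False))          # Real-time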

Link to system life‑cycle stages (syllabus point)

  • Analysis & design – decide which processing method best meets the functional and non‑functional requirements.
  • Implementation – develop the chosen method (e.g., write batch scripts, build web services, or program interrupt‑driven real‑time modules).
  • Testing – validate that the method meets response‑time and throughput criteria.
  • Deployment & maintenance – monitor performance; switch methods if business needs change.

Batch‑Processing Applications

What is batch processing?

A batch is a collection of similar jobs that are processed together as a single unit, without manual intervention during execution. The typical batch cycle consists of four stages:

  1. Data collection – gather all required inputs.
  2. Processing – apply rules, calculations or transformations.
  3. Output generation – produce reports, files, statements, etc.
  4. Distribution / archiving – send results to users or other systems and store for future reference.

Typical batch‑processing applications

  • Utility billing (electricity, water, gas)
  • Payroll for organisations
  • Bank statement generation
  • Stock‑taking and inventory updates
  • End‑of‑day financial reports

Utility Bills

Utility companies must produce millions of bills each month. Batch processing guarantees consistency, speed and cost‑effectiveness.

  1. Data collection – read consumption data from smart meters, manual uploads or legacy databases.
  2. Rate application – apply tariff tables, discounts, subsidies and statutory taxes.
  3. Bill generation – create itemised statements (PDF, XML, or printed form) including consumption, charges and total due.
  4. Distribution – email, portal upload, or postal dispatch; update the customer‑account database with payment status.
  5. Archiving & audit – store each bill for the legally required retention period and generate audit logs.

Why batch? The data set is huge, the calculation rules are fixed, and the output is needed at a set time each month, making a single, automated run the most efficient solution.
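
A much simplified Python sketch of steps 1–3 of the billing run (the tariff values and customer readings are invented for this example):

# Step 1 – data collection: consumption gathered for the billing run (kWh).
meter_readings = {"CUST-001": 1750, "CUST-002": 980}

# Step 2 – rate application: the same tariff rules applied to every customer.
UNIT_RATE = 0.28        # price per kWh
STANDING_CHARGE = 9.50  # fixed monthly charge
TAX_RATE = 0.05         # statutory tax on the net amount

# Step 3 – bill generation: one automated pass over the whole batch.
for customer, kwh in meter_readings.items():
    net = kwh * UNIT_RATE + STANDING_CHARGE
    total = net * (1 + TAX_RATE)
    print(f"{customer}: {kwh} kWh, total due {total:.2f}")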

Payroll

Payroll systems calculate earnings, deductions and taxes for every employee in an organisation.

  1. Gather input data – attendance logs, overtime sheets, leave records, bonuses, commission entries.
  2. Calculate earnings – apply salary scales, overtime rates, shift differentials, commission formulas.
  3. Apply deductions – tax, national insurance, pension contributions, union fees, garnishments.
  4. Produce outputs – payslips (digital or printed), journal entries for the finance ledger, and a batch file for bank transfers (e.g., BACS, ACH).
  5. Archive & audit – store payroll records for statutory retention periods and generate compliance reports.

Why batch? Payroll must be processed for all staff at the same cut‑off date, guaranteeing uniformity, legal compliance and a clear audit trail.
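
A simplified Python sketch of the earnings-and-deductions stages run over a whole batch of employees (the rates and records are invented for this example):

employees = [
    {"id": 1, "hours": 160, "overtime": 8, "rate": 15.00},
    {"id": 2, "hours": 152, "overtime": 0, "rate": 18.50},
]

TAX_RATE = 0.20            # illustrative flat deduction rates
PENSION_RATE = 0.05
OVERTIME_MULTIPLIER = 1.5

bank_file = []   # one payment instruction per employee, exported at the end

for emp in employees:
    gross = (emp["hours"] * emp["rate"]
             + emp["overtime"] * emp["rate"] * OVERTIME_MULTIPLIER)
    deductions = gross * (TAX_RATE + PENSION_RATE)
    net = gross - deductions
    bank_file.append((emp["id"], round(net, 2)))
    print(f"Employee {emp['id']}: gross {gross:.2f}, net {net:.2f}")

print(bank_file)   # would become the bank-transfer batch file (e.g., BACS)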

Comparison of Batch‑Processing Applications

Application | Processing frequency | Typical data volume | Common software | Key processing steps
------------|----------------------|---------------------|-----------------|----------------------
Utility bills | Monthly | Hundreds of thousands to millions of records | Billing suites (e.g., SAP IS-U, Oracle Utilities, IBM InfoSphere) | Data collection → Rate application → Bill generation → Distribution → Archiving
Payroll | Weekly, bi-weekly or monthly | Dozens to tens of thousands of employee records | Payroll packages (e.g., ADP, Sage Payroll, SAP HR, QuickBooks Payroll) | Attendance capture → Earnings calculation → Deductions → Payslip & bank file → Archive

Figure 2 – Flowchart of the batch‑processing cycle (Data collection → Processing → Output generation → Distribution/archiving). Use this as a template for exam diagrams.
