Describe data mining applications (security, healthcare, marketing)

12 IT in Society – Overview

This section follows the Cambridge 9626 syllabus (Unit 12). It gives concise, exam‑focused notes on all five sub‑topics, with the data‑mining material expanded and linked to the wider impact of information technology.

12.1 Digital currencies

  • Definition: electronic money that uses cryptographic techniques to secure transactions and control the creation of new units.
  • Key types
    • Cryptocurrencies – e.g., Bitcoin, Ethereum (decentralised, based on blockchain).
    • Central bank digital currencies (CBDCs) – issued and regulated by a national bank (e.g., China’s e‑CNY).
    • Stablecoins – pegged to a fiat currency or commodity to reduce price volatility.
  • How they work (basic blockchain idea)
    1. Transactions are grouped into blocks.
    2. Each block is cryptographically linked to the previous one, forming an immutable chain.
    3. Network participants (miners or validators) reach consensus on the order of blocks.
  • Societal impacts
    • Greater financial inclusion for unbanked populations.
    • Potential for faster, cheaper cross‑border payments.
    • Regulatory challenges – anti‑money‑laundering (AML), taxation, consumer protection.
    • Security concerns – wallet theft, ransomware payments, and the environmental impact of proof‑of‑work systems.
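The hash-linking idea in steps 1–3 above can be sketched in a few lines of Python. This is a toy illustration using the standard `hashlib` library; the block format and transactions are invented for the example and do not reflect any real protocol:

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents (transactions + previous hash) with SHA-256."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def make_block(transactions, prev_hash):
    """Group transactions into a block linked to the previous block's hash."""
    return {"transactions": transactions, "prev_hash": prev_hash}

# Build a tiny chain of two blocks.
genesis = make_block(["Alice pays Bob 5"], prev_hash="0" * 64)
second = make_block(["Bob pays Carol 2"], prev_hash=block_hash(genesis))

# Tampering with the first block changes its hash, breaking the link:
# the stored prev_hash in the second block no longer matches.
tampered = make_block(["Alice pays Bob 500"], prev_hash="0" * 64)
assert block_hash(tampered) != second["prev_hash"]
```

Because each block stores the previous block's hash, altering any earlier block invalidates every later link – the property that makes the chain effectively immutable.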

12.2 Data mining

What is data mining?

Data mining is the systematic process of extracting useful patterns, trends or predictive models from large, often complex, data sets. It turns raw data into knowledge that can support decision‑making.

Formally, a data‑mining task seeks a model M that maximises a utility function U(M, D), where D is the data set. In practice, "utility" is measured by performance metrics such as accuracy, precision or profit.

The data‑mining process (six stages)

  1. Business (problem) understanding – define the objective (e.g., detect fraudulent transactions).
  2. Data understanding – collect, explore and describe the raw data sources.
  3. Data preparation – clean, transform, handle missing values and select relevant attributes.
  4. Modelling – apply appropriate algorithms (classification, clustering, association rules, etc.).
  5. Evaluation – assess the model with suitable metrics and verify that it meets the original objective.
  6. Deployment – integrate the model into a real‑world system (e.g., an IDS or a recommendation engine).
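The middle stages of the process can be shown end to end on a deliberately tiny fraud example. The transaction amounts are made up, and the threshold "model" is a stand-in chosen only to make the flow from preparation through evaluation visible; a real project would use a proper algorithm from the table below:

```python
# Toy fraud data: (amount as raw string, label) where 1 = fraudulent.
raw = [("12.50", 0), ("9.99", 0), ("", 0), ("250.00", 1), ("300.00", 1), ("11.00", 0)]

# Stage 3 - data preparation: drop records with missing amounts, convert types.
prepared = [(float(a), label) for a, label in raw if a != ""]

# Stage 4 - modelling: pick the midpoint between the largest legitimate
# amount and the smallest fraudulent amount as a decision threshold.
legit = [a for a, y in prepared if y == 0]
fraud = [a for a, y in prepared if y == 1]
threshold = (max(legit) + min(fraud)) / 2

def predict(amount):
    """Flag a transaction as fraudulent if it exceeds the learned threshold."""
    return 1 if amount > threshold else 0

# Stage 5 - evaluation: accuracy on the (toy) data set.
correct = sum(predict(a) == y for a, y in prepared)
accuracy = correct / len(prepared)  # 1.0 on this trivially separable data
```

On real data the evaluation stage would use held-out test data and the metrics discussed under "Evaluating models", not training accuracy.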

Typical data sources & pre‑processing

  • Sources: transaction logs, network traffic, crime records, electronic health records (EHRs), genomic sequences, sensor streams, web click‑streams, social‑media feeds.
  • Pre‑processing steps
    • Cleaning – remove duplicates, correct errors.
    • Missing‑value handling – imputation or deletion.
    • Normalization / scaling – bring attributes to comparable ranges.
    • Feature engineering – create or select attributes that improve model performance.
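The first three pre-processing steps can be sketched for a single numeric attribute. This is a minimal plain-Python illustration (mean imputation and min-max scaling are one common choice among several):

```python
def preprocess(values):
    """Clean, impute and scale one numeric attribute (a minimal sketch)."""
    # Cleaning: drop exact duplicates while preserving order.
    seen, cleaned = set(), []
    for v in values:
        if v not in seen:
            seen.add(v)
            cleaned.append(v)
    # Missing-value handling: impute None with the mean of the known values.
    known = [v for v in cleaned if v is not None]
    mean = sum(known) / len(known)
    imputed = [mean if v is None else v for v in cleaned]
    # Normalisation: min-max scale the attribute to the range [0, 1].
    lo, hi = min(imputed), max(imputed)
    return [(v - lo) / (hi - lo) for v in imputed]

# Duplicate 20 removed, None imputed with the mean (20), then scaled.
scaled = preprocess([10, 20, 20, None, 30])  # [0.0, 0.5, 0.5, 1.0]
```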

Key techniques, algorithms and exam‑style examples

| Technique | Typical algorithm(s) | Exam‑style example (application) |
| --- | --- | --- |
| Classification | Decision tree, Naïve Bayes, support vector machine (SVM), neural network | Decision‑tree classification → credit‑card fraud detection |
| Clustering | k‑means, hierarchical clustering, DBSCAN | k‑means clustering → customer segmentation for targeted marketing |
| Association‑rule mining | Apriori, FP‑Growth | Apriori → market‑basket rule {bread, butter} → {jam} |
| Anomaly detection | One‑class SVM, Isolation Forest, statistical outlier tests | Isolation Forest → intrusion detection in network traffic |
| Regression / predictive modelling | Linear regression, Random Forest regression, gradient boosting | Gradient boosting → forecasting hospital bed occupancy |
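Clustering is simple enough to sketch directly. The snippet below is a minimal 1‑D version of Lloyd's k‑means algorithm on invented customer-spend figures; a real project would use a library implementation such as scikit-learn's `KMeans`:

```python
def kmeans_1d(points, k=2, iters=20):
    """Lloyd's algorithm on 1-D data (a minimal sketch of k-means)."""
    centroids = sorted(points)[:k]  # naive initialisation for the sketch
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly spends: low spenders vs high spenders.
spends = [5, 6, 7, 100, 110, 105]
centroids, clusters = kmeans_1d(spends)  # centroids converge to 6.0 and 105.0
```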

Data‑mining applications

1. Security
  • Fraud detection – classification on credit‑card transaction data to flag suspicious purchases.
  • Intrusion detection systems (IDS) – anomaly detection on network‑packet logs; isolation‑forest models raise alerts for abnormal traffic patterns.
  • Predictive policing – clustering of historic crime records to identify future hotspots and optimise patrol routes.
2. Healthcare
  • Disease‑outbreak prediction – time‑series regression on EHRs and social‑media feeds to forecast flu epidemics.
  • Personalised treatment – classification of genomic data (e.g., SVM) to recommend targeted therapies.
  • Hospital resource optimisation – clustering of admission patterns to predict bed occupancy and staff scheduling needs.
3. Marketing
  • Customer segmentation – k‑means clustering of purchase histories to create distinct market groups.
  • Recommendation engines – association‑rule mining (e.g., {bread, butter} → {jam}) to suggest complementary products.
  • Churn prediction – classification (logistic regression) to identify customers likely to discontinue a service.

Evaluating models

  • Classification metrics: accuracy, precision, recall, F‑score, ROC‑AUC. In security, high recall (few missed frauds) is crucial; in healthcare, high precision (few false diagnoses) is often prioritised.
  • Clustering quality: silhouette score, Davies‑Bouldin index – used to judge how coherent and well separated customer or crime clusters are.
  • Cross‑validation – repeated train‑test splits to ensure the model generalises to unseen data.
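Precision, recall and the F‑score all come straight from the counts of true/false positives and negatives. A minimal sketch on a hypothetical fraud-detection run (real projects typically use library functions such as scikit-learn's `precision_score` and `recall_score`):

```python
def classification_metrics(actual, predicted):
    """Precision, recall and F-score for a binary classifier (1 = positive)."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    precision = tp / (tp + fp)   # of the flagged cases, how many were real?
    recall = tp / (tp + fn)      # of the real cases, how many were flagged?
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Invented run: 4 actual frauds, 5 transactions flagged, 3 flagged correctly.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
p, r, f = classification_metrics(actual, predicted)  # precision 0.6, recall 0.75
```

Here one fraud was missed (recall 0.75) and two legitimate transactions were wrongly flagged (precision 0.6) – exactly the trade-off the bullet above describes.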

Comparison of the three domains

| Domain | Typical data sources | Common techniques | Key benefits | Major challenges / societal impact |
| --- | --- | --- | --- | --- |
| Security | Transaction logs, network packets, crime records | Anomaly detection, classification, clustering | Early threat identification, reduced financial loss, safer public spaces | Privacy intrusion, false‑positive alerts, potential for surveillance abuse |
| Healthcare | EHRs, genomic sequences, wearable‑sensor streams | Classification, regression, time‑series analysis | Improved diagnosis, personalised therapy, efficient resource use | Highly sensitive data, regulatory compliance (HIPAA/GDPR), risk of misdiagnosis or algorithmic bias |
| Marketing | Sales transactions, web click‑streams, social‑media data | Clustering, association rules, predictive analytics | Higher conversion rates, tailored offers, better customer retention | Data quality issues, consumer privacy, manipulation or discriminatory targeting |

Ethical, legal and societal considerations

  1. Informed consent – individuals must know how their data will be used.
  2. Data minimisation – retain only the data necessary for the specific mining task.
  3. Transparency & explainability – especially in high‑impact areas (healthcare, security) the model’s decisions should be understandable to users and regulators.
  4. Compliance with legislation – GDPR (EU), HIPAA (USA) and other sector‑specific regulations set strict rules on data handling, storage and sharing.
  5. Bias and discrimination – mining results can reinforce existing societal biases; models must be audited for fairness.
  6. Social impact – weigh benefits (e.g., crime reduction) against potential harms (e.g., increased surveillance, loss of autonomy).

12.3 Social networking services/platforms

  • Major platforms: Facebook, Instagram, Twitter/X, TikTok, LinkedIn, Snapchat.
  • Primary uses
    • Personal communication and sharing of media.
    • Business marketing and brand engagement.
    • Civic participation – organising events, campaigning, crowdsourcing information.
  • Advantages
    • Instant global connectivity.
    • Opportunities for e‑commerce and targeted advertising.
    • Rapid dissemination of news and emergency alerts.
  • Disadvantages / risks
    • Privacy erosion – personal data harvested for profiling.
    • Misinformation and echo‑chamber effects.
    • Cyber‑bullying, online harassment, and mental‑health concerns.
    • Potential for algorithmic bias in content recommendation.

12.4 The impact of IT on society

Information technology reshapes almost every sector. The table below summarises a few illustrative examples, linking back to data‑mining where relevant.

| Sector | IT development | Positive impact | Negative / ethical concerns |
| --- | --- | --- | --- |
| Sport | Wearable sensors & performance analytics | Optimised training, injury prevention | Data ownership, commercial exploitation of athletes’ biometrics |
| Manufacturing | Internet of Things (IoT) and predictive maintenance | Reduced downtime, energy savings | Workforce displacement, security of connected devices |
| Education | Virtual classrooms, learning‑analytics dashboards | Personalised learning pathways, wider access | Digital divide, privacy of student data |
| Public services | Smart‑city sensors, e‑government portals | Improved service efficiency, citizen engagement | Surveillance concerns, data‑security breaches |

12.5 Technology‑enhanced learning (TEL)

  • Key tools
    • Massive Open Online Courses (MOOCs) – Coursera, edX, FutureLearn.
    • Virtual labs and simulations – PhET, Labster.
    • Learning‑analytics platforms – dashboards that mine student interaction data to predict performance.
  • How data mining supports TEL
    • Identifying at‑risk learners through classification of click‑stream and assessment data.
    • Personalising content recommendations using clustering of learning styles.
    • Providing feedback loops for teachers via predictive models of course completion.
  • Benefits – flexible access, adaptive pacing, evidence‑based teaching strategies.
  • Challenges – ensuring data privacy, avoiding algorithmic bias, maintaining student motivation without face‑to‑face interaction.

Summary

Across digital currencies, data mining, social networking, broader IT impacts and technology‑enhanced learning, the Cambridge syllabus expects you to describe the technology, give concrete examples, and evaluate benefits against ethical, legal and societal challenges. Master the six‑stage data‑mining process, know the key techniques and their typical applications, and be ready to discuss privacy, bias and regulatory issues – all of which are common points in exam questions.
