12 IT in Society – Overview
This section follows the Cambridge 9626 syllabus (Unit 12). It gives concise, exam‑focused notes on all five sub‑topics, with the data‑mining material expanded and linked to the wider impact of information technology.
12.1 Digital currencies
- Definition: electronic money that uses cryptographic techniques to secure transactions and control the creation of new units.
- Key types
- Cryptocurrencies – e.g., Bitcoin, Ethereum (decentralised, based on blockchain).
- Central bank digital currencies (CBDCs) – issued and regulated by a country’s central bank (e.g., China’s e‑CNY).
- Stablecoins – pegged to a fiat currency or commodity to reduce price volatility.
- How they work (basic blockchain idea – a short sketch follows at the end of 12.1)
- Transactions are grouped into blocks.
- Each block is cryptographically linked to the previous one, forming an immutable chain.
- Network participants (miners or validators) reach consensus on the order of blocks.
- Societal impacts
- Greater financial inclusion for unbanked populations.
- Potential for faster, cheaper cross‑border payments.
- Regulatory challenges – anti‑money‑laundering (AML), taxation, consumer protection.
- Security concerns – wallet theft, ransomware payments, and the environmental impact of proof‑of‑work systems.
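A minimal Python sketch of the hash‑chain idea described above (illustrative only: a real network adds digital signatures, mining/validation and a consensus protocol, and the transactions shown are invented):

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Serialise the block deterministically, then hash it with SHA-256.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

# Build a tiny chain: each new block stores the hash of its predecessor.
chain = [{"index": 0, "transactions": [], "prev_hash": "0" * 64}]  # genesis block
for txs in [["Alice->Bob: 5"], ["Bob->Carol: 2"]]:
    chain.append({
        "index": len(chain),
        "transactions": txs,
        "prev_hash": block_hash(chain[-1]),  # the cryptographic link
    })

# Tampering with an earlier block breaks every later link.
chain[1]["transactions"] = ["Alice->Bob: 500"]
print(block_hash(chain[1]) == chain[2]["prev_hash"])  # False: chain invalidated
```

Because each block embeds its predecessor’s hash, altering any historic block invalidates all later links – this is what makes the chain effectively immutable.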
12.2 Data mining
What is data mining?
Data mining is the systematic process of extracting useful patterns, trends or predictive models from large, often complex, data sets. It turns raw data into knowledge that can support decision‑making.
Mathematically, a data‑mining task seeks a model M that maximises a utility function U(M, D), where D is the data set. In practice the “utility” is measured by performance metrics such as accuracy, precision or profit.
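As a toy illustration, the sketch below treats accuracy as the utility U(M, D) and selects the better of two invented rule‑based “models” (all names and figures are made up):

```python
# D: a tiny labelled data set of (transaction amount, is_fraud) pairs.
D = [(120, 0), (9500, 1), (40, 0), (8700, 1), (300, 0)]

# Two candidate models M: flag a transaction as fraud above a threshold.
models = {
    "flag_over_1000": lambda amount: int(amount > 1000),
    "flag_over_9000": lambda amount: int(amount > 9000),
}

def utility(model, data):
    # U(M, D): here simply the model's accuracy on D.
    return sum(model(x) == y for x, y in data) / len(data)

# Data mining as search: keep the model that maximises U(M, D).
best = max(models, key=lambda name: utility(models[name], D))
print(best, utility(models[best], D))  # flag_over_1000 1.0
```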
The data‑mining process (six stages)
- Business (problem) understanding – define the objective (e.g., detect fraudulent transactions).
- Data understanding – collect, explore and describe the raw data sources.
- Data preparation – clean, transform, handle missing values and select relevant attributes.
- Modelling – apply appropriate algorithms (classification, clustering, association rules, etc.).
- Evaluation – assess the model with suitable metrics and verify that it meets the original objective.
- Deployment – integrate the model into a real‑world system (e.g., an intrusion detection system or a recommendation engine).
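Stages 3 to 5 map directly onto code. A minimal sketch, assuming scikit‑learn and substituting synthetic data for a real transaction source:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in for stage 2: synthetic, imbalanced "transaction" data.
X, y = make_classification(n_samples=500, n_features=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 3 (preparation) and stage 4 (modelling) chained in one pipeline.
model = Pipeline([
    ("scale", StandardScaler()),                                    # preparation
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),  # modelling
])
model.fit(X_train, y_train)

# Stage 5 (evaluation) against data held back from training.
print(classification_report(y_test, model.predict(X_test)))
```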
Typical data sources & pre‑processing
- Sources: transaction logs, network traffic, crime records, electronic health records (EHRs), genomic sequences, sensor streams, web click‑streams, social‑media feeds.
- Pre‑processing steps
- Cleaning – remove duplicates, correct errors.
- Missing‑value handling – imputation or deletion.
- Normalisation / scaling – bring attributes to comparable ranges.
- Feature engineering – create or select attributes that improve model performance.
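A short pandas sketch of these four steps on an invented transaction extract (column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical raw extract showing typical quality problems.
raw = pd.DataFrame({
    "amount":  [120.0, 120.0, None, 8700.0, 300.0],
    "country": ["UK", "UK", "UK", "RU", "uk"],
})

df = raw.drop_duplicates().copy()            # cleaning: remove duplicate rows
df["country"] = df["country"].str.upper()    # cleaning: correct inconsistent values
df["amount"] = df["amount"].fillna(df["amount"].median())  # impute missing value
df["amount_scaled"] = (df["amount"] - df["amount"].min()) / (
    df["amount"].max() - df["amount"].min())               # min-max normalisation
df["is_foreign"] = (df["country"] != "UK").astype(int)     # feature engineering
print(df)
```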
Key techniques, algorithms and exam‑style examples
| Technique | Typical algorithm(s) | Exam‑style example (application) |
| --- | --- | --- |
| Classification | Decision tree, Naïve Bayes, Support Vector Machine, Neural network | Decision‑tree classification → credit‑card fraud detection |
| Clustering | k‑means, hierarchical clustering, DBSCAN | k‑means clustering → customer segmentation for targeted marketing |
| Association‑rule mining | Apriori, FP‑Growth | Apriori → market‑basket rule {bread, butter} → {jam} |
| Anomaly detection | One‑class SVM, Isolation Forest, statistical outlier tests | Isolation Forest → intrusion detection in network traffic |
| Regression / predictive modelling | Linear regression, Random Forest regression, Gradient Boosting | Gradient Boosting → forecasting hospital bed occupancy |
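To make the association‑rule row concrete, the sketch below hand‑computes support and confidence for the rule {bread, butter} → {jam} on invented baskets; an algorithm such as Apriori or FP‑Growth searches for such rules automatically:

```python
# Five invented shopping baskets.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"jam", "milk"},
]

def support(itemset):
    # Fraction of baskets that contain every item in the itemset.
    return sum(itemset <= basket for basket in baskets) / len(baskets)

antecedent = {"bread", "butter"}
rule_items = antecedent | {"jam"}
confidence = support(rule_items) / support(antecedent)
print(f"support={support(rule_items):.2f}, confidence={confidence:.2f}")
# support=0.40, confidence=0.67: two thirds of bread+butter baskets also have jam
```

A rule is normally kept only if both its support and its confidence clear user‑chosen thresholds.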
Data‑mining applications
1. Security
- Fraud detection – classification on credit‑card transaction data to flag suspicious purchases.
- Intrusion detection systems (IDS) – anomaly detection on network‑packet logs; isolation‑forest models raise alerts for abnormal traffic patterns (see the first sketch after this list).
- Predictive policing – clustering of historic crime records to identify future hotspots and optimise patrol routes.
2. Healthcare
- Disease‑outbreak prediction – time‑series regression on EHRs and social‑media feeds to forecast flu epidemics.
- Personalised treatment – classification of genomic data (e.g., SVM) to recommend targeted therapies.
- Hospital resource optimisation – clustering of admission patterns to predict bed occupancy and staff scheduling needs (an occupancy‑forecasting sketch follows this list).
3. Marketing
- Customer segmentation – k‑means clustering of purchase histories to create distinct market groups (see the k‑means sketch after this list).
- Recommendation engines – association‑rule mining (e.g., {bread, butter} → {jam}) to suggest complementary products.
- Churn prediction – classification (logistic regression) to identify customers likely to discontinue a service.
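A sketch of the intrusion‑detection idea from item 1, assuming scikit‑learn; the traffic features and figures are invented:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical traffic features: [packets per second, mean packet size].
normal = rng.normal(loc=[100, 500], scale=[10, 50], size=(200, 2))
attack = np.array([[900, 60], [850, 55]])  # e.g. a flood of tiny packets

model = IsolationForest(random_state=0).fit(normal)
print(model.predict(attack))      # [-1 -1]: flagged as anomalous
print(model.predict(normal[:3]))  # mostly 1: ordinary traffic
```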
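A sketch of the occupancy forecasting mentioned under item 2, assuming scikit‑learn and entirely synthetic hospital data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
# Hypothetical daily features: [day of week, admissions yesterday, flu index].
X = rng.uniform([0, 20, 0], [6, 80, 10], size=(365, 3))
# Synthetic occupancy, driven mostly by yesterday's admissions plus noise.
y = 50 + 2.5 * X[:, 1] + 3 * X[:, 2] + rng.normal(0, 5, 365)

# Train on the first 300 days, forecast the remaining 65.
model = GradientBoostingRegressor(random_state=0).fit(X[:300], y[:300])
pred = model.predict(X[300:])
print(f"mean absolute error: {np.abs(pred - y[300:]).mean():.1f} beds")
```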
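A sketch of the segmentation idea from item 3, assuming scikit‑learn; the six customer records are invented:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual spend in £, visits per month].
customers = np.array([
    [200, 1], [250, 2], [180, 1],     # occasional shoppers
    [2200, 8], [2500, 9], [2100, 7],  # frequent high spenders
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(km.labels_)           # e.g. [0 0 0 1 1 1]: two market segments
print(km.cluster_centers_)  # an average profile for each segment
```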
Evaluating models
- Classification metrics: accuracy, precision, recall, F‑score, ROC‑AUC. In security, high recall (few missed frauds) is crucial; in healthcare, high precision (few false‑positive diagnoses) is often prioritised.
- Clustering quality: silhouette score, Davies‑Bouldin index – useful for assessing the usefulness of customer or crime clusters.
- Cross‑validation – repeated train‑test splits to ensure the model generalises to unseen data.
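A minimal cross‑validation sketch, assuming scikit‑learn, that scores one model on the two metrics contrasted above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced stand-in for fraud or diagnosis data.
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)
model = LogisticRegression(max_iter=1000)

# Five train-test splits; each held-out fold is scored on the metric
# that matters for the domain (recall for fraud, precision for diagnosis).
recall = cross_val_score(model, X, y, cv=5, scoring="recall")
precision = cross_val_score(model, X, y, cv=5, scoring="precision")
print(f"recall: {recall.mean():.2f}, precision: {precision.mean():.2f}")
```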
Comparison of the three domains
| Domain | Typical data sources | Common techniques | Key benefits | Major challenges / societal impact |
| --- | --- | --- | --- | --- |
| Security | Transaction logs, network packets, crime records | Anomaly detection, classification, clustering | Early threat identification, reduced financial loss, safer public spaces | Privacy intrusion, false‑positive alerts, potential for surveillance abuse |
| Healthcare | EHRs, genomic sequences, wearable‑sensor streams | Classification, regression, time‑series analysis | Improved diagnosis, personalised therapy, efficient resource use | Highly sensitive data, regulatory compliance (HIPAA/GDPR), risk of misdiagnosis or algorithmic bias |
| Marketing | Sales transactions, web click‑streams, social‑media data | Clustering, association rules, predictive analytics | Higher conversion rates, tailored offers, better customer retention | Data quality issues, consumer privacy, manipulation or discriminatory targeting |
Ethical, legal and societal considerations
- Informed consent – individuals must know how their data will be used.
- Data minimisation – retain only the data necessary for the specific mining task.
- Transparency & explainability – especially in high‑impact areas (healthcare, security), the model’s decisions should be understandable to users and regulators.
- Compliance with legislation – GDPR (EU), HIPAA (USA) and other sector‑specific regulations set strict rules on data handling, storage and sharing.
- Bias and discrimination – mining results can reinforce existing societal biases; models must be audited for fairness.
- Social impact – weigh benefits (e.g., crime reduction) against potential harms (e.g., increased surveillance, loss of autonomy).
12.3 Social networking services/platforms
- Major platforms: Facebook, Instagram, Twitter/X, TikTok, LinkedIn, Snapchat.
- Primary uses
- Personal communication and sharing of media.
- Business marketing and brand engagement.
- Civic participation – organising events, campaigning, crowdsourcing information.
- Advantages
- Instant global connectivity.
- Opportunities for e‑commerce and targeted advertising.
- Rapid dissemination of news and emergency alerts.
- Disadvantages / risks
- Privacy erosion – personal data harvested for profiling.
- Misinformation and echo‑chamber effects.
- Cyber‑bullying, online harassment, and mental‑health concerns.
- Potential for algorithmic bias in content recommendation.
12.4 The impact of IT on society
Information technology reshapes almost every sector. The table below summarises a few illustrative examples, linking back to data‑mining where relevant.
| Sector | IT development | Positive impact | Negative / ethical concerns |
| --- | --- | --- | --- |
| Sport | Wearable sensors & performance analytics | Optimised training, injury prevention | Data ownership, commercial exploitation of athletes’ biometrics |
| Manufacturing | Internet of Things (IoT) and predictive maintenance | Reduced downtime, energy savings | Workforce displacement, security of connected devices |
| Education | Virtual classrooms, learning‑analytics dashboards | Personalised learning pathways, wider access | Digital divide, privacy of student data |
| Public services | Smart‑city sensors, e‑government portals | Improved service efficiency, citizen engagement | Surveillance concerns, data‑security breaches |
12.5 Technology‑enhanced learning (TEL)
- Key tools
- Massive Open Online Courses (MOOCs) – Coursera, edX, FutureLearn.
- Virtual labs and simulations – PhET, Labster.
- Learning‑analytics platforms – dashboards that mine student interaction data to predict performance.
- How data mining supports TEL
- Identifying at‑risk learners through classification of click‑stream and assessment data (see the sketch after this list).
- Personalising content recommendations using clustering of learning styles.
- Providing feedback loops for teachers via predictive models of course completion.
- Benefits – flexible access, adaptive pacing, evidence‑based teaching strategies.
- Challenges – ensuring data privacy, avoiding algorithmic bias, maintaining student motivation without face‑to‑face interaction.
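A minimal sketch of the at‑risk‑learner classification mentioned in the list above, assuming scikit‑learn; the features, threshold and figures are invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Hypothetical per-student features: [logins per week, quiz average %].
X = rng.uniform([0, 20], [20, 100], size=(200, 2))
# Label a student "at risk" when engagement and scores are both low.
y = ((X[:, 0] < 5) & (X[:, 1] < 50)).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
new_students = np.array([[2, 35], [15, 80]])
print(model.predict_proba(new_students)[:, 1])  # estimated risk per student
```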
Summary
Across digital currencies, data mining, social networking, broader IT impacts and technology‑enhanced learning, the Cambridge syllabus expects you to describe the technology, give concrete examples, and evaluate benefits against ethical, legal and societal challenges. Master the six‑stage data‑mining process, know the key techniques and their typical applications, and be ready to discuss privacy, bias and regulatory issues – all of which are common points in exam questions.