Show understanding of how artificial neural networks have helped with machine learning

Published by Patrick Mutisya · 14 days ago

AI – Artificial Neural Networks

18.1 Artificial Intelligence (AI)

Objective: Understanding how artificial neural networks have helped with machine learning

Artificial Neural Networks (ANNs) are computational models inspired by the structure of biological neurons. They have become a cornerstone of modern machine learning because they can automatically discover complex patterns from data.

Key Concepts

  • Neuron (perceptron): computes a weighted sum of its inputs and passes the result through an activation function.
  • Activation function: introduces non‑linearity; common choices are the sigmoid $\sigma(z)=\frac{1}{1+e^{-z}}$, ReLU $f(z)=\max(0,z)$, and tanh.
  • Network architecture: layers of neurons (input, hidden, output) connected by weighted edges.
  • Training (learning): adjusting weights to minimise a loss function, typically using gradient descent and back‑propagation.
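
To make these ideas concrete, the sketch below implements the common activation functions in plain NumPy (the sample inputs are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)): squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # f(z) = max(0, z): zero for negative inputs, identity otherwise
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 3.0])   # made-up sample pre-activations
print(sigmoid(z))                # approx. [0.119 0.5   0.953]
print(relu(z))                   # [0. 0. 3.]
print(np.tanh(z))                # approx. [-0.964  0.     0.995]
```

ReLU is often the default in deep networks because its gradient does not vanish for positive inputs, which helps gradient descent make progress.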

Mathematical Formulation

The output $a$ of a single neuron is given by

$$
a = \phi\!\left(\sum_{i=1}^{n} w_i x_i + b\right)
$$

where $x_i$ are the inputs, $w_i$ the corresponding weights, $b$ the bias, and $\phi$ the activation function.
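
A short NumPy sketch of this formula (the input, weight and bias values below are arbitrary illustrations):

```python
import numpy as np

def neuron(x, w, b, phi=np.tanh):
    # a = phi(sum_i w_i * x_i + b), exactly the formula above
    return phi(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs x_i (made up)
w = np.array([0.8, 0.2, -0.5])   # weights w_i (made up)
b = 0.1                          # bias
print(neuron(x, w, b))           # tanh(-0.7) ≈ -0.604
```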

For a network with parameters $\theta$, the loss $L(\theta)$ (e.g., mean‑squared error) is minimised by updating the weights in the opposite direction of the gradient:

$$
\theta \leftarrow \theta - \eta \,\nabla_{\theta} L(\theta)
$$

with learning rate $\eta$.
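
The rule is easiest to see on a one-parameter toy loss such as $L(\theta) = (\theta - 3)^2$; the sketch below applies it (the loss, starting point and learning rate are illustrative choices):

```python
theta = 0.0   # arbitrary starting value
eta = 0.1     # learning rate eta

for step in range(50):
    grad = 2.0 * (theta - 3.0)   # gradient of L(theta) = (theta - 3)^2
    theta = theta - eta * grad   # the update rule above

print(theta)  # converges towards the minimiser theta = 3
```

In a real network the same update is applied to every weight and bias at once, with the gradients supplied by back‑propagation.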

How ANNs Have Advanced Machine Learning

  1. Learning non‑linear relationships – Unlike linear models, ANNs can approximate any continuous function on a compact domain to arbitrary accuracy (the Universal Approximation Theorem); a minimal demonstration follows this list.
  2. Automatic feature extraction – Deep networks learn hierarchical representations, reducing the need for manual feature engineering.
  3. Scalability with data – Performance improves as more labelled data becomes available, a property exploited in big‑data contexts.
  4. Transfer learning – Pre‑trained networks can be fine‑tuned for new tasks, speeding up development.
  5. Real‑world applications – Successes in image classification, speech recognition, natural language processing, and game playing (e.g., AlphaGo).
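
As the demonstration promised above: the sketch below trains a tiny two‑layer network on XOR, a classic problem that no linear model can solve. The architecture, random seed, learning rate and iteration count are illustrative choices, not a standard recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros((1, 4))  # hidden layer (4 units)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros((1, 1))  # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # back-propagation of the squared-error gradient
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates with learning rate 0.5
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```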

Illustrative Applications

  • Handwritten digit recognition (MNIST) – a multilayer perceptron achieves >98% accuracy (a training sketch follows this list).
  • Object detection in photographs – convolutional neural networks (CNNs) locate and label multiple objects.
  • Speech‑to‑text – recurrent neural networks (RNNs) and transformers model temporal dependencies.
  • Strategic game playing – deep reinforcement learning combines ANNs with reward‑based learning.
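
For the first application above, a multilayer perceptron can be trained on MNIST in a few lines of Keras. This is a sketch only: it assumes TensorFlow is installed, the layer sizes and epoch count are illustrative, and the exact accuracy varies from run to run:

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 784 input neurons
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
model.evaluate(x_test, y_test)  # typically around 98% accuracy
```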

Comparison of ANN Types

| Network Type | Typical Use‑case | Key Feature |
| --- | --- | --- |
| Multilayer Perceptron (MLP) | Tabular data, simple pattern classification | Fully connected layers, static input size |
| Convolutional Neural Network (CNN) | Image and video analysis | Convolutional filters exploit spatial locality |
| Recurrent Neural Network (RNN) | Sequential data such as speech or text | Hidden state carries information across time steps |
| Transformer | Large‑scale language models, translation | Self‑attention mechanism enables parallel processing of sequences |
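
The characteristic layer of each family can be written in a line or two of Keras; the unit counts, filter sizes and head counts below are arbitrary illustrations:

```python
import tensorflow as tf

mlp_layer  = tf.keras.layers.Dense(64, activation="relu")       # fully connected
cnn_layer  = tf.keras.layers.Conv2D(32, kernel_size=3,          # 3x3 spatial filters
                                    activation="relu")
rnn_layer  = tf.keras.layers.SimpleRNN(64)                      # hidden state over time
attn_layer = tf.keras.layers.MultiHeadAttention(num_heads=4,    # self-attention
                                                key_dim=16)
```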

Limitations and Considerations

  • Require large labelled datasets for effective training.
  • Training can be computationally intensive; GPUs or specialised hardware are often needed.
  • Risk of over‑fitting if the network is too large relative to the data (a common mitigation is sketched after this list).
  • Interpretability: internal representations are often opaque (“black‑box”).
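
For the over‑fitting risk in particular, two widely used mitigations are dropout and early stopping, sketched below in Keras (the dropout rate and patience value are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # randomly silence half the units during training
    tf.keras.layers.Dense(10, activation="softmax"),
])
stop_early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)
# model.fit(..., validation_split=0.1, callbacks=[stop_early])
```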

Suggested diagram: A schematic of a three‑layer neural network showing input neurons, hidden neurons, output neurons, and weighted connections.

Summary

Artificial neural networks have transformed machine learning by providing flexible, powerful models capable of learning complex, non‑linear mappings directly from data. Their ability to automatically extract features, scale with large datasets, and be adapted across domains makes them indispensable tools in modern AI.