Objective: Understand how artificial neural networks have advanced machine learning.
Artificial Neural Networks (ANNs) are computational models inspired by the structure of biological neurons. They have become a cornerstone of modern machine learning because they can automatically discover complex patterns from data.
Key Concepts
Neuron (perceptron): computes a weighted sum of its inputs and passes the result through an activation function.
Activation function: introduces non‑linearity; common choices are the sigmoid $\sigma(z)=\frac{1}{1+e^{-z}}$, ReLU $f(z)=\max(0,z)$, and tanh (see the code sketch after this list).
Network architecture: layers of neurons (input, hidden, output) connected by weighted edges.
Training (learning): adjusting weights to minimise a loss function, typically using gradient descent and back‑propagation.
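To make the activation functions listed above concrete, the following sketch (assuming Python with NumPy; the helper names are purely illustrative) evaluates the three common choices on a few sample inputs.

```python
import numpy as np

def sigmoid(z):
    """Squashes any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Passes positive values through unchanged; zeroes out negatives."""
    return np.maximum(0.0, z)

def tanh(z):
    """Squashes any real value into the interval (-1, 1)."""
    return np.tanh(z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
print("tanh:   ", tanh(z))
```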
Mathematical Formulation
The output $a$ of a single neuron is given by
$$
a = \phi\!\left(\sum_{i=1}^{n} w_i x_i + b\right)
$$
where $x_i$ are the inputs, $w_i$ the corresponding weights, $b$ the bias, and $\phi$ the activation function.
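As an illustration of this formula, the short sketch below (again Python with NumPy; the input, weight, and bias values are arbitrary) computes the output of a single neuron with a sigmoid activation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Example inputs, weights, and bias for one neuron (arbitrary values).
x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.4, 0.7, -0.2])   # weights w_i
b = 0.1                          # bias

# a = phi(sum_i w_i * x_i + b), with phi = sigmoid here.
a = sigmoid(np.dot(w, x) + b)
print(a)  # a single scalar activation in (0, 1)
```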
For a network with parameters $\theta$, the loss $L(\theta)$ (e.g., mean‑squared error) is minimised by updating the parameters in the opposite direction of the gradient:
$$
\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta),
$$
where $\eta$ is the learning rate; back‑propagation applies the chain rule to compute these gradients efficiently, layer by layer.
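A minimal sketch of this update rule, assuming Python with NumPy and a hand‑derived gradient for a single sigmoid neuron trained with mean‑squared error (the dataset and hyper‑parameters are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny synthetic dataset: 4 samples, 2 features, OR-like targets.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # weights (part of theta)
b = 0.0                  # bias (part of theta)
eta = 0.5                # learning rate

for step in range(1000):
    # Forward pass: predictions for all samples.
    a = sigmoid(X @ w + b)
    # Mean-squared-error loss L(theta).
    loss = np.mean((a - y) ** 2)
    # Back-propagation for this one-neuron "network": chain rule by hand.
    grad_a = 2.0 * (a - y) / len(y)   # dL/da
    grad_z = grad_a * a * (1.0 - a)   # times da/dz for the sigmoid
    grad_w = X.T @ grad_z             # dL/dw
    grad_b = np.sum(grad_z)           # dL/db
    # Gradient-descent update: theta <- theta - eta * gradient.
    w -= eta * grad_w
    b -= eta * grad_b

print("final loss:", loss)
print("predictions:", sigmoid(X @ w + b).round(2))
```

In practice, deep‑learning frameworks compute these gradients automatically via back‑propagation (automatic differentiation), so the chain rule does not need to be coded by hand.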
Example Applications
Object detection in photographs – convolutional neural networks (CNNs) locate and label multiple objects.
Speech‑to‑text – recurrent neural networks (RNNs) and transformers model temporal dependencies.
Strategic game playing – deep reinforcement learning combines ANNs with reward‑based learning.
Comparison of ANN Types
| Network Type | Typical Use‑case | Key Feature |
| --- | --- | --- |
| Multilayer Perceptron (MLP) | Tabular data, simple pattern classification | Fully connected layers, static input size |
| Convolutional Neural Network (CNN) | Image and video analysis | Convolutional filters exploit spatial locality |
| Recurrent Neural Network (RNN) | Sequential data such as speech or text | Hidden state carries information across time steps |
| Transformer | Large‑scale language models, translation | Self‑attention mechanism enables parallel processing of sequences |
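To ground the table, the sketch below (assuming the PyTorch library is available; layer sizes and input shapes are arbitrary) contrasts how a small MLP and a small CNN are typically constructed.

```python
import torch.nn as nn

# A small fully connected network (MLP) for, e.g., 20 tabular features.
mlp = nn.Sequential(
    nn.Linear(20, 64),   # fully connected layer
    nn.ReLU(),
    nn.Linear(64, 3),    # 3 output classes
)

# A small convolutional network (CNN) for single-channel 28x28 images.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # filters exploit spatial locality
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 3),
)
```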
Limitations and Considerations
Require large labelled datasets for effective training.
Training can be computationally intensive; GPUs or specialised hardware are often needed.
Risk of over‑fitting if the network is too large relative to the data (a regularisation sketch follows this list).
Interpretability: internal representations are often opaque (“black‑box”).
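As one common way to mitigate the over‑fitting risk noted above, the following sketch (again assuming PyTorch; all values are illustrative) adds dropout and weight decay, two widely used regularisation techniques.

```python
import torch
import torch.nn as nn

# Dropout randomly zeroes activations during training, which discourages
# the network from relying too heavily on any single feature.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 3),
)

# Weight decay (L2 regularisation) penalises large weights during optimisation.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```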
Suggested diagram: A schematic of a three‑layer neural network showing input neurons, hidden neurons, output neurons, and weighted connections.
Summary
Artificial neural networks have transformed machine learning by providing flexible, powerful models capable of learning complex, non‑linear mappings directly from data. Their ability to automatically extract features, scale with large datasets, and be adapted across domains makes them indispensable tools in modern AI.