1.2 Multimedia – Sound Representation and Encoding
What is Sound?
Sound is a mechanical wave that propagates through a medium (air, water, solids) as variations in pressure. In digital systems we must convert this continuous analogue signal into a discrete digital form that a computer can store, process and transmit.
Key Concepts in Digital Audio
Sampling – measuring the amplitude of the analogue waveform at regular time intervals.
Quantisation – assigning each sampled amplitude a numeric value from a finite set of levels.
Bit depth – the number of bits used for each quantised sample.
Sample rate (sampling frequency) – the number of samples taken per second, measured in hertz (Hz).
Dynamic range – the ratio between the loudest undistorted signal and the quietest detectable signal.
Sampling Theory
The Nyquist–Shannon sampling theorem states that a continuous, band-limited signal can be perfectly reconstructed from its samples if the sampling frequency \$f_s\$ is greater than twice the highest frequency component \$f_{max}\$ of the signal, that is, \$f_s > 2 f_{max}\$.
\$f_N = \frac{f_s}{2}\$
where \$f_N\$ is the Nyquist frequency, the highest frequency that can be represented at sample rate \$f_s\$. The upper limit of human hearing is about 20 kHz, so the sample rate must exceed 40 kHz; CD audio uses 44.1 kHz, which gives \$f_N = 22.05\$ kHz.
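To see what happens when the sampling condition is not met, the following minimal Python sketch computes the Nyquist frequency and the apparent (aliased) frequency of a pure tone after sampling. The function names are illustrative, and the model assumes an ideal sampler with no anti-aliasing filter in front of it.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate_hz / 2


def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after ideal sampling.

    The tone folds back to its distance from the nearest integer
    multiple of the sample rate, which always lies in 0..fs/2.
    """
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


fs = 44100
print("Nyquist frequency:", nyquist_frequency(fs), "Hz")
for tone in (10000, 20000, 30000):
    print(tone, "Hz is captured as", alias_frequency(tone, fs), "Hz")
```

Tones at 10 kHz and 20 kHz lie below the Nyquist frequency and are captured unchanged, while a 30 kHz tone would be recorded as 14.1 kHz. This is why real ADCs filter out content above the Nyquist frequency before sampling.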
Quantisation and Bit Depth
Quantisation maps each sampled amplitude to one of \$2^n\$ discrete levels, where \$n\$ is the bit depth.
The theoretical dynamic range of a PCM (Pulse‑Code Modulation) system is:
\$DR = 6.02 \times n + 1.76\ \text{dB}\$
Examples:
8‑bit audio → \$DR \approx 50\$ dB
16‑bit audio → \$DR \approx 98\$ dB (CD quality)
24‑bit audio → \$DR \approx 146\$ dB (professional recording)
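The formula can be checked directly. This short Python sketch (the function names are illustrative) evaluates the number of quantisation levels and the theoretical dynamic range for the three bit depths listed. The results include the 1.76 dB term, so they come out about 2 dB higher than the figures of 96 dB and 144 dB that are often quoted from \$6.02 \times n\$ alone.

```python
def quantisation_levels(bit_depth):
    """Number of discrete levels available at a given bit depth (2^n)."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth):
    """Theoretical PCM dynamic range in dB: 6.02 * n + 1.76."""
    return 6.02 * bit_depth + 1.76


for n in (8, 16, 24):
    print(f"{n}-bit: {quantisation_levels(n)} levels, "
          f"{dynamic_range_db(n):.2f} dB")
```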
PCM Encoding Process
The analogue signal enters an analogue‑to‑digital converter (ADC).
The ADC samples the signal at the chosen sample rate.
Each sample is quantised to the nearest level defined by the bit depth.
The quantised values are stored as binary numbers (e.g., 16‑bit signed integers).
Optionally, the PCM data may be compressed (lossless or lossy) before storage or transmission.
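The steps above can be sketched in software. In this minimal Python example a computed sine wave stands in for the analogue input (an assumption, since a real ADC does this in hardware), and the function names are illustrative. It samples the tone, quantises each sample to the nearest 16‑bit level, and packs the result as little-endian signed integers.

```python
import math
import struct


def sample_and_quantise(freq_hz, duration_s, sample_rate=44100, bit_depth=16):
    """Sample a unit-amplitude sine tone and quantise it to signed integers."""
    max_level = 2 ** (bit_depth - 1) - 1           # 32767 for 16-bit
    n_samples = round(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate                        # sampling instant in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # "analogue" value in [-1, 1]
        samples.append(round(amplitude * max_level))     # nearest quantisation level
    return samples


def to_pcm_bytes(samples):
    """Pack 16-bit samples as little-endian signed integers."""
    return struct.pack("<%dh" % len(samples), *samples)


samples = sample_and_quantise(440, 0.01)   # 10 ms of a 440 Hz tone
pcm = to_pcm_bytes(samples)
print(len(samples), "samples,", len(pcm), "bytes")
```

Each 16‑bit sample occupies two bytes, so 10 ms at 44.1 kHz yields 441 samples and 882 bytes of mono PCM data.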
Common Audio File Formats
| Format | Compression Type | Typical Sample Rates | Typical Bit Depths | Typical Use |
| --- | --- | --- | --- | --- |
| WAV | Uncompressed (PCM) | 8 kHz – 192 kHz | 8, 16, 24, 32‑bit | Professional audio, editing |
| AIFF | Uncompressed (PCM) | 8 kHz – 192 kHz | 8, 16, 24, 32‑bit | Apple ecosystem, studio work |
| MP3 | Lossy (psychoacoustic) | 16 kHz – 48 kHz | Variable (bit‑rate 64–320 kbps) | Streaming, portable devices |
| AAC | Lossy (advanced psychoacoustic) | 16 kHz – 48 kHz | Variable (bit‑rate 64–256 kbps) | Modern streaming services |
| FLAC | Lossless | 8 kHz – 192 kHz | 16, 24‑bit | High‑fidelity archiving |
Lossless vs. Lossy Compression
Lossless algorithms (e.g., FLAC, ALAC) reduce file size without discarding any audio information. The original PCM data can be perfectly reconstructed.
Lossy algorithms (e.g., MP3, AAC) remove audio components that are less audible to the human ear, achieving much higher compression ratios at the cost of some quality loss.
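The difference can be illustrated with a rough Python sketch. Note the assumptions: `zlib` is a general-purpose lossless compressor used here as a stand-in for an audio codec such as FLAC, and discarding the low 8 bits of each sample is a crude stand-in for lossy coding, not a psychoacoustic model like the ones MP3 and AAC use. The test tone is illustrative data only.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone at 8 kHz, as 16-bit PCM (illustrative test data)
samples = [round(32767 * math.sin(2 * math.pi * 440 * i / 8000))
           for i in range(8000)]
pcm = struct.pack("<%dh" % len(samples), *samples)

# Lossless: after decompression the bytes are identical to the original
compressed = zlib.compress(pcm)
restored = zlib.decompress(compressed)
lossless_ok = restored == pcm

# Lossy stand-in: keep only the top 8 bits of each sample, then scale back
reduced = [s >> 8 for s in samples]        # values in -128..127
approx = [r << 8 for r in reduced]         # reconstruction at 16-bit scale
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print("PCM bytes:", len(pcm), "compressed bytes:", len(compressed))
print("Lossless round trip exact:", lossless_ok)
print("Largest error after lossy step:", max_error)
```

The lossless round trip returns exactly the original bytes, whereas the lossy step halves the data but leaves an error in each sample that cannot be undone.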
Example Calculation: File Size of Uncompressed Audio
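The size of uncompressed PCM audio follows from the quantities defined earlier: file size in bytes = sample rate × bit depth × number of channels × duration in seconds ÷ 8. The Python sketch below works through one case. The chosen values (one minute of stereo, CD-quality audio) are illustrative assumptions, the function names are hypothetical, and any file header overhead is ignored.

```python
def pcm_bit_rate_bps(sample_rate_hz, bit_depth, channels):
    """Bits produced per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


def pcm_file_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Size of the raw PCM data in bytes (8 bits per byte, no header)."""
    return pcm_bit_rate_bps(sample_rate_hz, bit_depth, channels) * duration_s / 8


# Assumed example: 44.1 kHz, 16-bit, stereo, 60 seconds
bit_rate = pcm_bit_rate_bps(44100, 16, 2)
size = pcm_file_size_bytes(44100, 16, 2, 60)
print(f"Bit rate: {bit_rate / 1000:.1f} kbps")
print(f"File size: {size:,.0f} bytes (about {size / 1_000_000:.2f} MB)")
```

This gives a bit rate of 1411.2 kbps and a size of 10,584,000 bytes, roughly 10.58 MB per minute. For comparison, a 128 kbps MP3 of the same length needs about one eleventh of that.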
Practical Guidelines
Choose a sample rate that matches the intended playback device (e.g., 44.1 kHz for CD‑quality audio).
Use 16‑bit depth for most applications; higher bit depths are only needed for professional recording.
When bandwidth is limited, prefer lossy formats with an appropriate bit‑rate (e.g., 128 kbps MP3 for speech).
Always test the audible quality after compression; subjective listening tests are essential.
Suggested diagram: Flowchart of the audio encoding process – from analogue signal, through sampling, quantisation, PCM storage, optional compression, to final file format.