1.2 Multimedia – Sound Representation and Encoding
1. Where Sound Fits in the Cambridge Computer Science Programme
The Cambridge AS & A‑Level Computer Science syllabus is divided into a series of thematic blocks (Data Representation, Communication, Hardware, System Software, Security & Ethics, Algorithms & Data Structures, Programming, Software Development, and the optional A‑Level extensions). The topic “Sound Representation and Encoding” belongs to the Multimedia sub‑topic of the Data Representation block (Section 1.2). Mastery of this material supports:
- AO1 – factual knowledge of how analogue signals are converted to digital form.
- AO2 – analysis of the impact of sampling rate, bit depth and compression on quality and file size.
- AO3 – design, implementation and evaluation of a simple audio‑processing solution (e.g. a lab activity or a small program that reads/writes PCM data).
Later A‑Level extensions (e.g. data‑compression algorithms, floating‑point representation, networking protocols) build directly on the concepts introduced here.
2. What Is Sound?
Sound is a mechanical longitudinal wave that propagates through a material medium (air, water, solids) as alternating compressions and rarefactions of pressure. In the physical world it is a continuous analogue signal; computers can only store and manipulate discrete digital values, so the analogue waveform must be sampled and quantised.
3. Digital Audio Fundamentals
3.1 Sampling
- Sampling – measuring the instantaneous amplitude of the analogue waveform at regular time intervals.
- Sample rate (frequency) – number of samples taken per second, expressed in hertz (Hz). Common rates:
- 44.1 kHz – CD quality (Nyquist frequency 22.05 kHz, just above the ≈20 kHz upper limit of human hearing)
- 48 kHz – standard for video and broadcast
- 96 kHz / 192 kHz – high‑resolution audio
- Nyquist–Shannon theorem – a band‑limited signal can be perfectly reconstructed from its samples if the sampling frequency \(f_{s}\) satisfies
\(f_{s} > 2\,f_{\max}\)
where \(f_{\max}\) is the highest frequency component present.
- Nyquist frequency – the highest frequency that can be represented without aliasing:
\(f_{N}=f_{s}/2\)
- Aliasing – if frequencies above \(f_{N}\) are present when the signal is sampled, they are folded back to frequencies below \(f_{N}\), producing distortion that cannot be removed afterwards. To prevent this, an anti‑aliasing low‑pass filter (usually with a cutoff just below \(f_{N}\)) is placed before the ADC.
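The folding behaviour described above can be sketched in a few lines of Python. This is a minimal illustration for a single pure tone; the function name `alias_frequency` and the test tones are chosen here for the example and are not part of the syllabus.

```python
def alias_frequency(f: float, fs: float) -> float:
    """Apparent frequency (Hz) of a pure tone at f Hz after sampling at fs Hz."""
    f_folded = f % fs                      # sampling cannot tell f apart from f + k*fs
    return min(f_folded, fs - f_folded)    # fold the result into the range 0 .. fs/2


if __name__ == "__main__":
    fs = 44_100
    for tone in (10_000, 30_000, 43_660):
        print(f"{tone} Hz sampled at {fs} Hz appears as {alias_frequency(tone, fs)} Hz")
    # 10000 Hz is below the Nyquist frequency and is unchanged;
    # 30000 Hz appears as 14100 Hz; 43660 Hz appears as 440 Hz.
```

A 43.66 kHz tone is inaudible, yet without a filter it would be stored as an audible 440 Hz tone, which is why the low‑pass filter must act before sampling rather than after.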
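3.2 Quantisation & Bit Depth
- Quantisation – rounding each sampled amplitude to the nearest of a finite set of levels; the difference between the true value and the stored value is the quantisation error.
- Bit depth – number of bits \(n\) used to store each sample, giving \(2^{n}\) possible levels.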
3.3 Numeric Example – Quantisation Error & SNR
Assume a full‑scale sinusoid ranging from –1 V to +1 V.
| Bit depth | Levels \(2^{n}\) | Step size (V) | Theoretical SNR, \(6.02\,n+1.76\) (dB) |
| 8 bits | 256 | \(2/256 = 7.81\times10^{-3}\) | ≈ 49.9 dB |
| 16 bits | 65 536 | \(2/65 536 = 3.05\times10^{-5}\) | ≈ 98.1 dB |
| 24 bits | 16 777 216 | \(2/16 777 216 = 1.19\times10^{-7}\) | ≈ 146.2 dB |
The table shows that each additional bit doubles the number of quantisation levels and improves the signal‑to‑noise ratio by about 6 dB, so each additional 8 bits multiplies the number of levels by 256 and adds about 48 dB.
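The quantities tabulated above can be recomputed with a short script. This is a sketch that assumes the –1 V to +1 V range from the example and the theoretical formula SNR = 6.02 n + 1.76 dB; the function name `quantisation_stats` is illustrative.

```python
def quantisation_stats(bits: int, v_min: float = -1.0, v_max: float = 1.0):
    """Return (levels, step size in volts, theoretical SNR in dB) for a bit depth."""
    levels = 2 ** bits                   # number of distinct quantisation levels
    step = (v_max - v_min) / levels      # voltage spanned by one level
    snr_db = 6.02 * bits + 1.76          # full-scale sine wave, ideal quantiser
    return levels, step, snr_db


if __name__ == "__main__":
    for bits in (8, 16, 24):
        levels, step, snr_db = quantisation_stats(bits)
        print(f"{bits:>2} bits: {levels:>10,} levels, step {step:.2e} V, SNR {snr_db:.1f} dB")
```

The largest rounding error for any one sample is half a step, so halving the step size (one extra bit) halves the worst‑case error.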
3.4 Byte Ordering (Endianness)
When PCM samples are stored in a file, the order of the constituent bytes matters:
- Little‑endian – least‑significant byte first (used by the WAV format, whatever the platform).
- Big‑endian – most‑significant byte first (used by the AIFF format).
Understanding endianness is essential for AO3 tasks that involve writing code to read or write raw audio data.
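The difference between the two byte orders can be seen with Python's standard `struct` module. The sample value `0x1234` is an arbitrary choice for illustration.

```python
import struct

sample = 0x1234  # arbitrary 16-bit sample value (4660 in decimal)

little = struct.pack("<h", sample)  # "<" = little-endian, "h" = 16-bit signed integer
big = struct.pack(">h", sample)     # ">" = big-endian

print(little.hex())  # 3412 (least-significant byte 0x34 comes first)
print(big.hex())     # 1234 (most-significant byte 0x12 comes first)

# Reading the bytes with the wrong byte order silently yields a different value.
(wrong,) = struct.unpack(">h", little)
print(wrong)         # 13330, i.e. 0x3412
```

No error is raised when the wrong order is used; the audio simply decodes as noise, so the byte order has to be known from the file format.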
4. PCM (Pulse‑Code Modulation) Encoding Process
- The analogue signal passes through an anti‑aliasing low‑pass filter.
- The filtered signal enters an analogue‑to‑digital converter (ADC).
- Sampling at the chosen sample rate produces a sequence of amplitude values.
- Each value is quantised according to the selected bit depth; quantisation error is introduced at this stage.
- Quantised values are stored as binary numbers (e.g., 16‑bit signed integers, respecting endianness).
- The resulting PCM data may be:
- saved directly (uncompressed), or
- passed to a compression algorithm (lossless or lossy) before storage/transmission.
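The sampling, quantisation and storage steps above can be sketched with Python's standard `math`, `struct` and `wave` modules. This is a simplified illustration, not a full encoder: the "analogue" input is a computed sine wave, so no anti‑aliasing filter is modelled, and the function names, the 0.8 amplitude and the output file name are assumptions made for the example.

```python
import math
import struct
import wave


def sine_to_pcm16(freq_hz, duration_s, sample_rate=44_100, amplitude=0.8):
    """Sample a sine wave and quantise it to 16-bit little-endian PCM bytes."""
    n_samples = int(duration_s * sample_rate)
    frames = bytearray()
    for i in range(n_samples):
        t = i / sample_rate                                      # sampling instant (s)
        value = amplitude * math.sin(2 * math.pi * freq_hz * t)  # amplitude in -1 .. +1
        quantised = round(value * 32767)                         # nearest 16-bit level
        frames += struct.pack("<h", quantised)                   # store little-endian
    return bytes(frames)


def write_wav(path, pcm, sample_rate=44_100, channels=1):
    """Save raw 16-bit PCM bytes as an uncompressed WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)


if __name__ == "__main__":
    pcm = sine_to_pcm16(440, 1.0)
    write_wav("tone_440hz.wav", pcm)   # hypothetical output file name
    print(len(pcm))                    # 88200 bytes: 44,100 samples x 2 bytes
```

The rounding in `round(value * 32767)` is where the quantisation error enters; everything after that line only rearranges bits.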
5. Audio File Formats & Compression
| Format | Compression type | Typical sample rates | Typical bit depths | Typical compression ratio* | Typical use‑case |
| WAV | Uncompressed (PCM) | 8 kHz – 192 kHz | 8, 16, 24, 32‑bit | 1 : 1 | Professional recording & editing (Windows) |
| AIFF | Uncompressed (PCM) | 8 kHz – 192 kHz | 8, 16, 24, 32‑bit | 1 : 1 | Apple ecosystem, studio work |
| MP3 | Lossy (psychoacoustic masking) | 16 kHz – 48 kHz | Not fixed (bit rate 64–320 kbps) | ≈ 11 : 1 (at 128 kbps, from CD‑quality PCM) | Streaming, portable devices |
| AAC | Lossy (advanced psychoacoustic model) | 16 kHz – 48 kHz | Not fixed (bit rate 64–256 kbps) | ≈ 11 : 1 (at 128 kbps, from CD‑quality PCM), usually with better quality than MP3 at the same bit rate | Modern streaming services (e.g., YouTube, Apple Music) |
| FLAC | Lossless | 8 kHz – 192 kHz | 16, 24‑bit | ≈ 1.5 : 1 – 2 : 1 (content‑dependent) | High‑fidelity archiving, audiophile distribution |
*Compression ratio = original uncompressed size ÷ compressed size.
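For a constant‑bit‑rate codec the ratio follows directly from the two bit rates. The sketch below assumes CD‑quality PCM (44.1 kHz, 16‑bit, stereo) as the original and ignores file headers and metadata; the function names are illustrative.

```python
def pcm_bit_rate(sample_rate: int, bit_depth: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate * bit_depth * channels


def compression_ratio(original: float, compressed: float) -> float:
    """Original size (or bit rate) divided by compressed size (or bit rate)."""
    return original / compressed


if __name__ == "__main__":
    cd_rate = pcm_bit_rate(44_100, 16, 2)   # 1,411,200 bits/s
    for kbps in (64, 128, 256, 320):
        ratio = compression_ratio(cd_rate, kbps * 1000)
        print(f"{kbps:>3} kbps -> about {ratio:.1f} : 1")
```

Because the ratio depends only on the bit rates, two lossy codecs running at the same bit rate compress by the same amount; they differ in how good the result sounds.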
5.1 Lossless vs. Lossy Compression
- Lossless (FLAC, ALAC) – every bit of the original PCM data can be perfectly reconstructed; useful when the highest fidelity is required.
- Lossy (MP3, AAC) – exploits psychoacoustic masking to discard audio components that are inaudible to most listeners. Higher compression ratios are achieved at the cost of possible artefacts (e.g., “swirling”, loss of high‑frequency detail).
6. Practical Calculations
6.1 File‑size of Uncompressed Audio
Given:
- Sample rate \(f_{s}=44{,}100\) Hz
- Bit depth \(n=16\) bits
- Channels = 2 (stereo)
- Duration \(t=3\) min = 180 s
Data rate:
\(\text{Data rate}=f_{s}\times n \times \text{channels}=44{,}100\times16\times2=1{,}411{,}200\ \text{bits/s}\)
File size:
\(\text{Size}= \dfrac{\text{Data rate}\times t}{8}= \dfrac{1{,}411{,}200\times180}{8}=31{,}752{,}000\ \text{bytes}\approx30.3\ \text{MiB}\)
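The same calculation as a small Python function. This is a sketch that counts only the sample data, so a real WAV file will be slightly larger because of its header; the function name is illustrative.

```python
def uncompressed_size_bytes(sample_rate: int, bit_depth: int,
                            channels: int, duration_s: int) -> int:
    """Size in bytes of the raw PCM data (file headers not included)."""
    total_bits = sample_rate * bit_depth * channels * duration_s
    return total_bits // 8


size = uncompressed_size_bytes(44_100, 16, 2, 180)
print(size)                     # 31752000 bytes
print(round(size / 2**20, 1))   # 30.3 MiB (1 MiB = 1,048,576 bytes)
print(round(size / 10**6, 1))   # 31.8 MB  (1 MB = 1,000,000 bytes)
```

Stating which unit is meant (MiB or MB) matters in exam answers, since the two differ by almost 5 %.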
6.2 Signal‑to‑Noise Ratio (SNR) from Bit Depth
Using the theoretical formula \(\text{SNR}=6.02\,n+1.76\) dB:
- 8‑bit → ≈ 49.9 dB
- 16‑bit → ≈ 98.1 dB
- 24‑bit → ≈ 146.2 dB
These values give a quick way to decide whether a higher bit depth is needed for a particular application.
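For reference, the formula can be derived in a few steps. The sketch assumes a full‑scale sine wave of amplitude \(A\), an ideal \(n\)‑bit quantiser with step size \(\Delta\), and a quantisation error spread uniformly between \(-\Delta/2\) and \(+\Delta/2\); the symbols \(A\), \(\Delta\), \(P_{\text{signal}}\) and \(P_{\text{noise}}\) are introduced here for the derivation.
\(\Delta = \dfrac{2A}{2^{n}}\)
\(P_{\text{signal}} = \dfrac{A^{2}}{2}, \qquad P_{\text{noise}} = \dfrac{\Delta^{2}}{12}\)
\(\dfrac{P_{\text{signal}}}{P_{\text{noise}}} = \dfrac{6A^{2}}{\Delta^{2}} = \dfrac{6A^{2}\cdot 2^{2n}}{4A^{2}} = \dfrac{3}{2}\cdot 2^{2n}\)
\(\text{SNR} = 10\log_{10}\!\left(\dfrac{3}{2}\cdot 2^{2n}\right) = 20\,n\log_{10}2 + 10\log_{10}\dfrac{3}{2} \approx 6.02\,n + 1.76\ \text{dB}\)
The 6.02 dB per bit comes from the doubling of the number of levels, and the constant 1.76 dB from the ratio of a sine wave's power to that of uniformly distributed noise.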
7. Practical Considerations for A‑Level Projects
- Match the sample rate to the target playback medium (44.1 kHz for CD‑quality audio, 48 kHz for video, 96 kHz only when a very high‑frequency response is required).
- Use 16‑bit depth for most music and speech projects; reserve 24‑bit for professional‑grade recordings or when a > 100 dB dynamic range is essential.
- If bandwidth or storage is limited, choose a lossy codec with an appropriate bitrate:
- ≈ 64 kbps MP3 – clear speech, podcasts.
- ≈ 128 kbps MP3 – acceptable music quality for casual listening.
- ≈ 256 kbps AAC – high‑quality streaming.
- Always perform a subjective listening test after compression; note any loss of high‑frequency content, ringing, or “warbling”.
- Document every choice (sample rate, bit depth, codec, bitrate, endianness) – this satisfies AO2 (analysis) and AO3 (practical implementation).
8. Suggested Laboratory Activity (AO3)
- Install a free, cross‑platform audio editor such as Audacity (available for Windows, macOS, Linux).
- Record a 10‑second 440 Hz sine wave (A4) using the default microphone at 44.1 kHz / 16‑bit PCM and export as .wav.
- Using Audacity’s “Export → MP3” function, create three compressed versions:
- 64 kbps (low quality)
- 128 kbps (medium quality)
- 256 kbps (high quality)
- Record the file size of each version and calculate the compression ratio.
- Listen to each MP3 on the same set of headphones. Note any audible differences (e.g., loss of high‑frequency harmonics, pre‑echo, or “metallic” artefacts).
- Write a brief report (≈ 300 words) that:
- Explains how the Nyquist theorem justifies the 44.1 kHz sample rate (more than twice the ≈20 kHz upper limit of human hearing).
- Relates the observed quality loss to quantisation error and psychoacoustic masking.
- Evaluates which bitrate offers the best trade‑off between size and quality for speech vs. music.
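Before measuring, it can help to predict the sizes the activity should produce. The sketch below assumes a mono recording and constant‑bit‑rate MP3 export, and it ignores headers and metadata, so measured files will differ slightly; the function names are illustrative.

```python
def wav_data_bytes(duration_s: int, sample_rate: int = 44_100,
                   bit_depth: int = 16, channels: int = 1) -> int:
    """Expected size of the PCM data in the exported .wav file (bytes)."""
    return sample_rate * bit_depth * channels * duration_s // 8


def cbr_bytes(duration_s: int, kbps: int) -> int:
    """Expected size of a constant-bit-rate file (bytes); 1 kbps = 1000 bits/s."""
    return kbps * 1000 * duration_s // 8


if __name__ == "__main__":
    duration = 10                            # seconds, as in the activity
    wav_size = wav_data_bytes(duration)      # 882,000 bytes for mono
    for kbps in (64, 128, 256):
        mp3_size = cbr_bytes(duration, kbps)
        print(f"{kbps:>3} kbps: about {mp3_size:,} bytes, "
              f"ratio about {wav_size / mp3_size:.2f} : 1")
```

Comparing these predictions with the measured file sizes gives a concrete discussion point for the report, for example why a very short clip compresses less well than the bit rates alone suggest.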
9. Mapping to Cambridge Assessment Objectives
| Syllabus Sub‑topic | Relevant AO(s) |
| Definition of sound as a mechanical wave | AO1 |
| Sampling, sample rate & Nyquist theorem (including aliasing & anti‑aliasing filter) | AO1, AO2 |
| Quantisation, bit depth, dynamic range, SNR, quantisation error (numeric example) | AO1, AO2 |
| Endianness & byte ordering of PCM data | AO2, AO3 (code implementation) |
| PCM encoding pipeline | AO1, AO2, AO3 (design a simple encoder/decoder) |
| File formats, typical parameters and compression ratios | AO1, AO2 |
| Lossless vs. lossy compression & psychoacoustic masking | AO1, AO2 |
| File‑size and SNR calculations | AO2 (apply mathematics) |
| Practical checklist for project planning | AO3 |
| Laboratory activity (record → compress → evaluate) | AO3 (design, implement, evaluate) |
10. Suggested Diagram
Flowchart of the audio‑encoding pipeline – Analogue signal → Anti‑aliasing filter → ADC (sampling + quantisation) → PCM data (byte‑ordered) → Optional compression (lossless or lossy) → Final audio file (WAV/AIFF/MP3/AAC/FLAC…).