Cambridge Notes, Past Papers, Revision Questions

1.2 Multimedia – Sound Representation and Encoding

1. Where Sound Fits in the Cambridge Computer Science Programme

The Cambridge AS & A‑Level Computer Science syllabus is divided into a series of thematic blocks (Data Representation, Communication, Hardware, System Software, Security & Ethics, Algorithms & Data Structures, Programming, Software Development, and the optional A‑Level extensions). The topic “Sound Representation and Encoding” belongs to the Multimedia sub‑topic of the Data Representation block (Section 1.2). Mastery of this material supports:

AO1 – factual knowledge of how analogue signals are converted to digital form.

AO2 – analysis of the impact of sampling rate, bit depth and compression on quality and file size.

AO3 – design, implementation and evaluation of a simple audio‑processing solution (e.g. a lab activity or a small program that reads/writes PCM data).

Later A‑Level extensions (e.g. data‑compression algorithms, floating‑point representation, networking protocols) build directly on the concepts introduced here.

2. What Is Sound?

Sound is a mechanical longitudinal wave that propagates through a material medium (air, water, solids) as alternating compressions and rarefactions of pressure. In the physical world it is a continuous analogue signal; computers can only store and manipulate discrete digital values, so the analogue waveform must be sampled and quantised.

3. Digital Audio Fundamentals

3.1 Sampling

Sampling – measuring the instantaneous amplitude of the analogue waveform at regular time intervals.

Sample rate (frequency) – number of samples taken per second, expressed in hertz (Hz). Common rates:
- 44.1 kHz – CD quality (Nyquist frequency 22.05 kHz, just above the ≈20 kHz upper limit of human hearing)
- 48 kHz – standard for video and broadcast
- 96 kHz / 192 kHz – high‑resolution audio

Nyquist–Shannon theorem – a signal can be perfectly reconstructed if the sampling frequency f_s satisfies
\(f_{s} > 2\,f_{\max}\)
where f_max is the highest frequency component present.

Nyquist frequency – the highest frequency that can be represented without aliasing:
\(f_{N}=f_{s}/2\)

Aliasing – if frequencies above f_N are present, they are reflected back into the audible band, producing distortion. To prevent this, an anti‑aliasing low‑pass filter (usually with a cutoff just below f_N) is placed before the ADC.
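The folding behaviour described above can be sketched in a few lines of Python. This is a minimal illustration for a single pure tone; the function names `nyquist_frequency` and `alias_frequency` are hypothetical helpers chosen for this example, not part of any library.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency representable without aliasing: f_N = f_s / 2."""
    return sample_rate_hz / 2


def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling (frequency folding)."""
    folded = signal_hz % sample_rate_hz          # the sampled spectrum repeats every f_s
    return min(folded, sample_rate_hz - folded)  # reflect into the range 0 .. f_N


if __name__ == "__main__":
    fs = 44_100
    print(nyquist_frequency(fs))        # 22050.0
    print(alias_frequency(1_000, fs))   # 1000  (below f_N, unchanged)
    print(alias_frequency(30_000, fs))  # 14100 (folded back into the audible band)
```

A 30 kHz component sampled at 44.1 kHz would therefore be heard as a 14.1 kHz tone, which is why it has to be filtered out before sampling.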

3.2 Quantisation & Bit Depth

Quantisation – mapping each sampled amplitude to the nearest value from a finite set of levels.

Bit depth (n) – number of bits used for each sample; determines the number of quantisation levels \(2^{n}\).

Dynamic range – ratio of the loudest undistorted signal to the quietest detectable signal. The theoretical dynamic range of PCM is
\(\text{DR}=6.02\,n+1.76\ \text{dB}\)

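Quantisation error (noise) – the difference between the true analogue value and the rounded digital value. For a step size \(\Delta\) the error never exceeds \(\Delta/2\), and its RMS value is approximately \(\Delta/\sqrt{12}\approx0.29\,\Delta\) when the error is uniformly distributed.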
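A small experiment can make the size of the quantisation error concrete. The sketch below assumes a uniform quantiser over –1 V to +1 V whose output levels sit at the centre of each step; `quantise` and `rms_error_in_steps` are hypothetical names for this example. Under that assumption the largest error is half a step and the RMS error comes out close to step/√12 (about 0.29 of a step).

```python
import math


def quantise(value, bits, v_min=-1.0, v_max=1.0):
    """Map value to the nearest of 2**bits levels (level = centre of its step)."""
    levels = 2 ** bits
    step = (v_max - v_min) / levels
    index = int((value - v_min) / step)       # which step the value falls in
    index = max(0, min(levels - 1, index))    # clamp out-of-range inputs
    return v_min + (index + 0.5) * step


def rms_error_in_steps(bits, n_points=100_000):
    """RMS quantisation error, in units of one step, for evenly spread inputs."""
    step = 2.0 / 2 ** bits
    inputs = [-1.0 + 2.0 * (k + 0.5) / n_points for k in range(n_points)]
    errors = [x - quantise(x, bits) for x in inputs]
    return math.sqrt(sum(e * e for e in errors) / n_points) / step


if __name__ == "__main__":
    print(rms_error_in_steps(8))   # close to 1 / sqrt(12), i.e. about 0.29
    print(1 / math.sqrt(12))
```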

3.3 Numeric Example – Quantisation Error & SNR

Assume a full‑scale sinusoid ranging from –1 V to +1 V.

Bit depth	Levels \(2^{n}\)	Step size (V)	Theoretical SNR (dB)
8 bits	256	\(2/256 = 0.0078\)	≈ 49.9 dB
16 bits	65 536	\(2/65 536 = 3.05\times10^{-5}\)	≈ 98.1 dB
24 bits	16 777 216	\(2/16 777 216 = 1.19\times10^{-7}\)	≈ 146.2 dB

The table shows that each additional bit doubles the number of quantisation levels and improves the theoretical signal‑to‑noise ratio by about 6 dB, so each additional 8 bits multiplies the number of levels by 256 and adds about 48 dB.
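The figures in a table like this can be reproduced with a short script. The sketch below assumes the same –1 V to +1 V range and the \(6.02\,n+1.76\) dB formula from Section 3.2; `pcm_row` is a hypothetical helper name.

```python
def pcm_row(bits, v_min=-1.0, v_max=1.0):
    """Return (levels, step size in volts, theoretical SNR in dB) for a bit depth."""
    levels = 2 ** bits
    step = (v_max - v_min) / levels
    snr_db = 6.02 * bits + 1.76   # full-scale sine, ideal quantiser
    return levels, step, snr_db


if __name__ == "__main__":
    for bits in (8, 16, 24):
        levels, step, snr_db = pcm_row(bits)
        print(f"{bits:>2} bits  {levels:>10,} levels  {step:.3e} V  {snr_db:.1f} dB")
```

Calling `pcm_row` with other bit depths (for example 12 or 20) gives a quick check on intermediate cases that the table does not list.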

3.4 Byte Ordering (Endianness)

When PCM samples are stored in a file, the order of the constituent bytes matters:

Little‑endian – least‑significant byte first (used by the RIFF/WAV format, whatever the platform).

Big‑endian – most‑significant byte first (used by AIFF).

Understanding endianness is essential for AO3 tasks that involve writing code to read or write raw audio data.
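The effect of byte order on a single 16‑bit sample can be seen with Python's standard `struct` module, where the format prefix `<` means little‑endian and `>` means big‑endian. The sample value 0x1234 is an arbitrary choice for illustration.

```python
import struct

sample = 0x1234  # 4660 as a 16-bit signed integer

little = struct.pack("<h", sample)  # least-significant byte first
big = struct.pack(">h", sample)     # most-significant byte first

print(little.hex())  # 3412
print(big.hex())     # 1234

# Reading big-endian bytes as if they were little-endian gives the wrong value.
misread = struct.unpack("<h", big)[0]
print(misread)       # 13330 (0x3412)
```

A program that reads raw PCM with the wrong byte order produces loud noise rather than a slightly degraded signal, so the byte order of the source format has to be known in advance.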

4. PCM (Pulse‑Code Modulation) Encoding Process

An anti‑aliasing low‑pass filter removes frequency components above the Nyquist frequency from the analogue signal.

The filtered signal enters an analogue‑to‑digital converter (ADC).

Sampling at the chosen sample rate produces a sequence of amplitude values.

Each value is quantised according to the selected bit depth; quantisation error is introduced at this stage.

Quantised values are stored as binary numbers (e.g., 16‑bit signed integers, respecting endianness).

The resulting PCM data may be:
- saved directly (uncompressed), or
- passed to a compression algorithm (lossless or lossy) before storage/transmission.
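The sampling, quantisation and storage steps can be sketched with Python's standard `math`, `struct` and `wave` modules. This is a minimal illustration, not a full encoder: the signal is a synthetic 440 Hz tone, so no anti‑aliasing filter is needed, and the helper names `sine_to_pcm16` and `write_wav` are hypothetical.

```python
import io
import math
import struct
import wave


def sine_to_pcm16(freq_hz, duration_s, sample_rate=44_100, amplitude=0.8):
    """Sample a sine wave and quantise it to 16-bit little-endian PCM bytes."""
    n_samples = int(duration_s * sample_rate)
    frames = bytearray()
    for i in range(n_samples):
        t = i / sample_rate                              # sampling instant
        value = amplitude * math.sin(2 * math.pi * freq_hz * t)
        quantised = round(value * 32767)                 # 16-bit signed range
        frames += struct.pack("<h", quantised)           # WAV stores little-endian
    return bytes(frames)


def write_wav(target, pcm_bytes, sample_rate=44_100, channels=1):
    """Write PCM bytes to a file name or binary file object as a WAV file."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)          # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)


if __name__ == "__main__":
    pcm = sine_to_pcm16(440, 0.1)
    buffer = io.BytesIO()            # in-memory file; a path such as "tone.wav" also works
    write_wav(buffer, pcm)
    print(len(pcm), "bytes of PCM data")
```

Passing a file name instead of the in‑memory buffer produces a file that an audio editor can open, which is a convenient way to check the result by ear.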

5. Audio File Formats & Compression

Format	Compression type	Typical sample rates	Typical bit depths	Typical compression ratio*	Typical use‑case
WAV	Uncompressed (PCM)	8 kHz – 192 kHz	8, 16, 24, 32‑bit	1 : 1	Professional recording & editing (Windows)
AIFF	Uncompressed (PCM)	8 kHz – 192 kHz	8, 16, 24, 32‑bit	1 : 1	Apple ecosystem, studio work
MP3	Lossy (psychoacoustic masking)	16 kHz – 48 kHz	Variable (bit‑rate 64–320 kbps)	≈ 10 : 1 – 12 : 1 (at 128 kbps)	Streaming, portable devices
AAC	Lossy (advanced psychoacoustic model)	16 kHz – 48 kHz	Variable (bit‑rate 64–256 kbps)	≈ 11 : 1 (at 128 kbps) – 15 : 1 (at 96 kbps)	Modern streaming services (e.g., YouTube, Spotify)
FLAC	Lossless	8 kHz – 192 kHz	16, 24‑bit	≈ 1.5 : 1 – 2 : 1	High‑fidelity archiving, audiophile distribution

*Compression ratio = original uncompressed size ÷ compressed size.

5.1 Lossless vs. Lossy Compression

Lossless (FLAC, ALAC) – every bit of the original PCM data can be perfectly reconstructed; useful when the highest fidelity is required.

Lossy (MP3, AAC) – exploits psychoacoustic masking to discard audio components that are inaudible to most listeners. Higher compression ratios are achieved at the cost of possible artefacts (e.g., “swirling”, loss of high‑frequency detail).
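The difference between the two families can be demonstrated without a real audio codec. In the sketch below, `zlib` is used as a general‑purpose stand‑in for a lossless codec (it is not FLAC), and discarding the low 8 bits of every sample is a deliberately crude stand‑in for a lossy codec (it uses no psychoacoustic model). The test tone and variable names are assumptions made for this example.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44_100

# One second of a 440 Hz tone as 16-bit signed samples (illustrative test signal).
samples = [round(20_000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
           for i in range(SAMPLE_RATE)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: decompression restores every byte of the original PCM data.
compressed = zlib.compress(pcm, 9)
restored = zlib.decompress(compressed)
print("lossless identical:", restored == pcm)
print("lossless ratio: %.2f : 1" % (len(pcm) / len(compressed)))

# Lossy: keeping only the top 8 bits halves the size but changes the samples.
lossy = [(s >> 8) << 8 for s in samples]
max_error = max(abs(a - b) for a, b in zip(samples, lossy))
print("lossy identical:", lossy == samples)
print("largest error per sample:", max_error)
```

The lossless path returns exactly the original bytes, whereas the lossy path cannot: the discarded bits are gone, which is the trade made in exchange for a smaller file.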

6. Practical Calculations

6.1 File‑size of Uncompressed Audio

Given:

Sample rate \(f_{s}=44{,}100\) Hz

Bit depth \(n=16\) bits

Channels = 2 (stereo)

Duration \(t=3\) min = 180 s

Data rate:

\(\text{Data rate}=f_{s}\times n \times \text{channels}=44{,}100\times16\times2=1{,}411{,}200\ \text{bits/s}\)

File size:

\(\text{Size}= \dfrac{\text{Data rate}\times t}{8}= \dfrac{1{,}411{,}200\times180}{8}=31{,}752{,}000\ \text{bytes}\approx30.3\ \text{MiB}\)
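The same calculation can be wrapped in a small function so that other sample rates, bit depths and durations are easy to try. The function name `pcm_size_bytes` is a hypothetical helper; the result ignores the small file header that formats such as WAV add.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Size of uncompressed PCM audio in bytes (header not included)."""
    data_rate_bits = sample_rate_hz * bit_depth * channels   # bits per second
    return data_rate_bits * duration_s / 8


if __name__ == "__main__":
    size = pcm_size_bytes(44_100, 16, 2, 180)
    print(size)                          # 31752000.0 bytes
    print(round(size / 2 ** 20, 1))      # 30.3 MiB
```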

6.2 Signal‑to‑Noise Ratio (SNR) from Bit Depth

Using the theoretical formula \(\text{SNR}=6.02\,n+1.76\) dB:

8‑bit → ≈ 49.9 dB

16‑bit → ≈ 98.1 dB

24‑bit → ≈ 146.2 dB

These values give a quick way to decide whether a higher bit depth is needed for a particular application.
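The decision can also be run in reverse: given a target SNR, find the smallest bit depth that meets it. The sketch below simply inverts the theoretical formula; `min_bits_for_snr` is a hypothetical helper, and in practice the answer would be rounded up to a standard depth such as 8, 16 or 24 bits.

```python
import math


def theoretical_snr_db(bits):
    """Theoretical SNR of ideal PCM for a full-scale sine wave."""
    return 6.02 * bits + 1.76


def min_bits_for_snr(target_db):
    """Smallest whole number of bits whose theoretical SNR reaches target_db."""
    return max(1, math.ceil((target_db - 1.76) / 6.02))


if __name__ == "__main__":
    print(min_bits_for_snr(90))    # 15
    print(min_bits_for_snr(100))   # 17, so 16-bit audio falls just short of 100 dB
```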

7. Practical Considerations for A‑Level Projects

Match the sample rate to the target playback medium (44.1 kHz for CD‑quality audio, 48 kHz for video, 96 kHz only when a very high‑frequency response is required).

Use 16‑bit depth for most music and speech projects; reserve 24‑bit for professional‑grade recordings or when a > 100 dB dynamic range is essential.

If bandwidth or storage is limited, choose a lossy codec with an appropriate bitrate:
- ≈ 64 kbps MP3 – clear speech, podcasts.
- ≈ 128 kbps MP3 – acceptable music quality for casual listening.
- ≈ 256 kbps AAC – high‑quality streaming.

Always perform a subjective listening test after compression; note any loss of high‑frequency content, ringing, or “warbling”.

Document every choice (sample rate, bit depth, codec, bitrate, endianness) – this satisfies AO2 (analysis) and AO3 (practical implementation).

8. Suggested Laboratory Activity (AO3)

Install a free, cross‑platform audio editor such as Audacity (available for Windows, macOS, Linux).

Record a 10‑second 440 Hz sine wave (A4), for example one played by a tone generator, using the default microphone at 44.1 kHz / 16‑bit PCM, and export it as .wav.

Using Audacity’s “Export → MP3” function, create three compressed versions:
- 64 kbps (low quality)
- 128 kbps (medium quality)
- 256 kbps (high quality)

Record the file size of each version and calculate the compression ratio.

Listen to each MP3 on the same set of headphones. Note any audible differences (e.g., loss of high‑frequency harmonics, pre‑echo, or “metallic” artefacts).

Write a brief report (≈ 300 words) that:
- Explains why the Nyquist theorem dictated the 44.1 kHz sample rate.
- Relates the observed quality loss to quantisation error and psychoacoustic masking.
- Evaluates which bitrate offers the best trade‑off between size and quality for speech vs. music.
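Before running the activity it can help to predict the sizes that should be observed. The sketch below assumes a mono recording and constant‑bit‑rate MP3 export, and it ignores file headers and metadata, so measured values will differ slightly; `cbr_bytes` and `compression_ratio` are hypothetical helper names.

```python
def cbr_bytes(bitrate_kbps, duration_s):
    """Approximate size of a constant-bit-rate file (1 kbps = 1000 bits/s)."""
    return bitrate_kbps * 1000 * duration_s / 8


def compression_ratio(original_bytes, compressed_bytes):
    """Original uncompressed size divided by compressed size."""
    return original_bytes / compressed_bytes


if __name__ == "__main__":
    duration_s = 10
    wav_bytes = 44_100 * 16 * 1 * duration_s / 8     # mono assumed: 882,000 bytes
    for kbps in (64, 128, 256):
        mp3_bytes = cbr_bytes(kbps, duration_s)
        ratio = compression_ratio(wav_bytes, mp3_bytes)
        print(f"{kbps:>3} kbps: {mp3_bytes:,.0f} bytes, ratio {ratio:.1f} : 1")
```

Under these assumptions the three exports should come out near 80,000, 160,000 and 320,000 bytes, giving ratios of roughly 11, 5.5 and 2.8 to 1. A stereo recording would double the WAV size and therefore the ratios.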

9. Mapping to Cambridge Assessment Objectives

Syllabus Sub‑topic	Relevant AO(s)
Definition of sound as a mechanical wave	AO1
Sampling, sample rate & Nyquist theorem (including aliasing & anti‑aliasing filter)	AO1, AO2
Quantisation, bit depth, dynamic range, SNR, quantisation error (numeric example)	AO1, AO2
Endianness & byte ordering of PCM data	AO2, AO3 (code implementation)
PCM encoding pipeline	AO1, AO2, AO3 (design a simple encoder/decoder)
File formats, typical parameters and compression ratios	AO1, AO2
Lossless vs. lossy compression & psychoacoustic masking	AO1, AO2
File‑size and SNR calculations	AO2 (apply mathematics)
Practical checklist for project planning	AO3
Laboratory activity (record → compress → evaluate)	AO3 (design, implement, evaluate)

10. Suggested Diagram

Flowchart of the audio‑encoding pipeline – Analogue signal → Anti‑aliasing filter → ADC (sampling + quantisation) → PCM data (byte‑ordered) → Optional compression (lossless or lossy) → Final audio file (WAV/AIFF/MP3/AAC/FLAC…).

Show understanding of how sound is represented and encoded
