Show understanding of how sound is represented and encoded

1.2 Multimedia – Sound Representation and Encoding

1. Where Sound Fits in the Cambridge Computer Science Programme

The Cambridge AS & A‑Level Computer Science syllabus is divided into a series of thematic blocks (Data Representation, Communication, Hardware, System Software, Security & Ethics, Algorithms & Data Structures, Programming, Software Development, and the optional A‑Level extensions). The topic “Sound Representation and Encoding” belongs to the Multimedia sub‑topic of the Data Representation block (Section 1.2). Mastery of this material supports:

  • AO1 – factual knowledge of how analogue signals are converted to digital form.
  • AO2 – analysis of the impact of sampling rate, bit depth and compression on quality and file size.
  • AO3 – design, implementation and evaluation of a simple audio‑processing solution (e.g. a lab activity or a small program that reads/writes PCM data).

Later A‑Level extensions (e.g. data‑compression algorithms, floating‑point representation, networking protocols) build directly on the concepts introduced here.

2. What Is Sound?

Sound is a mechanical longitudinal wave that propagates through a material medium (air, water, solids) as alternating compressions and rarefactions of pressure. In the physical world it is a continuous analogue signal; computers can only store and manipulate discrete digital values, so the analogue waveform must be sampled and quantised.

3. Digital Audio Fundamentals

3.1 Sampling

  • Sampling – measuring the instantaneous amplitude of the analogue waveform at regular time intervals.
  • Sample rate (frequency) – number of samples taken per second, expressed in hertz (Hz). Common rates:
    • 44.1 kHz – CD quality (Nyquist limit ≈ 22 kHz, just above the ≈ 20 kHz upper limit of human hearing)
    • 48 kHz – standard for video and broadcast
    • 96 kHz / 192 kHz – high‑resolution audio
  • Nyquist–Shannon theorem – a signal can be perfectly reconstructed if the sampling frequency fs satisfies

    \(f_{s} > 2\,f_{\max}\)

    where fmax is the highest frequency component present.
  • Nyquist frequency – the highest frequency that can be represented without aliasing:

    \(f_{N}=f_{s}/2\)

  • Aliasing – if frequencies above fN are present, they are reflected back into the audible band, producing distortion. To prevent this, an anti‑aliasing low‑pass filter (usually with a cutoff just below fN) is placed before the ADC.
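The folding described above can be checked numerically. The following Python sketch (frequencies chosen purely for illustration) shows that a 30 kHz tone sampled at 44.1 kHz produces exactly the same sample values, up to sign, as a 14.1 kHz tone – after sampling, the two are indistinguishable:

```python
import math

FS = 44_100            # sample rate (Hz); Nyquist frequency is FS / 2 = 22,050 Hz
F_HIGH = 30_000        # tone above the Nyquist frequency
F_ALIAS = FS - F_HIGH  # frequency it folds back to: 14,100 Hz

def sample(freq, n, fs=FS):
    """Amplitude of a unit-amplitude sine of `freq` Hz at sample index n."""
    return math.sin(2 * math.pi * freq * n / fs)

# sin(2*pi*f*n/fs) == -sin(2*pi*(fs - f)*n/fs), so the 30 kHz tone is
# captured as a (sign-flipped) 14.1 kHz tone -- this is aliasing.
for n in range(10):
    assert math.isclose(sample(F_HIGH, n), -sample(F_ALIAS, n), abs_tol=1e-9)

print(f"{F_HIGH} Hz sampled at {FS} Hz aliases to {F_ALIAS} Hz")
```

This is why the anti-aliasing filter must remove content above \(f_N\) before the ADC: once the samples are taken, the original and aliased frequencies can no longer be told apart.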

3.2 Quantisation & Bit Depth

  • Quantisation – mapping each sampled amplitude to the nearest value from a finite set of levels.
  • Bit depth (n) – number of bits used for each sample; determines the number of quantisation levels \(2^{n}\).
  • Dynamic range – ratio of the loudest undistorted signal to the quietest detectable signal. The theoretical dynamic range of PCM is

    \(\text{DR}=6.02\,n+1.76\ \text{dB}\)

  • Quantisation error (noise) – the difference between the true analogue value and the rounded digital value. Its magnitude is at most half the quantisation step size; for a uniformly distributed error, the RMS value is the step size divided by \(\sqrt{12}\) (about 0.29 of a step).
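Uniform quantisation can be sketched in a few lines of Python (the function name and the full-scale range of −1 to +1 are illustrative choices, not part of any standard):

```python
def quantise(x, n_bits):
    """Round x in the range [-1.0, +1.0) to the nearest of 2**n_bits levels."""
    levels = 2 ** n_bits
    step = 2.0 / levels              # full scale spans -1 .. +1
    index = round((x + 1.0) / step)  # nearest level index
    index = min(index, levels - 1)   # clamp the top edge of the range
    return -1.0 + index * step

# The rounding error can never exceed half a step (here, 8-bit audio):
step = 2.0 / 256
x = 0.123456
assert abs(quantise(x, 8) - x) <= step / 2
```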

3.3 Numeric Example – Quantisation Error & SNR

Assume a full‑scale sinusoid ranging from –1 V to +1 V.

| Bit depth | Levels \(2^{n}\) | Step size (V) | SNR, full-scale sinusoid (dB) |
|---|---|---|---|
| 8 bits | 256 | \(2/256 \approx 0.0078\) | ≈ 49.9 |
| 16 bits | 65 536 | \(2/65\,536 \approx 3.05\times10^{-5}\) | ≈ 98.1 |
| 24 bits | 16 777 216 | \(2/16\,777\,216 \approx 1.19\times10^{-7}\) | ≈ 146.2 |

The table shows that each additional bit doubles the number of quantisation steps and adds about 6 dB to the signal-to-noise ratio, so each extra 8 bits multiplies the number of steps by 256 and improves the SNR by roughly 48 dB.

3.4 Byte Ordering (Endianness)

When PCM samples are stored in a file, the order of the constituent bytes matters:

  • Little‑endian – least‑significant byte first (used by WAV on Windows).
  • Big‑endian – most‑significant byte first (used by AIFF on macOS).

Understanding endianness is essential for AO3 tasks that involve writing code to read or write raw audio data.

4. PCM (Pulse‑Code Modulation) Encoding Process

  1. Analogue signal enters an analogue‑to‑digital converter (ADC).
  2. The ADC applies an anti‑aliasing low‑pass filter.
  3. Sampling at the chosen sample rate produces a sequence of amplitude values.
  4. Each value is quantised according to the selected bit depth.
  5. Quantised values are stored as binary numbers (e.g., 16‑bit signed integers, respecting endianness).
  6. Quantisation error is introduced in step 4 – each stored value can differ from the true analogue amplitude by up to half a quantisation step.
  7. The resulting PCM data may be:
    • saved directly (uncompressed), or
    • passed to a compression algorithm (lossless or lossy) before storage/transmission.
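The whole pipeline can be imitated in software. The sketch below synthesises, quantises and stores a sine tone as a 16-bit little-endian mono WAV file using only Python's standard library (the filename and tone parameters are arbitrary choices for illustration):

```python
import math
import struct
import wave

FS = 44_100        # sample rate (Hz)
FREQ = 440         # A4 test tone
DURATION = 1.0     # seconds
AMPLITUDE = 0.5    # half of full scale, to stay clear of clipping

frames = bytearray()
for n in range(int(FS * DURATION)):
    value = AMPLITUDE * math.sin(2 * math.pi * FREQ * n / FS)  # sampling
    sample = round(value * 32767)                              # quantise to 16-bit
    frames += struct.pack("<h", sample)                        # little-endian bytes

with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)    # mono
    f.setsampwidth(2)    # 2 bytes = 16-bit samples
    f.setframerate(FS)
    f.writeframes(frames)
```

The resulting file is uncompressed PCM; passing it to an MP3 or FLAC encoder would correspond to step 7 above.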

5. Audio File Formats & Compression

| Format | Compression type | Typical sample rates | Typical bit depths / bitrates | Typical compression ratio* | Typical use case |
|---|---|---|---|---|---|
| WAV | Uncompressed (PCM) | 8 kHz – 192 kHz | 8, 16, 24, 32-bit | 1 : 1 | Professional recording & editing (Windows) |
| AIFF | Uncompressed (PCM) | 8 kHz – 192 kHz | 8, 16, 24, 32-bit | 1 : 1 | Apple ecosystem, studio work |
| MP3 | Lossy (psychoacoustic masking) | 16 kHz – 48 kHz | Variable bitrate, 64–320 kbps | ≈ 10 : 1 – 12 : 1 (at 128 kbps) | Streaming, portable devices |
| AAC | Lossy (advanced psychoacoustic model) | 16 kHz – 48 kHz | Variable bitrate, 64–256 kbps | ≈ 12 : 1 – 15 : 1 (at 128 kbps) | Modern streaming services (e.g., YouTube, Spotify) |
| FLAC | Lossless | 8 kHz – 192 kHz | 16, 24-bit | ≈ 2 : 1 – 3 : 1 | High-fidelity archiving, audiophile distribution |

*Compression ratio = original uncompressed size ÷ compressed size.
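Because a constant-bitrate codec's output size is fixed by its bitrate, the ratios in the table can be estimated directly. A small Python sketch, assuming a 10-second stereo 44.1 kHz / 16-bit recording (the duration is arbitrary):

```python
SECONDS = 10
# Uncompressed size: fs * 2 bytes/sample * 2 channels * duration
wav_bytes = 44_100 * 2 * 2 * SECONDS

for kbps in (64, 128, 256):
    mp3_bytes = kbps * 1000 // 8 * SECONDS  # constant bitrate fixes the size
    print(f"{kbps} kbps  ->  {wav_bytes / mp3_bytes:.1f} : 1")
```

At 128 kbps this gives roughly 11 : 1, consistent with the ≈ 10 : 1 – 12 : 1 range quoted above.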

5.1 Lossless vs. Lossy Compression

  • Lossless (FLAC, ALAC) – every bit of the original PCM data can be perfectly reconstructed; useful when the highest fidelity is required.
  • Lossy (MP3, AAC) – exploits psychoacoustic masking to discard audio components that are inaudible to most listeners. Higher compression ratios are achieved at the cost of possible artefacts (e.g., “swirling”, loss of high‑frequency detail).

6. Practical Calculations

6.1 File‑size of Uncompressed Audio

Given:

  • Sample rate \(f_{s}=44{,}100\) Hz
  • Bit depth \(n=16\) bits
  • Channels = 2 (stereo)
  • Duration \(t=3\) min = 180 s

Data rate:

\(\text{Data rate}=f_{s}\times n \times \text{channels}=44{,}100\times16\times2=1{,}411{,}200\ \text{bits/s}\)

File size:

\(\text{Size}= \dfrac{\text{Data rate}\times t}{8}= \dfrac{1{,}411{,}200\times180}{8}=31{,}752{,}000\ \text{bytes}\approx30.3\ \text{MiB}\)
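The same calculation as a reusable Python function (the function and parameter names are illustrative):

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of raw PCM audio in bytes (file-header overhead ignored)."""
    bits = sample_rate_hz * bit_depth * channels * seconds
    return bits // 8

size = pcm_size_bytes(44_100, 16, 2, 180)        # the 3-minute stereo example
print(f"{size:,} bytes  ≈  {size / 2**20:.1f} MiB")
# prints "31,752,000 bytes  ≈  30.3 MiB"
```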

6.2 Signal‑to‑Noise Ratio (SNR) from Bit Depth

Using the theoretical formula \(\text{SNR}=6.02\,n+1.76\) dB:

  • 8‑bit → ≈ 49.9 dB
  • 16‑bit → ≈ 98.1 dB
  • 24‑bit → ≈ 146.2 dB

These values give a quick way to decide whether a higher bit depth is needed for a particular application.
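The formula is easy to wrap in a helper for quick checks (a minimal Python sketch; the function name is illustrative):

```python
def pcm_snr_db(n_bits):
    """Theoretical SNR (dB) of a full-scale sinusoid quantised to n bits."""
    return 6.02 * n_bits + 1.76

for n in (8, 16, 24):
    print(f"{n:2d}-bit  ->  {pcm_snr_db(n):.1f} dB")
# prints 49.9 dB, 98.1 dB and 146.2 dB
```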

7. Practical Considerations for A‑Level Projects

  • Match the sample rate to the target playback medium (44.1 kHz for CD‑quality audio, 48 kHz for video, 96 kHz only when a very high‑frequency response is required).
  • Use 16‑bit depth for most music and speech projects; reserve 24‑bit for professional‑grade recordings or when a > 100 dB dynamic range is essential.
  • If bandwidth or storage is limited, choose a lossy codec with an appropriate bitrate:
    • ≈ 64 kbps MP3 – clear speech, podcasts.
    • ≈ 128 kbps MP3 – acceptable music quality for casual listening.
    • ≈ 256 kbps AAC – high‑quality streaming.
  • Always perform a subjective listening test after compression; note any loss of high‑frequency content, ringing, or “warbling”.
  • Document every choice (sample rate, bit depth, codec, bitrate, endianness) – this satisfies AO2 (analysis) and AO3 (practical implementation).

8. Suggested Laboratory Activity (AO3)

  1. Install a free, cross‑platform audio editor such as Audacity (available for Windows, macOS, Linux).
  2. Record a 10‑second 440 Hz sine wave (A4) using the default microphone at 44.1 kHz / 16‑bit PCM and export as .wav.
  3. Using Audacity’s “Export → MP3” function, create three compressed versions:
    • 64 kbps (low quality)
    • 128 kbps (medium quality)
    • 256 kbps (high quality)
  4. Record the file size of each version and calculate the compression ratio.
  5. Listen to each MP3 on the same set of headphones. Note any audible differences (e.g., loss of high‑frequency harmonics, pre‑echo, or “metallic” artefacts).
  6. Write a brief report (≈ 300 words) that:
    • Explains why the Nyquist theorem dictated the 44.1 kHz sample rate.
    • Relates the observed quality loss to quantisation error and psychoacoustic masking.
    • Evaluates which bitrate offers the best trade‑off between size and quality for speech vs. music.

9. Mapping to Cambridge Assessment Objectives

| Syllabus sub‑topic | Relevant AO(s) |
|---|---|
| Definition of sound as a mechanical wave | AO1 |
| Sampling, sample rate & Nyquist theorem (including aliasing & anti‑aliasing filter) | AO1, AO2 |
| Quantisation, bit depth, dynamic range, SNR, quantisation error (numeric example) | AO1, AO2 |
| Endianness & byte ordering of PCM data | AO2, AO3 (code implementation) |
| PCM encoding pipeline | AO1, AO2, AO3 (design a simple encoder/decoder) |
| File formats, typical parameters and compression ratios | AO1, AO2 |
| Lossless vs. lossy compression & psychoacoustic masking | AO1, AO2 |
| File‑size and SNR calculations | AO2 (apply mathematics) |
| Practical checklist for project planning | AO3 |
| Laboratory activity (record → compress → evaluate) | AO3 (design, implement, evaluate) |

10. Suggested Diagram

Flowchart of the audio‑encoding pipeline – Analogue signal → Anti‑aliasing filter → ADC (sampling + quantisation) → PCM data (byte‑ordered) → Optional compression (lossless or lossy) → Final audio file (WAV/AIFF/MP3/AAC/FLAC…).
