Show understanding of the need for and examples of the use of compression

1.3 Compression

Objective

Show understanding of why compression is required, give examples of lossless and lossy techniques for all major media types (text, bitmap, vector, sound, video and binary data), and justify the most appropriate method in a given situation.

Why is Compression Needed?

  • Storage limitations – data must fit on disks, SSDs, USB sticks or other removable media.
  • Transmission efficiency – smaller files use less bandwidth and reach the receiver faster.
  • Cost reduction – less storage space and lower network‑transfer charges.
  • Performance – reduced I/O time, quicker loading of programs and media.
  • Energy consumption – less data to move means lower power use, especially on mobile devices.

Exam‑style prompt

A school wants to store 2 GB of instructional video on a 1 GB USB stick. Explain why compression is required and state which type of compression (lossless or lossy) would be most appropriate, giving one reason for your choice.

Types of Compression

  • Lossless compression – the original data can be reconstructed exactly; no information is permanently lost.
  • Lossy compression – some information is permanently discarded to obtain higher compression ratios; the reconstructed data is an approximation of the original.
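
The difference between the two types can be seen in a few lines of Python. This is a minimal sketch, not a real codec: the lossless half uses the standard `zlib` module (DEFLATE), while the lossy half is an illustrative assumption that simply drops the low 4 bits of each 8‑bit sample.

```python
import zlib

# Lossless: zlib (DEFLATE) must return exactly the bytes that went in.
original = b"the quick brown fox jumps over the lazy dog " * 50
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
lossless_ok = restored == original

# Lossy (illustrative only): keep the top 4 bits of each 8-bit sample.
samples = [12, 13, 15, 200, 201, 203, 90, 91]
quantised = [s >> 4 for s in samples]        # 4 bits per sample instead of 8
approximation = [q << 4 for q in quantised]  # the low 4 bits cannot be recovered

print(len(original), "->", len(compressed), "bytes, exact copy:", lossless_ok)
print("original samples:     ", samples)
print("reconstructed samples:", approximation)  # [0, 0, 0, 192, 192, 192, 80, 80]
```

The reconstructed samples are close to the originals but not equal to them, which is the defining property of lossy compression.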

Lossless vs. Lossy – Quick Comparison

| Aspect | Lossless | Lossy |
| --- | --- | --- |
| Data integrity | Exact original data can be recovered | Original data is approximated; some information is lost |
| Typical compression ratio | 2 : 1 – 3 : 1 for typical files (can be far higher for highly redundant data) | 10 : 1 – 100 : 1 (depends on content and quality settings) |
| Common applications | Source code, text documents, executables, archival storage, medical images, video in lossless mode | Photographs, music, streaming video, web graphics, most consumer video files |
| Algorithm and format examples | Run‑Length Encoding (RLE), Huffman, LZW, DEFLATE, arithmetic coding; formats such as PNG, FLAC, ZIP | JPEG, MP3, AAC, H.264 / H.265 / VP9 / AV1, WebP (lossy mode) |

Common Lossless Techniques

  1. Run‑Length Encoding (RLE) – stores a value together with the number of consecutive repetitions.
  2. Huffman Coding – builds a binary tree based on symbol frequencies; more frequent symbols receive shorter codes.
  3. Lempel‑Ziv‑Welch (LZW) – constructs a dictionary of repeated substrings and replaces them with short indices.
  4. Arithmetic Coding – represents an entire message as a single number in the interval [0,1); it can approach the theoretical entropy limit more closely than Huffman.
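
Two of the techniques above, RLE and LZW, are short enough to sketch directly. The function names are illustrative, and the LZW version assumes the input contains only characters with code points below 256; real implementations also pack the indices into variable‑width bit fields, which is omitted here.

```python
from itertools import groupby


def rle_encode(text):
    """Return (symbol, run_length) pairs, one per run of identical symbols."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_decode(pairs):
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


def lzw_compress(text):
    """Replace repeated substrings with dictionary indices (assumes chars < 256)."""
    dictionary = {chr(i): i for i in range(256)}  # start with all single characters
    current, output = "", []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                       # keep extending the match
        else:
            output.append(dictionary[current])        # emit the longest known match
            dictionary[candidate] = len(dictionary)   # learn the new substring
            current = ch
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes):
    """Rebuild the text; the dictionary is reconstructed while decoding."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):      # code refers to the entry being defined
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        result.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(result)


print(rle_encode("WWWWBBBWW"))    # [('W', 4), ('B', 3), ('W', 2)]
print(lzw_compress("ABABABA"))    # [65, 66, 256, 258]
```

RLE only helps when the data contains long runs; on text with few repeats the pairs can take more space than the original.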

Huffman Coding – Step‑by‑Step Example

String: ABRACADABRA

| Symbol | Frequency |
| --- | --- |
| A | 5 |
| B | 2 |
| R | 2 |
| C | 1 |
| D | 1 |

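Building the Huffman tree (merge the two lowest‑frequency nodes at each step):

  1. Merge C (1) and D (1) → node CD (2)
  2. Merge B (2) and R (2) → node BR (4)
  3. Merge CD (2) and BR (4) → node CD‑BR (6)
  4. Merge A (5) and CD‑BR (6) → root (11)

In step 2 the nodes B, R and CD all have frequency 2, so any two of them may be merged; a different choice gives different codes but the same total encoded length.

Resulting codes (left‑branch = 0, right‑branch = 1):

  • A → 0
  • C → 100
  • D → 101
  • B → 110
  • R → 111

Encoded length: 5 × 1 + 6 × 3 = 23 bits, compared with 11 × 8 = 88 bits using 8‑bit character codes.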
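
The merge procedure can be sketched with a priority queue. This is a minimal illustration using Python's standard `heapq` module; the function name and the tie‑breaking counter are assumptions of the sketch, and because ties may be broken differently, the individual codes can differ between implementations while the total encoded length stays the same.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a prefix code by repeatedly merging the two lowest-frequency nodes."""
    freq = Counter(text)
    # Heap entries: (frequency, tie_breaker, {symbol: code_so_far}).
    # The tie_breaker keeps comparisons away from the dictionaries.
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:                      # a single distinct symbol still needs 1 bit
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second lowest
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "ABRACADABRA"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(len(encoded), "bits, versus", 8 * len(text), "bits as 8-bit characters")
```

For ABRACADABRA an optimal Huffman code needs 23 bits in total, with the most frequent symbol (A) receiving a 1‑bit code.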

Common Lossy Techniques

  1. Transform coding – e.g., JPEG for images uses the Discrete Cosine Transform (DCT) to separate frequency components; quantisation then stores the less visible high‑frequency components with reduced precision, which is where information is lost.
  2. Perceptual coding – e.g., MP3 for audio uses a psycho‑acoustic model to discard sounds that the human ear cannot hear or that are masked by louder sounds.
  3. Predictive (motion‑compensated) coding – e.g., MPEG‑2, H.264/AVC, H.265/HEVC, VP9 and AV1 store most frames as motion vectors plus the residual difference from a predicted frame, rather than as full frames.
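
The idea behind transform coding can be sketched on a single row of eight samples. This is a simplified illustration, not JPEG itself: it uses a one‑dimensional orthonormal DCT, and `STEP` is a hypothetical uniform quantisation step (JPEG uses two‑dimensional 8 × 8 blocks and a table of different step sizes).

```python
import math


def dct(block):
    """Orthonormal DCT-II: convert N samples into N frequency coefficients."""
    n = len(block)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(block))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct(): convert frequency coefficients back into samples."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


STEP = 10  # hypothetical quantisation step: larger means smaller output, more error

samples = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct(samples)
quantised = [round(c / STEP) for c in coeffs]   # the lossy step
restored = [round(v) for v in idct([q * STEP for q in quantised])]

print("quantised coefficients:", quantised)
print("original samples:      ", samples)
print("restored samples:      ", restored)
```

Smooth data usually leaves many of the high‑frequency quantised coefficients at or near zero, and those runs of small values are what the later lossless stage (such as Huffman coding) compresses well.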

Everyday Examples of Compression by Media Type

| Media type | Lossless formats / tools | Lossy formats / tools |
| --- | --- | --- |
| Text files | ZIP and GZIP (DEFLATE), BZIP2, 7z (LZMA) | None (text is almost never compressed lossily) |
| Bitmap images | PNG, BMP (RLE), TIFF (LZW or ZIP), lossless WebP | JPEG, WebP (lossy), HEIF (HEIC) |
| Vector graphics | SVG + gzip (SVGZ), PDF‑optimised streams, EPS with LZW compression | Simplification / point‑reduction algorithms (e.g., SVG path‑simplify) – a form of lossy reduction |
| Sound files | FLAC, ALAC (Apple Lossless), WAV archived with ZIP or GZIP (DEFLATE) | MP3, AAC, OGG Vorbis, Opus |
| Video files | FFV1, H.264 lossless mode, AV1 lossless | H.264/AVC, H.265/HEVC, VP9, AV1, MPEG‑2, Apple ProRes (visually lossless, but still lossy) |
| Binary / executable files | ZIP, GZIP, 7z, LZMA, UPX (executable packer – lossless) | None (binary data is never compressed lossily) |
| Web pages (HTTP transfer) | Brotli, GZIP, Deflate (applied to HTML, CSS, JavaScript) | None (compression is lossless for web assets) |

Justification Exercise (AO2)

Scenario: You need to publish a technical diagram (vector graphic) on the school website. The original SVG file is 2 MB. Which compression method should you use and why?

  1. Choose a lossless approach – e.g., gzip the SVG (producing an .svgz file).
  2. Reasoning:

    • The diagram must remain perfectly scalable and retain exact geometry; any loss of points would change dimensions or line thickness.
    • Lossless compression reduces file size (typically 30‑50 % for SVG) without altering the visual appearance, satisfying accessibility and future editing requirements.
    • Web browsers decompress .svgz transparently, provided the server delivers it with the Content‑Encoding: gzip header, so no extra client‑side code is needed.
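
The choice in this exercise can be checked with Python's standard `gzip` module. The SVG below is a small hypothetical stand‑in for the 2 MB diagram, so the exact sizes it prints say nothing about the real file; the point is that the decompressed bytes are identical to the original.

```python
import gzip

# Hypothetical diagram: 200 similar rectangles standing in for the real file.
shapes = "".join(
    f'<rect x="{10 * i}" y="{5 * i}" width="8" height="8" fill="navy"/>\n'
    for i in range(200)
)
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="2000" height="1000">\n'
    f"{shapes}</svg>\n"
)
svg_bytes = svg.encode("utf-8")

svgz_bytes = gzip.compress(svg_bytes)    # these bytes are what a .svgz file holds
restored = gzip.decompress(svgz_bytes)

print(f"SVG: {len(svg_bytes)} bytes, SVGZ: {len(svgz_bytes)} bytes")
print("identical after decompression:", restored == svg_bytes)
```

Because every coordinate survives unchanged, the diagram stays editable and scales exactly as before.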

Impact of Compression on System Design (AO3)

  • CPU overhead – Encoding (especially predictive video codecs or arithmetic coding) can be CPU‑intensive; real‑time applications must balance quality against processing time.
  • Memory usage – Buffers are required for both compressed and decompressed data; streaming video often uses double‑buffering to avoid stutter.
  • Latency – Heavy compression adds encoding and decoding delay; real‑time services such as video calls and live streaming use fast, lower‑complexity settings (e.g., H.264 Baseline profile, which omits B‑frames) to keep delay low.
  • Energy consumption – More CPU work means higher battery drain on mobile devices; choosing a lower‑complexity codec can extend battery life.
  • Design decision – When selecting a compression library, consider licensing, platform support, and whether the algorithm meets the required fidelity (e.g., lossless for medical images, lossy for promotional videos).
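
The CPU‑versus‑size trade‑off can be measured rather than guessed. The sketch below times `zlib.compress` at three compression levels on synthetic log‑style data (an assumption made for repeatability); the timings depend on the machine and the data, so treat the output as indicative only.

```python
import time
import zlib

# Synthetic, repetitive test data (assumed for illustration).
data = "".join(
    f"record {i}: status=OK value={i * 37 % 1000}\n" for i in range(20000)
).encode("utf-8")


def measure(level):
    """Compress `data` at the given zlib level; return (size, seconds, blob)."""
    start = time.perf_counter()
    blob = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(blob), elapsed, blob


results = {}
for level in (1, 6, 9):  # 1 = fastest, 9 = smallest output
    size, seconds, blob = measure(level)
    assert zlib.decompress(blob) == data      # every level is still lossless
    results[level] = (size, seconds)
    print(f"level {level}: {len(data)} -> {size} bytes in {seconds * 1000:.2f} ms")
```

Higher levels generally spend more CPU time searching for matches in exchange for a smaller file, which is the same balance a designer faces when choosing codec settings for battery‑powered or real‑time systems.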

Suggested diagram: Flowchart – Input data → Encoder (compression algorithm) → Compressed file → Transmission / Storage → Decoder (decompression) → Reconstructed data.

Summary

Compression is essential for efficient storage and transmission. Understanding the trade‑off between lossless (exact reconstruction, modest ratios) and lossy (higher ratios, some quality loss) enables you to select the most suitable technique for each media type—text, bitmap, vector, sound, video or binary data. Familiarity with key algorithms (RLE, Huffman, LZW, arithmetic coding, JPEG, MP3, H.264/HEVC) and awareness of their impact on CPU, memory, latency and energy help you justify choices in exam questions and in real‑world system design.