Show understanding of the need for and examples of the use of compression

1.3 Compression

Objective

Show understanding of why compression is required, give examples of lossless and lossy techniques for all major media types (text, bitmap, vector, sound, video and binary data), and justify the most appropriate method in a given situation.

Why is Compression Needed?

  • Storage limitations – data must fit on disks, SSDs, USB sticks or other removable media.
  • Transmission efficiency – smaller files use less bandwidth and reach the receiver faster.
  • Cost reduction – less storage space and lower network‑transfer charges.
  • Performance – reduced I/O time, quicker loading of programs and media.
  • Energy consumption – less data to move means lower power use, especially on mobile devices.
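
The transmission benefit can be made concrete with a little arithmetic. The sketch below uses made-up figures (a 600 MB file, a 20 Mbit/s link and an assumed 10 : 1 compression ratio) purely for illustration; the function name is hypothetical.

```python
def transfer_seconds(size_megabytes: float, link_megabits_per_s: float) -> float:
    """Time to send a file: size in megabits divided by link speed in Mbit/s."""
    size_megabits = size_megabytes * 8  # 1 byte = 8 bits
    return size_megabits / link_megabits_per_s


# Hypothetical figures: a 600 MB video sent over a 20 Mbit/s link,
# then the same video after an assumed 10:1 compression (60 MB).
original = transfer_seconds(600, 20)
compressed = transfer_seconds(60, 20)

print(f"uncompressed: {original:.0f} s, compressed: {compressed:.0f} s")
```

With these assumed numbers the transfer drops from 240 seconds to 24 seconds.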

Exam‑style prompt

A school wants to store 5 GB of instructional video on a 1 GB USB stick. Explain why compression is required and state which type of compression (lossless or lossy) would be most appropriate, giving one reason for your choice.

Types of Compression

  • Lossless compression – the original data can be reconstructed exactly; no information is permanently lost.
  • Lossy compression – some information is permanently discarded to obtain higher compression ratios; the reconstructed data is an approximation of the original.
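
The difference between the two types can be seen in a small experiment. This is a toy sketch, not a real codec: the sample values are invented, zlib (DEFLATE) stands in for lossless compression, and "keep only the top 4 bits" stands in for a lossy step.

```python
import zlib

# Made-up 8-bit samples (e.g. sound or pixel values), repeated to add redundancy.
samples = bytes([12, 13, 13, 14, 200, 201, 203, 202] * 50)

# Lossless: decompressing returns exactly the bytes that went in.
packed = zlib.compress(samples)
assert zlib.decompress(packed) == samples

# Lossy (toy): keep only the top 4 bits of each sample. The 4-bit values
# could be packed two per byte; the discarded low bits cannot be recovered.
reduced = [s >> 4 for s in samples]
approx = bytes((r << 4) | 8 for r in reduced)  # guess the middle of each bucket

worst_error = max(abs(a - b) for a, b in zip(samples, approx))
print("original bytes:", len(samples), "zlib bytes:", len(packed))
print("largest error after lossy round trip:", worst_error)
```

The lossless round trip is exact, while the lossy one reproduces each sample only to within a few units.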

Lossless vs. Lossy – Quick Comparison

| Aspect | Lossless | Lossy |
| --- | --- | --- |
| Data integrity | Exact original data can be recovered | Original data is approximated; some information is lost |
| Typical compression ratio | 2 : 1 – 3 : 1 (up to ≈5 : 1 for highly redundant data) | 10 : 1 – 100 : 1 (depends on content and quality settings) |
| Common applications | Source code, text documents, executables, archival storage, medical images, video in lossless mode | Photographs, music, streaming video, web graphics, most consumer video files |
| Algorithm and format examples | Algorithms: Run‑Length Encoding (RLE), Huffman, LZW, DEFLATE, arithmetic coding. Formats: PNG, FLAC, ZIP | JPEG, MP3, AAC, H.264 / H.265 / VP9 / AV1, WebP (lossy mode) |

Common Lossless Techniques

  1. Run‑Length Encoding (RLE) – stores a value together with the number of consecutive repetitions.
  2. Huffman Coding – builds a binary tree based on symbol frequencies; more frequent symbols receive shorter codes.
  3. Lempel‑Ziv‑Welch (LZW) – constructs a dictionary of repeated substrings and replaces them with short indices.
  4. Arithmetic Coding – represents an entire message as a single number in the interval [0,1); it can approach the theoretical entropy limit more closely than Huffman.
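
Two of these techniques are short enough to sketch in full. The code below is a minimal illustration, assuming plain text whose characters all have codes below 256; the function names are hypothetical and real file formats add details (bit packing, dictionary size limits) that are left out here.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Run-length encoding: one (value, run length) pair per run."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in pairs)


def lzw_encode(text: str) -> list[int]:
    """Textbook LZW: the dictionary starts with all single characters 0-255."""
    table = {chr(i): i for i in range(256)}
    current, output = "", []
    for ch in text:
        if current + ch in table:
            current += ch                         # keep extending the match
        else:
            output.append(table[current])         # emit index of longest match
            table[current + ch] = len(table)      # learn the new substring
            current = ch
    if current:
        output.append(table[current])
    return output


def lzw_decode(codes: list[int]) -> str:
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        # Special case: the code may refer to the entry being built right now.
        entry = table[code] if code in table else previous + previous[0]
        pieces.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(pieces)


print(rle_encode("WWWWBBBWW"))   # [('W', 4), ('B', 3), ('W', 2)]
print(lzw_encode("ABABABA"))     # [65, 66, 256, 258]
```

RLE only pays off when runs are long, while LZW also benefits from repeated patterns such as "AB" that are not runs of a single value.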

Huffman Coding – Step‑by‑Step Example

String: ABRACADABRA

| Symbol | Frequency |
| --- | --- |
| A | 5 |
| B | 2 |
| R | 2 |
| C | 1 |
| D | 1 |

Building the Huffman tree (merge the two lowest‑frequency nodes at each step):

  1. Merge C (1) and D (1) → node CD (2)
  2. Merge B (2) and R (2) → node BR (4)
  3. Merge CD (2) and BR (4) → node CD‑BR (6)
  4. Merge A (5) and CD‑BR (6) → root (11)

Resulting codes (left‑branch = 0, right‑branch = 1):

  • A → 0
  • B → 110
  • R → 111
  • C → 100
  • D → 101

Encoded message: 0 110 111 0 100 0 101 0 110 111 0, which is 23 bits compared with 88 bits at 8 bits per character.
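
The merging procedure can be sketched with a priority queue. This is one possible implementation, not a reference one: the function name is hypothetical, and because ties between equal frequencies can be broken in different ways, the individual bit patterns may vary between implementations. What does not vary is the total length of an optimal code, which for ABRACADABRA is 23 bits.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Return a {symbol: bit string} table built by repeated merging."""
    freq = Counter(text)
    if len(freq) == 1:                       # only one distinct symbol
        return {sym: "0" for sym in freq}
    # Heap entries: (weight, tie_breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lowest weight
        w2, _, right = heapq.heappop(heap)   # second lowest weight
        merged = {s: "0" + c for s, c in left.items()}         # left branch = 0
        merged.update({s: "1" + c for s, c in right.items()})  # right branch = 1
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


message = "ABRACADABRA"
codes = huffman_codes(message)
encoded = "".join(codes[ch] for ch in message)
print(codes)
print(len(encoded), "bits instead of", 8 * len(message))
```

Because no code is a prefix of another, the bit stream can be decoded unambiguously by walking the tree from the root.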

Common Lossy Techniques

  1. Transform coding – e.g., JPEG for images uses the Discrete Cosine Transform (DCT) to separate frequency components before quantisation.
  2. Perceptual coding – e.g., MP3 for audio uses a psycho‑acoustic model to discard sounds the listener is unlikely to perceive, such as quiet sounds masked by louder ones.
  3. Predictive (motion‑compensated) coding – e.g., MPEG‑2, H.264/AVC, H.265/HEVC, VP9 and AV1 predict each frame from neighbouring frames and store motion vectors plus the remaining difference (residual) rather than full frames.
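
The idea behind transform coding can be sketched on a single row of eight pixel values. This is a simplified illustration, not the JPEG algorithm: it uses a one-dimensional DCT, a single quantisation step for every coefficient (the value 10 is an arbitrary assumption) and invented pixel values. JPEG itself works on 8 × 8 blocks with a table of step sizes.

```python
import math


def dct(block):
    """Orthonormal DCT-II of a list of samples."""
    n = len(block)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(block))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = sum(math.sqrt((1 if k == 0 else 2) / n) * c
                    * math.cos(math.pi * (i + 0.5) * k / n)
                    for k, c in enumerate(coeffs))
        samples.append(total)
    return samples


STEP = 10  # assumed quantisation step: larger means smaller output, more error

pixels = [52, 55, 61, 66, 70, 61, 64, 73]             # made-up pixel values
quantised = [round(c / STEP) for c in dct(pixels)]     # the lossy step
restored = [round(v) for v in idct([q * STEP for q in quantised])]

print("quantised coefficients:", quantised)
print("restored pixels:       ", restored)
```

The transform on its own is reversible; information is lost only when the coefficients are rounded to multiples of the step size. The small integers that result are then cheap to store with a lossless coder such as Huffman.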

Everyday Examples of Compression by Media Type

| Media type | Lossless formats / tools | Lossy formats / tools |
| --- | --- | --- |
| Text files | ZIP and GZIP (DEFLATE), BZIP2, 7z (LZMA) | None (text is almost never compressed lossily) |
| Bitmap images | PNG, BMP (RLE), TIFF (LZW or ZIP), lossless WebP | JPEG, WebP (lossy), HEIF (HEIC) |
| Vector graphics | SVG + gzip (SVGZ), PDF‑optimised streams, EPS with LZW compression | Simplification / point‑reduction algorithms (e.g., SVG path‑simplify), a form of lossy reduction |
| Sound files | FLAC, ALAC (Apple Lossless), WAV stored in a ZIP or GZIP archive (DEFLATE) | MP3, AAC, OGG Vorbis, Opus |
| Video files | FFV1, H.264 lossless mode, AV1 lossless | H.264/AVC, H.265/HEVC, VP9, AV1, MPEG‑2, Apple ProRes (visually lossless, but still lossy) |
| Binary / executable files | ZIP, GZIP, 7z (LZMA), UPX (executable packer) | None (a program with altered bytes would no longer run correctly) |
| Web pages (HTTP transfer) | Brotli, GZIP, Deflate (applied to HTML, CSS, JavaScript) | None (compression is lossless for web assets) |
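
Several of the lossless tools in the table are available in Python's standard library, which makes it easy to compare them on the same input. The sketch below uses an artificially repetitive piece of text, so the ratios it prints are far higher than would be typical for real documents.

```python
import bz2
import gzip
import lzma
import zlib

# Deliberately repetitive sample text (8,400 bytes).
text = ("Compression removes redundancy from data. " * 200).encode("utf-8")

compressors = {
    "zlib (DEFLATE)": (zlib.compress, zlib.decompress),
    "gzip (DEFLATE)": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

sizes = {}
for name, (pack, unpack) in compressors.items():
    packed = pack(text)
    assert unpack(packed) == text          # lossless: exact round trip
    sizes[name] = len(packed)
    print(f"{name:15} {len(text)} -> {len(packed)} bytes "
          f"(ratio {len(text) / len(packed):.0f} : 1)")
```

Every method restores the input exactly; they differ only in output size and in how much processing time they need.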

Justification Exercise (AO2)

Scenario: You need to publish a technical diagram (vector graphic) on the school website. The original SVG file is 2 MB. Which compression method should you use and why?

  1. Choose a lossless approach – e.g., gzip the SVG (producing an .svgz file).
  2. Reasoning:
    • The diagram must remain perfectly scalable and retain exact geometry; any loss of points would alter shapes or dimensions.
    • Lossless compression reduces file size (typically 30‑50 % for SVG) without altering the visual appearance, satisfying accessibility and future editing requirements.
    • Web browsers can decompress .svgz natively, provided the server sends it with the Content‑Encoding: gzip header, so no extra client‑side processing is needed.
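
The lossless choice in this scenario can be tried out in a few lines. The sketch below builds a small stand-in diagram in memory (it is not the 2 MB file from the scenario) and gzips it; writing the result to a file with the .svgz extension is left out.

```python
import gzip

# A stand-in diagram: fifty horizontal lines, generated for illustration only.
lines = "".join(
    f'<line x1="0" y1="{y}" x2="100" y2="{y}" stroke="black"/>'
    for y in range(0, 100, 2)
)
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    + lines
    + "</svg>"
).encode("utf-8")

svgz = gzip.compress(svg)          # these bytes are what an .svgz file holds
restored = gzip.decompress(svgz)

print(f"{len(svg)} bytes -> {len(svgz)} bytes")
print("identical after decompression:", restored == svg)
```

SVG is XML text with many repeated tag and attribute names, which is why a dictionary-based method such as DEFLATE works well on it.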

Impact of Compression on System Design (AO3)

  • CPU overhead – Encoding (especially predictive video codecs or arithmetic coding) can be CPU‑intensive; real‑time applications must balance quality against processing time.
  • Memory usage – Buffers are required for both compressed and decompressed data; streaming video often uses double‑buffering to avoid stutter.
  • Latency – Heavy compression adds delay; real‑time services such as video calls and live streaming use fast, lower‑complexity settings (e.g., H.264 baseline profile) to keep delay low.
  • Energy consumption – More CPU work means higher battery drain on mobile devices; choosing a lower‑complexity codec can extend battery life.
  • Design decision – When selecting a compression library, consider licensing, platform support, and whether the algorithm meets the required fidelity (e.g., lossless for medical images, lossy for promotional videos).

Suggested diagram: Flowchart – Input data → Encoder (compression algorithm) → Compressed file → Transmission / Storage → Decoder (decompression) → Reconstructed data.
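
The trade-off between CPU time and output size can be observed directly with zlib's compression levels (1 is fastest, 9 tries hardest). This is a rough sketch on synthetic data; timings depend on the machine, and the size differences on real files may be larger or smaller.

```python
import time
import zlib

# Synthetic log-like data, generated only for this illustration.
data = b"".join(
    f"reading {i % 97}: value {i * i % 1013}\n".encode("ascii")
    for i in range(20000)
)

sizes = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    packed = zlib.compress(data, level)      # encoder
    elapsed = time.perf_counter() - start
    assert zlib.decompress(packed) == data   # decoder restores the input
    sizes[level] = len(packed)
    print(f"level {level}: {len(data)} -> {len(packed)} bytes "
          f"in {elapsed * 1000:.1f} ms")
```

A designer would pick the lowest level that meets the size target, since higher levels cost extra processing time and energy for every file compressed.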

Summary

Compression is essential for efficient storage and transmission. Understanding the trade‑off between lossless (exact reconstruction, modest ratios) and lossy (higher ratios, some quality loss) enables you to select the most suitable technique for each media type—text, bitmap, vector, sound, video or binary data. Familiarity with key algorithms (RLE, Huffman, LZW, arithmetic coding, JPEG, MP3, H.264/HEVC) and awareness of their impact on CPU, memory, latency and energy help you justify choices in exam questions and in real‑world system design.
