Show understanding of lossy and lossless compression and justify the use of a method in a given situation

1.3 Compression

Learning objective

Show understanding of lossy and lossless compression and justify the choice of a method for a given situation.

Why compress data?

  • Storage limits – devices have finite capacity (e.g., a 64 GB SD card holds many more photos when they are compressed).
  • Bandwidth constraints – network links can transmit only a limited number of bits per second.
  • Cost of transmission – many services charge per megabyte of data sent.
  • Battery life – moving fewer bits consumes less power on mobile devices.

Quantitative example: a 100 MB uncompressed video (≈ 800 Mbit) sent over a 2 Mbps link needs 400 s, about 7 minutes. If H.264 encoding reduces the file to, say, 40 MB (≈ 320 Mbit), the same link delivers it in 160 s, about 2.7 minutes, saving both time and data.
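The arithmetic behind a transfer-time estimate of this kind can be checked with a short sketch. The function name and the 40 MB compressed size are assumptions chosen for illustration, and the calculation ignores protocol overhead and latency.

```python
def transfer_time_seconds(size_megabytes: float, link_mbps: float) -> float:
    """Time to send a file over a link, ignoring overhead and latency."""
    size_megabits = size_megabytes * 8  # 1 byte = 8 bits
    return size_megabits / link_mbps


uncompressed = transfer_time_seconds(100, 2)  # 100 MB over a 2 Mbps link
compressed = transfer_time_seconds(40, 2)     # hypothetical 40 MB after compression

print(f"Uncompressed: {uncompressed:.0f} s ({uncompressed / 60:.1f} min)")
print(f"Compressed:   {compressed:.0f} s ({compressed / 60:.1f} min)")
```

Note that the link speed stays fixed; only the number of bits to be sent changes.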

What is compression?

Compression reduces the number of bits required to represent information by exploiting redundancy or the limits of human perception. The result is a smaller compressed representation that can be stored or transmitted more efficiently.

Types of compression

  • Lossless compression – the original data can be reconstructed exactly.
  • Lossy compression – some information is permanently discarded; the reconstructed data is an approximation.

Lossless compression

Used when exact fidelity is essential (e.g., source code, legal documents, medical images, archival audio).

Common techniques

  1. Run‑Length Encoding (RLE) – replaces each run of consecutive identical symbols with a count followed by the symbol.

    Example: AAAAABBBCCDAA → 5A3B2C1D2A.

  2. Huffman coding – variable‑length prefix codes based on symbol frequencies.

    Entropy formula: \(H = -\sum_{i=1}^{n} p_i \log_2 p_i\), where \(p_i\) is the probability of symbol \(i\) and \(H\) is measured in bits per symbol.

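    A Huffman tree gives the shortest average code length of any prefix code that assigns a whole number of bits to each symbol; that average length \(L\) satisfies \(H \le L < H + 1\).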

  3. Lempel‑Ziv‑Welch (LZW) – builds a dictionary of repeated substrings during encoding.
  4. DEFLATE (ZIP/GZIP) – combines LZ77 sliding‑window compression with Huffman coding.
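The first two techniques in the list can be sketched in a few lines of Python. This is a minimal illustration, not a production codec: the function names are hypothetical, the RLE output format (count followed by symbol) is only unambiguous when the input contains no digits, and the Huffman routine returns code lengths rather than the actual bit patterns.

```python
import heapq
from collections import Counter
from itertools import groupby
from math import log2


def rle_encode(text: str) -> str:
    """Replace each run of identical symbols with <count><symbol>."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))


def huffman_code_lengths(text: str) -> dict:
    """Return the Huffman code length in bits for each symbol in text."""
    freq = Counter(text)
    if len(freq) == 1:  # a lone symbol still needs one bit
        return {next(iter(freq)): 1}
    # Heap entries: (subtree weight, unique tie-breaker, {symbol: depth}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)  # the two lightest subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def entropy_bits(text: str) -> float:
    """Shannon entropy H in bits per symbol, from observed frequencies."""
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in Counter(text).values())


message = "AAAAABBBCCDAA"
print(rle_encode(message))  # -> 5A3B2C1D2A

lengths = huffman_code_lengths(message)
counts = Counter(message)
average = sum(counts[s] * lengths[s] for s in counts) / len(message)
print(f"average code length {average:.2f} bits, entropy {entropy_bits(message):.2f} bits")
```

For this message the most frequent symbol (A) receives a 1-bit code and the rarest (D) a 3-bit code, so the average code length sits just above the entropy.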

Media‑type mapping (lossless)

Technique | Typical media / use‑case
RLE | Simple bitmap images, fax transmission, monochrome icons
Huffman | Final coding stage of DEFLATE (and therefore of PNG and ZIP), baseline JPEG, MP3
LZW | GIF images, early Unix compress, PDF internal streams
DEFLATE (ZIP/GZIP) | Text files, source code archives, generic file bundles (ZIP, GZIP)

Text compression example

Most text files are archived with ZIP/DEFLATE. Repeated words and phrases are replaced by short back‑references to an earlier occurrence, and the result is Huffman‑coded. Typical ratios for ordinary prose are 2 : 1 to 3 : 1.
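A quick way to observe this is Python's standard zlib module, which implements DEFLATE. The sample text below is an assumption made for illustration; because it is one sentence repeated many times, it compresses far better than ordinary prose would.

```python
import zlib

# Artificially repetitive sample text (hypothetical data).
text = ("the quick brown fox jumps over the lazy dog. " * 200).encode("utf-8")

compressed = zlib.compress(text, level=9)  # DEFLATE at maximum effort
restored = zlib.decompress(compressed)

assert restored == text  # lossless: every byte comes back unchanged
ratio = len(text) / len(compressed)
print(f"{len(text)} bytes -> {len(compressed)} bytes (about {ratio:.0f} : 1)")
```

The round-trip assertion is the defining property of lossless compression; the ratio depends entirely on how much redundancy the input contains.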

Vector‑graphic compression

Vector graphics (e.g., SVG) already describe images mathematically, so they are inherently lossless. Compression focuses on reducing file‑size overhead:

  • Remove unnecessary whitespace, comments, and metadata.
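  • Shorten identifiers and redundant numeric notation (e.g., id="background-gradient" → id="a", 12.500 → 12.5). Attribute names such as stroke-width are fixed by the SVG standard and cannot be abbreviated.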
  • Apply a generic lossless compressor such as DEFLATE (the .svgz format).

Concrete example:

  1. Original example.svg (plain XML) = 45 KB.
  2. Run svgo to strip whitespace and unused definitions → 38 KB.
  3. Compress with DEFLATE → example.svgz = 12 KB (≈ 73 % reduction).
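The same two-stage idea (tidy the XML, then apply a generic lossless compressor) can be sketched with the Python standard library. The sample drawing is hypothetical, and the two regular expressions are a crude stand-in for a real optimiser such as svgo, not a description of what svgo does. An .svgz file is simply gzip-compressed SVG, so gzip.compress is used for the second stage.

```python
import gzip
import re

# Hypothetical sample drawing: 100 similar circles with generous whitespace.
shapes = "\n".join(
    f'    <circle cx="{10 * i}" cy="{5 * i}" r="4" stroke-width="2" />'
    for i in range(100)
)
svg = f"""<svg xmlns="http://www.w3.org/2000/svg" width="1000" height="500">
    <!-- hypothetical sample drawing -->
{shapes}
</svg>
"""

# Stage 1: strip comments and the whitespace between tags.
minified = re.sub(r"<!--.*?-->", "", svg, flags=re.DOTALL)
minified = re.sub(r">\s+<", "><", minified).strip()

# Stage 2: gzip (DEFLATE) the result, as the .svgz format does.
svgz = gzip.compress(minified.encode("utf-8"))

reduction = 100 * (1 - len(svgz) / len(svg))
print(f"{len(svg)} -> {len(minified)} -> {len(svgz)} bytes ({reduction:.0f} % smaller)")

# Lossless check: decompressing gives back exactly the minified markup.
assert gzip.decompress(svgz).decode("utf-8") == minified
```

The percentage is computed the same way as in the example above: (original − final) / original.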

Lossy techniques are rarely used because the visual fidelity of a vector image is defined by its mathematical description, not by pixel data.

Lossy compression

Used when a perfect replica is not required and higher compression ratios are desirable (e.g., photographs, audio, video streaming).

Typical steps – JPEG (image)

  1. Convert RGB to YCbCr (separates luminance from chrominance).
  2. Divide the image into 8 × 8 blocks and apply the Discrete Cosine Transform (DCT).
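  3. Quantise the DCT coefficients: divide each one by the matching entry of a quantisation matrix and round to the nearest integer. Most high‑frequency coefficients become zero; this is the step where information is lost.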
  4. Encode the remaining coefficients with run‑length and Huffman coding.
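Steps 2 and 3 can be illustrated on a single 8 × 8 block in pure Python. This is a sketch under stated assumptions: the quantisation matrix Q is a made-up one whose step size simply grows with frequency (it is not the table from the JPEG standard), the block is a synthetic smooth gradient, and the direct four-loop DCT is written for clarity rather than speed.

```python
from math import cos, pi, sqrt

N = 8


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of N rows)."""
    def scale(k):
        return sqrt(1 / N) if k == 0 else sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * cos((2 * x + 1) * u * pi / (2 * N))
                              * cos((2 * y + 1) * v * pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantise(coeffs, q):
    """Divide each coefficient by its step size and round to an integer."""
    return [[round(coeffs[u][v] / q[u][v]) for v in range(N)] for u in range(N)]


# Hypothetical quantisation matrix: coarser steps for higher frequencies.
Q = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

# Smooth horizontal gradient, level-shifted by 128 as JPEG does before the DCT.
block = [[(100 + 4 * y) - 128 for y in range(N)] for x in range(N)]

quantised = quantise(dct_2d(block), Q)
zeros = sum(row.count(0) for row in quantised)
print(f"{zeros} of {N * N} quantised coefficients are zero")
for row in quantised:
    print(row)
```

Because the block is smooth, almost all of its energy lands in the top-left (low-frequency) corner, and the long runs of zeros are what the run-length and Huffman stage in step 4 exploits.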

Audio compression examples

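  • MP3 – perceptual coding uses a psychoacoustic model to discard sounds the listener is unlikely to hear (frequencies outside the range of human hearing and quiet sounds masked by louder ones).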
  • FLAC – lossless audio format; useful for archival copies where no quality loss is acceptable.

Video compression examples

  • H.264 / MPEG‑4 AVC – intra‑frame DCT, inter‑frame motion compensation, and entropy coding.
  • HEVC (H.265) – roughly double the compression efficiency of H.264, i.e. about half the bit rate for similar perceived quality.

Media‑type mapping (lossy)

Technique | Typical media / use‑case
JPEG | Photographs, web images
PNG (lossless, listed for comparison) | Line art, screenshots, images requiring transparency
MP3 / AAC | Music streaming, podcasts
FLAC (lossless, listed for comparison) | Archival audio, high‑resolution music collections
H.264 / H.265 | Online video, Blu‑ray, video conferencing

Comparison of lossless and lossy methods

Aspect | Lossless | Lossy
Data integrity | Exact reconstruction | Approximate reconstruction
Typical applications | Text, source code, archives, medical imaging, archival audio (FLAC) | Photographs, streaming audio/video, web images (JPEG, MP3, H.264)
Common algorithms | RLE, Huffman, LZW, DEFLATE, PNG, JPEG‑2000 (lossless mode) | JPEG, MP3, AAC, H.264, HEVC
Typical compression ratio | 2 : 1 – 3 : 1 (up to ~5 : 1 for highly redundant data) | 10 : 1 – 100 : 1 or higher
Impact on quality | No loss of quality | Quality degrades as compression increases; artefacts may become visible.
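The trade-off in the table can be demonstrated on a small synthetic signal. In this sketch the "lossy" step just throws away the four least significant bits of each sample before compressing; that is a deliberately crude stand-in, not how JPEG or MP3 work, and the signal itself is made-up data.

```python
import zlib
from math import sin

# Hypothetical 8-bit signal: a slow sine wave plus a small repeating ripple.
samples = bytes(int(127.5 + 100 * sin(i / 10) + (i * 37) % 7) for i in range(4000))

# Lossless: compress as-is; decompression returns identical bytes.
lossless = zlib.compress(samples)
assert zlib.decompress(lossless) == samples

# Lossy (crude): discard the low 4 bits of every sample, then compress.
coarse = bytes(s & 0xF0 for s in samples)
lossy = zlib.compress(coarse)
approx = zlib.decompress(lossy)

max_error = max(abs(a - b) for a, b in zip(samples, approx))
print(f"original {len(samples)} bytes")
print(f"lossless {len(lossless)} bytes, error 0")
print(f"lossy    {len(lossy)} bytes, max error {max_error} (out of 255)")
```

The lossy version is smaller because there is less information left to encode, but the discarded bits cannot be recovered from the compressed file.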

Decision‑matrix checklist (exam‑friendly)

When asked to justify a compression method, tick the criteria that apply and then choose the algorithm that best satisfies them.

Criterion | Lossless needed? | Lossy acceptable?
Exact data fidelity required (e.g., legal text, medical diagnosis) | ✓ | ✗
Very limited storage or bandwidth | ✗ (or moderate) | ✓ (higher ratios)
Processing power limited (e.g., embedded device) | ✓ (simple RLE, Huffman) | ✗ (complex transforms may be too heavy)
Human perception can hide artefacts (photos, audio, video) | ✗ | ✓
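One way to read the checklist is as a small rule-based function. The sketch below is only an illustration: the function name, the parameter names, and the order in which the rules are applied are assumptions, and a real justification should still argue each criterion in words.

```python
def recommend_compression(exact_fidelity: bool,
                          perception_hides_artefacts: bool,
                          tight_storage_or_bandwidth: bool,
                          limited_processing: bool) -> str:
    """Suggest a compression family from the four checklist criteria."""
    # Exact fidelity overrides everything else.
    if exact_fidelity:
        return "lossless"
    # Lossy only makes sense when perception can mask the discarded detail.
    if not perception_hides_artefacts:
        return "lossless"
    # A weak processor with no pressing size limit favours a simple scheme.
    if limited_processing and not tight_storage_or_bandwidth:
        return "lossless (simple scheme such as RLE or Huffman)"
    return "lossy"


# Archiving legal text: fidelity is essential.
print(recommend_compression(True, False, False, True))   # -> lossless
# Streaming live video over a constrained link.
print(recommend_compression(False, True, True, False))   # -> lossy
```

Writing the rules down this way makes the priority explicit: the fidelity requirement is checked first, and bandwidth or processing limits only matter once approximation is acceptable.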

Case studies – justification in practice

Case study 1: Archiving legal documents

  • Data type: plain text.
  • Key criteria: exact fidelity, moderate storage saving, low processing overhead.
  • Chosen method: ZIP (DEFLATE) – lossless, 2 : 1 – 3 : 1 ratio, fast encode/decode.

Case study 2: Streaming a live sports event

  • Data type: high‑definition video.
  • Key criteria: very limited bandwidth, low latency, viewers accept minor artefacts.
  • Chosen method: H.264 (MPEG‑4 AVC) – lossy, 20 : 1 – 50 : 1 ratio, efficient hardware decoding.

Case study 3: Storing MRI scans for diagnosis

  • Data type: grayscale medical images.
  • Key criteria: diagnostic accuracy (pixel‑perfect), moderate storage reduction.
  • Chosen method: JPEG‑2000 in lossless mode – lossless, up to 3 : 1 ratio, supports very large images.

Key points to remember

  • Lossless = no data loss; essential for text, code, critical images, archival audio (FLAC).
  • Lossy = data loss; suitable for media where human perception can tolerate approximations.
  • Match the method to the purpose (fidelity vs. size), the environment (bandwidth, storage, processing), and the acceptable quality level.
  • Use the decision‑matrix checklist in exams to structure a clear justification.

Exam‑style prompt (for practice)

“Explain why lossless compression is required for archiving legal documents and suggest a suitable algorithm. Justify your choice with reference to data integrity, typical compression ratio, and processing requirements.”