Show understanding of the need for and examples of the use of compression.
Why is Compression Needed?
Storage limitations – data must fit on disks, SSDs, or removable media.
Transmission efficiency – reduces bandwidth usage and speeds up network communication.
Cost reduction – less storage and lower transmission costs.
Performance – smaller files can be loaded and processed more quickly.
Types of Compression
Lossless compression: original data can be perfectly reconstructed.
Lossy compression: some data is permanently discarded to achieve higher compression ratios.
Common Lossless Techniques
Run‑Length Encoding (RLE)
Huffman Coding
Lempel‑Ziv‑Welch (LZW)
Common Lossy Techniques
Transform coding (e.g., JPEG for images)
Perceptual coding (e.g., MP3 for audio)
Predictive coding (e.g., MPEG for video)
Examples of Compression in Everyday Use
Text files – ZIP, GZIP, BZIP2
Images – PNG (lossless), JPEG (lossy)
Audio – FLAC (lossless), MP3/AAC (lossy)
Video – H.264/A \cdot C, H.265/HE \cdot C (lossy)
Web pages – Brotli, GZIP for HTTP transfer
Illustrative Comparison of Lossless vs. Lossy Compression
Aspect
Lossless
Lossy
Data integrity
Exact original data can be recovered
Original data is approximated; some information is lost
Typical compression ratio
2:1 – 3:1 (occasionally up to 5:1)
10:1 – 100:1 depending on content and quality settings
Common applications
Source code, text documents, executable files, archival storage
Photographs, music, streaming video, web graphics
Algorithm examples
Huffman, LZW, DEFLATE
JPEG, MP3, H.264
Run‑Length Encoding (RLE) – Simple Example
Consider the binary string \$1011111100\$. Using RLE we store it as:
\$\text{RLE} = (1,1), (0,1), (1,6), (0,2)\$
Each pair represents the bit value and the number of consecutive occurrences.
Huffman Coding – Principle
Huffman coding assigns shorter codes to more frequent symbols. For the text “ABRACADABRA” with symbol frequencies:
Symbol
Frequency
A
5
B
2
R
2
C
1
D
1
Resulting Huffman codes might be:
A → 0
B → 110
R → 111
C → 1010
D → 1011
Impact of Compression on System Design
CPU overhead – encoding/decoding requires processing time.
Memory usage – buffers may be needed for both compressed and decompressed data.
Latency – real‑time applications must balance compression ratio against delay.
Suggested diagram: Flowchart showing the stages of compression and decompression, including input data, encoder, compressed output, transmission/storage, decoder, and reconstructed data.
Summary
Compression is essential for efficient use of storage and bandwidth. Understanding when to use lossless versus lossy techniques, and being familiar with common algorithms such as RLE, Huffman, and LZW, enables developers to make informed decisions that balance data fidelity, performance, and resource constraints.