Understand the purpose of and need for data compression

Data Storage, Compression & the Wider Digital Context – Cambridge IGCSE 0478 (Topic 1.3)

Learning Objectives

  • AO1: Explain the purpose of data compression and why it is required in modern computing.
  • AO2: Convert between binary, hexadecimal and decimal; represent signed numbers using two’s‑complement; calculate file sizes for text, image and sound data.
  • AO3: Describe the basic hardware and software components that manipulate data, the principles of data transmission, and the role of the Internet, security and digital currency.

1. Data Representation

1.1 Number systems

  • Binary (base‑2) – each digit is a bit (0 or 1).
  • Hexadecimal (base‑16) – digits 0‑9 and A‑F; useful for compactly writing binary values.
  • Conversion shortcuts:
    • Binary → Hex: group bits in fours (e.g., 1011 1100₂ = BC₁₆).
    • Hex → Decimal: multiply each digit by 16ⁿ (e.g., 3A₁₆ = 3·16¹ + 10·16⁰ = 58₁₀).
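
The two shortcuts above can be sketched in Python. This is a minimal illustration, not a required exam method; the function names `binary_to_hex` and `hex_to_decimal` are made up for this example.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hexadecimal by grouping bits in fours."""
    bits = bits.replace(" ", "")
    # Pad with leading zeros so the length is a multiple of 4.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    # Each group of four bits maps to exactly one hex digit.
    return "".join(
        HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4)
    )


def hex_to_decimal(hex_digits: str) -> int:
    """Convert hexadecimal to decimal by summing digit x 16^position."""
    total = 0
    # Work from the right-most digit, whose place value is 16^0.
    for power, digit in enumerate(reversed(hex_digits.upper())):
        total += HEX_DIGITS.index(digit) * 16 ** power
    return total


print(binary_to_hex("1011 1100"))  # BC
print(hex_to_decimal("3A"))        # 58
```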

1.2 Signed numbers – two’s‑complement

| Step | Method (8‑bit example) |
|---|---|
| Positive value | Write the binary magnitude (e.g., 13 = 00001101). |
| Negative value | Invert all bits and add 1 (−13 → 11110010 + 1 = 11110011). |
| Range | −128 to +127 for 8‑bit. |
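
As a rough sketch of the invert‑and‑add‑1 method, the following Python follows the same steps. The helper names are hypothetical, and the default width of 8 bits is an assumption taken from the example above.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's-complement bit pattern of value as a string."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {bits} bits")
    if value >= 0:
        # Positive values are just the binary magnitude.
        return format(value, f"0{bits}b")
    # Negative values: write the magnitude, invert every bit, then add 1.
    magnitude = format(-value, f"0{bits}b")
    inverted = "".join("1" if bit == "0" else "0" for bit in magnitude)
    return format(int(inverted, 2) + 1, f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Interpret a bit pattern as a signed two's-complement number."""
    value = int(pattern, 2)
    # A leading 1 means the number is negative: subtract 2^bits.
    if pattern[0] == "1":
        value -= 1 << len(pattern)
    return value


print(to_twos_complement(13))            # 00001101
print(to_twos_complement(-13))           # 11110011
print(from_twos_complement("11110011"))  # -13
```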

1.3 Text, image & sound representation

  • Text – ASCII (7‑bit) or extended ASCII (8‑bit). One byte per character.
  • Images – colour depth = bits per pixel.
    • Monochrome: 1 bpp
    • 256‑colour: 8 bpp
    • True‑colour (24‑bit): 8 bits each for Red, Green, Blue.
  • Sound (PCM) – sample rate × bit depth × number of channels.
    • CD quality: 44.1 kHz, 16‑bit, stereo.

1.4 File‑size calculations (binary units)

All exam calculations use binary prefixes: 1 KiB = 1024 bytes, 1 MiB = 1024 KiB.

Example 1 – Uncompressed image

800 × 600 pixels, 24‑bit colour:

  1. Pixels = 800 × 600 = 480 000
  2. Total bits = 480 000 × 24 = 11 520 000 bits
  3. Bytes = 11 520 000 ÷ 8 = 1 440 000 B
  4. KiB = 1 440 000 ÷ 1024 ≈ 1 406 KiB (≈ 1.37 MiB)
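
The same calculation can be checked with a few lines of Python. This is only a sketch; `image_size_bytes` is a name invented for this example, and it assumes an uncompressed image with no header or metadata.

```python
def image_size_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """Uncompressed image size: pixels x bits per pixel, converted to bytes."""
    total_bits = width * height * colour_depth_bits
    # Round up to a whole number of bytes.
    return (total_bits + 7) // 8


size = image_size_bytes(800, 600, 24)
print(size)                          # 1440000
print(f"{size / 1024:.2f} KiB")      # 1406.25 KiB
print(f"{size / 1024 ** 2:.2f} MiB")  # 1.37 MiB
```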

Example 2 – Uncompressed sound

10 s stereo, 44.1 kHz, 16‑bit:

  1. Samples / s × channels × bits = 44 100 × 2 × 16 = 1 411 200 bits/s
  2. Total bits = 1 411 200 × 10 = 14 112 000 bits
  3. Bytes = 14 112 000 ÷ 8 = 1 764 000 B
  4. KiB = 1 764 000 ÷ 1024 ≈ 1 723 KiB (≈ 1.68 MiB)
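
A short Python sketch of the sound calculation, assuming raw PCM data with no file header; the function name `sound_size_bytes` is hypothetical.

```python
def sound_size_bytes(sample_rate_hz: int, bit_depth: int,
                     channels: int, seconds: int) -> int:
    """Uncompressed PCM size: sample rate x bit depth x channels x duration."""
    total_bits = sample_rate_hz * bit_depth * channels * seconds
    return total_bits // 8


size = sound_size_bytes(44_100, 16, 2, 10)
print(size)                           # 1764000
print(f"{size / 1024:.2f} KiB")       # 1722.66 KiB
print(f"{size / 1024 ** 2:.2f} MiB")  # 1.68 MiB
```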

2. Why Do We Compress Data?

  • Storage efficiency – frees up space on hard‑drives, SSDs, USB sticks, memory cards and mobile devices.
  • Transmission efficiency – fewer bits travel over a network, lowering bandwidth cost and reducing download/upload times.
  • Performance boost – web pages load faster, streaming buffers less, backup and restore operations complete quicker.
  • Device constraints – embedded systems, smartphones and IoT devices often have limited storage and memory.

Exam tip: When asked “why is compression needed?”, give at least one storage‑related reason and one transmission‑related reason for full marks.


3. Lossless vs. Lossy Compression

| Aspect | Lossless | Lossy |
|---|---|---|
| Reconstruction | Exact replica of the original data. | Some original information is permanently discarded; result is an approximation. |
| Typical uses | Text, source code, executables, databases, archival records. | Photographs, audio, video, web graphics where a small quality loss is acceptable. |
| Typical compression ratio | 2 : 1 – 3 : 1 (higher for highly repetitive data). | 10 : 1 or more (e.g., JPEG, MP3, MPEG). |

3.1 Core syllabus methods

Run‑Length Encoding (RLE) – Lossless
Replaces a run of identical symbols with a single symbol and a count. Example: AAAAA → A5.
Discrete Cosine Transform (DCT) – Lossy
Used in JPEG images and MPEG video. Converts blocks of pixels to the frequency domain and discards high‑frequency components that are less noticeable to the human eye.
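
Run‑length encoding, described above, is simple enough to sketch in a few lines of Python. The function names are invented for this illustration, and storing each run as a (symbol, count) pair is one possible design choice, not the only one.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original text exactly, which is what makes RLE lossless."""
    return "".join(symbol * count for symbol, count in pairs)


def rle_to_string(pairs: list[tuple[str, int]]) -> str:
    """Write the pairs in the compact 'A5' style used in the notes."""
    return "".join(f"{symbol}{count}" for symbol, count in pairs)


encoded = rle_encode("AAAAABBBC")
print(rle_to_string(encoded))  # A5B3C1
print(rle_decode(encoded))     # AAAAABBBC
```

Note that RLE only helps when runs are long: the three‑character input ABC becomes A1B1C1, which is larger than the original.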

3.2 Optional/advanced algorithms

| Algorithm | Category | Key idea | Typical use |
|---|---|---|---|
| Huffman coding | Lossless | Shorter binary codes for more frequent symbols | Text files, PNG images |
| Lempel‑Ziv‑Welch (LZW) | Lossless | Dictionary of repeated patterns built on‑the‑fly | GIF and TIFF images |
| Transform coding (e.g., MP3) | Lossy | Convert audio to frequency components and remove those below hearing thresholds | Audio files |
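
To show the Huffman idea of giving frequent symbols shorter codes, here is a minimal Python sketch. It is an illustration only: the function name `huffman_codes` is hypothetical, and real file formats add details (such as storing the code table) that are left out here.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free code in which frequent symbols get shorter codes."""
    frequencies = Counter(text)
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(frequencies)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    heap = [
        (count, index, {symbol: ""})
        for index, (symbol, count) in enumerate(frequencies.items())
    ]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups; prefix 0 to one and 1 to the other.
        freq_a, _, group_a = heapq.heappop(heap)
        freq_b, _, group_b = heapq.heappop(heap)
        merged = {symbol: "0" + code for symbol, code in group_a.items()}
        merged.update({symbol: "1" + code for symbol, code in group_b.items()})
        heapq.heappush(heap, (freq_a + freq_b, next_id, merged))
        next_id += 1
    return heap[0][2]


message = "AAAABBC"
codes = huffman_codes(message)
encoded = "".join(codes[symbol] for symbol in message)
print(len(codes["A"]), len(codes["B"]), len(codes["C"]))  # 1 2 2
print(len(encoded))  # 10 bits, versus 56 bits at 8 bits per character
```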

4. Measuring Compression Effectiveness

All exam work uses binary‑based units (KiB, MiB).

Compression Ratio

\[ \text{Compression Ratio} = \frac{\text{Original size}}{\text{Compressed size}} \]

Percentage Space Saving

\[ \text{Space Saving (\%)} = \left(1-\frac{\text{Compressed size}}{\text{Original size}}\right)\times100 \]

Exam tip: No calculators are allowed in the written paper. Perform division and subtraction by hand and round only in the final answer if a whole‑number percentage is required.

4.1 Example (Lossless)

File before compression: 500 KiB; after RLE: 200 KiB.

  • Compression ratio = 500 ÷ 200 = 2.5 : 1
  • Space saving = (1 − 200 ÷ 500) × 100 = 60 %
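
For checking answers during revision, the two formulas can be written as short Python functions. The names are invented for this sketch, and the space‑saving line uses the algebraically equivalent form 100 × (original − compressed) ÷ original.

```python
def compression_ratio(original_size: float, compressed_size: float) -> float:
    """Original size divided by compressed size (2.5 means 2.5 : 1)."""
    return original_size / compressed_size


def space_saving_percent(original_size: float, compressed_size: float) -> float:
    """Percentage of the original size removed by compression."""
    # Equivalent to (1 - compressed / original) x 100.
    return 100 * (original_size - compressed_size) / original_size


print(compression_ratio(500, 200))                # 2.5
print(f"{space_saving_percent(500, 200):.1f} %")  # 60.0 %
```

Both sizes must be in the same unit (here KiB) before dividing.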

5. Data Transmission Basics (AO3)

5.1 Packets and protocols

  • Packet structure – header (source/destination address, length, checksum) + payload (actual data).
  • Common protocols – TCP (reliable, ordered delivery) and UDP (fast, no guarantee).

5.2 USB & other physical links

  • USB 2.0: up to 480 Mbit/s; USB‑C adds power delivery and a reversible connector.
  • Other links: Ethernet (10 Mbit/s – 10 Gbit/s), Wi‑Fi (802.11ac/ax), Bluetooth (including Low Energy).

5.3 Error detection & correction

| Method | How it works | Typical use |
|---|---|---|
| Parity bit | Adds a single bit to make the number of 1’s even (or odd). | Simple serial links. |
| Checksum | Sum of all bytes; receiver recomputes and compares. | IP, TCP. |
| CRC (Cyclic Redundancy Check) | Polynomial division produces a remainder; very good at detecting burst errors. | Ethernet, storage devices. |
| ARQ (Automatic Repeat Request) | Receiver asks sender to resend corrupted packets. | TCP. |
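
The parity and checksum rows can be illustrated with a small Python sketch. The function names are hypothetical, and the checksum is a deliberately simplified sum modulo 256; the checksums used by real protocols such as IP and TCP are computed differently.

```python
def even_parity_bit(bits: str) -> str:
    """Return the extra bit that makes the total number of 1s even."""
    return "1" if bits.count("1") % 2 else "0"


def passes_even_parity(frame: str) -> bool:
    """Receiver's check: a valid frame contains an even number of 1s."""
    return frame.count("1") % 2 == 0


def simple_checksum(data: bytes) -> int:
    """Simplified checksum: sum of all byte values, kept to one byte."""
    return sum(data) % 256


data_bits = "1011001"                          # four 1s, so parity bit is 0
frame = data_bits + even_parity_bit(data_bits)
print(frame)                                   # 10110010
print(passes_even_parity(frame))               # True
print(passes_even_parity("00110010"))          # False: one bit was flipped
print(simple_checksum(b"HELLO"))               # 116
```

A single parity bit cannot detect an error in which two bits are flipped, because the count of 1s stays even.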

5.4 Basic encryption (AO3)

  • Symmetric (e.g., AES) – same key for encryption and decryption; fast, used for bulk data.
  • Asymmetric (e.g., RSA) – public key encrypts, private key decrypts; enables secure key exchange and digital signatures.
  • In practice, HTTPS combines both: asymmetric exchange of a symmetric session key, then fast symmetric encryption of the data.
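
The defining property of symmetric encryption, that one shared key both encrypts and decrypts, can be demonstrated with a toy XOR cipher in Python. This is an assumption‑laden teaching sketch only: it is not AES, it is not secure, and the function name `xor_cipher` and the sample key are invented for the example.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each data byte with a repeating key byte."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(byte ^ key_byte for byte, key_byte in zip(data, cycle(key)))


shared_key = b"secret"                 # hypothetical key known to both parties
plaintext = b"Meet at noon"
ciphertext = xor_cipher(plaintext, shared_key)
recovered = xor_cipher(ciphertext, shared_key)  # same key reverses the process

print(ciphertext != plaintext)  # True
print(recovered)                # b'Meet at noon'
```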

6. Hardware Overview (AO3)

  • Von Neumann architecture – CPU, memory, I/O, and a bus connecting them.
  • Fetch‑Decode‑Execute (FDE) cycle – the CPU repeatedly fetches an instruction, decodes it, and executes it.
  • CPU core & cache – multiple cores allow parallel execution; cache (L1, L2, L3) stores recently used data to speed up access.
  • Instruction set – the set of binary commands a CPU can perform (e.g., ARM, x86).
  • Embedded systems – specialised computers with limited resources (e.g., micro‑controllers in appliances, IoT devices).
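
The fetch‑decode‑execute cycle listed above can be pictured with a very small simulation. Everything here is hypothetical: the three‑instruction set (LOAD, ADD, HALT), the single accumulator, and the function name `run` are assumptions made for illustration and do not correspond to any real CPU.

```python
def run(program: list[tuple[str, int]]) -> int:
    """Simulate a toy CPU with one accumulator and a program counter."""
    accumulator = 0
    program_counter = 0
    while True:
        # Fetch: read the instruction at the address in the program counter.
        opcode, operand = program[program_counter]
        program_counter += 1
        # Decode and execute: work out which instruction it is and carry it out.
        if opcode == "LOAD":
            accumulator = operand
        elif opcode == "ADD":
            accumulator += operand
        elif opcode == "HALT":
            return accumulator
        else:
            raise ValueError(f"unknown instruction: {opcode}")


program = [("LOAD", 5), ("ADD", 7), ("HALT", 0)]
print(run(program))  # 12
```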

7. Software Overview (AO3)

7.1 System vs. application software

| System software | Application software |
|---|---|
| Operating system (Windows, macOS, Linux); device drivers; utility programs. | Word processors, web browsers, games, media players, database applications. |

7.2 OS functions relevant to compression

  • File‑system management – stores compressed files, maintains metadata.
  • Memory management – allocates RAM for compression algorithms.
  • Process scheduling – allows compression utilities to run alongside other tasks.

7.3 Interrupts

Signals sent to the CPU by hardware or software that temporarily suspend the current program so that a higher‑priority event (e.g., I/O completion, timer, division by zero) can be serviced. After the interrupt has been handled, the CPU resumes the interrupted task.

7.4 Language levels & development tools

  • Machine language – binary instructions executed directly by the CPU.
  • Assembly language – mnemonic representation of machine instructions; one‑to‑one mapping.
  • High‑level languages – C, Java, Python; easier for humans, require translation.
  • Compilers vs. interpreters
    • Compiler translates source code to machine code before execution (e.g., C → .exe).
    • Interpreter executes source line‑by‑line at run‑time (e.g., Python).
  • IDE (Integrated Development Environment) – combines editor, compiler/interpreter, debugger, and project management (e.g., Eclipse, Visual Studio).

8. The Internet & Its Uses (AO3)

8.1 WWW vs. Internet

  • Internet – global network of interconnected computers using TCP/IP.
  • World Wide Web (WWW) – a service on the Internet that uses HTTP/HTTPS to exchange hyper‑text documents.

8.2 URL structure

Example: https://www.example.com:443/path/page.html?search=cat#section2

  • Protocol: https
  • Host name: www.example.com
  • Port (optional): 443
  • Path: /path/page.html
  • Query string: ?search=cat
  • Fragment identifier: #section2
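
The same breakdown can be reproduced with Python's standard `urllib.parse` module, shown here as a quick way to check your answers. Note that the library reports the query and fragment without their leading `?` and `#` characters.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.example.com:443/path/page.html?search=cat#section2"
parts = urlsplit(url)

print(parts.scheme)           # https
print(parts.hostname)         # www.example.com
print(parts.port)             # 443
print(parts.path)             # /path/page.html
print(parts.query)            # search=cat
print(parts.fragment)         # section2
print(parse_qs(parts.query))  # {'search': ['cat']}
```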

8.3 HTTP vs. HTTPS

  • HTTP – plain text; vulnerable to eavesdropping.
  • HTTPS – HTTP over TLS/SSL; encrypts data between client and server.

8.4 Browsers, cookies & DNS

  • Browser – client software that renders HTML, CSS, JavaScript.
  • Cookies – small text files stored by the browser to maintain state (e.g., login sessions).
  • DNS (Domain Name System) – translates human‑readable domain names into IP addresses.

9. Digital Currency & Blockchain Basics (AO3)

  • Blockchain – a distributed, append‑only ledger where each block contains a list of transactions and a cryptographic hash of the previous block.
  • Hash functions (e.g., SHA‑256) produce a fixed‑length output that is infeasible to reverse; because any change to a block changes its hash, tampering with a record is easy to detect.
  • Consensus mechanisms (Proof‑of‑Work, Proof‑of‑Stake) allow participants to agree on the next valid block without a central authority.
  • In the IGCSE context, focus on the idea that a blockchain provides security and integrity for digital money such as Bitcoin.
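
The hash‑linking idea can be sketched with Python's standard `hashlib` module. This is a heavily simplified model under stated assumptions: the block layout, the function names, and the sample transactions are invented for illustration, and there is no network, mining, or consensus step.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 hash of a block's contents, as a hexadecimal string."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def add_block(chain: list[dict], transactions: list[str]) -> None:
    """Append a block that stores the hash of the block before it."""
    previous_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({
        "index": len(chain),
        "transactions": transactions,
        "previous_hash": previous_hash,
    })


def is_valid(chain: list[dict]) -> bool:
    """Check that every stored hash still matches the preceding block."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain: list[dict] = []
add_block(chain, ["Alice pays Bob 5"])
add_block(chain, ["Bob pays Carol 2"])
add_block(chain, ["Carol pays Dave 1"])
print(is_valid(chain))  # True

chain[0]["transactions"] = ["Alice pays Bob 500"]  # tamper with an old block
print(is_valid(chain))  # False
```

In this sketch only changes to blocks that already have a successor are detected; real systems rely on consensus among many participants to protect the newest block as well.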

10. Cyber‑Security Essentials (AO3)

  • Threats
    • Malware – viruses, worms, ransomware.
    • Phishing – deceptive emails/websites that steal credentials.
    • Denial‑of‑Service (DoS) – overloads a service to make it unavailable.
    • Unauthorised access – hacking, insider threats.
  • Defences
    • Firewalls – filter inbound/outbound traffic.
    • Antivirus/anti‑malware – detect and quarantine malicious code.
    • Encryption – protects data at rest and in transit.
    • Authentication – passwords, biometrics, two‑factor authentication.
  • Good practice for exams: be able to name a threat and a corresponding mitigation.

11. Choosing the Right Compression Method

  • Data integrity required? – Use lossless (e.g., RLE, Huffman, LZW) for code, financial records, medical images.
  • Quality loss acceptable? – Use lossy (DCT, MP3) for photographs, music, video where human perception masks the loss.
  • Storage vs. quality trade‑off – Estimate the needed space saving and compare with the visible/audible impact of loss.

12. Summary Checklist (Exam Revision)

  • Explain why compression is needed – give one storage and one transmission reason.
  • Convert between binary, decimal and hexadecimal; represent signed numbers with two’s‑complement.
  • Calculate uncompressed image and sound file sizes using pixel count, colour depth, sample rate, bit depth and channels.
  • Define lossless vs. lossy; state typical ratios and give real‑world examples.
  • Describe RLE and DCT processes – the only methods required by the syllabus.
  • Compute compression ratio and percentage space saving without a calculator.
  • Outline basic packet structure, error‑checking methods, and the role of encryption in data transmission.
  • Identify key hardware components (CPU, cache, cores) and software layers (OS, application, compilers).
  • Summarise the Internet stack: URL parts, HTTP/HTTPS, DNS, cookies.
  • Give a concise definition of blockchain and name one cyber‑security threat with its mitigation.
Suggested diagram: Flowchart contrasting the steps in lossless (RLE) and lossy (DCT) compression, linked to the FDE cycle of the CPU.
