Show understanding of, and be able to represent, character data in its internal binary form, depending on the character set used.
| Term | Definition |
|---|---|
| Character set (code set) | The complete list of symbols that may be represented (letters, digits, punctuation, emojis, etc.). |
| Code point | The numeric value assigned to a character within a character set. |
| Encoding | The method that translates a code point into a binary pattern for storage or transmission. |
| ASCII | 7‑bit character set for basic English text (128 code points, 0–127). |
| Unicode | Universal character set covering virtually every written language, symbols and emojis. |
| UTF‑8, UTF‑16, UTF‑32 | Common Unicode encodings that differ in the number of bytes used per code point. |
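The difference between a code point and its encoding can be seen with a short Python sketch. The helper name `describe` is hypothetical (it is not part of the syllabus), and the big-endian codecs are chosen here only so that no byte-order mark is added.

```python
def describe(ch):
    """Return the code point of one character and its bytes in three encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",            # numeric value in the character set
        "utf-8": ch.encode("utf-8").hex(" "),        # 1 to 4 bytes per code point
        "utf-16": ch.encode("utf-16-be").hex(" "),   # 2 or 4 bytes per code point
        "utf-32": ch.encode("utf-32-be").hex(" "),   # always 4 bytes per code point
    }

for ch in ["A", "Ω", "😀"]:
    print(ch, describe(ch))
```

The same code point produces byte sequences of different lengths depending on the encoding chosen.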
Memory‑size specifications often use decimal prefixes (kilo, mega, giga) but the computer hardware works in binary multiples. The syllabus distinguishes them as follows:
| Prefix | Symbol | Decimal value | Binary value (2ⁿ) |
|---|---|---|---|
| kilo | k | 10³ = 1 000 | 2¹⁰ = 1 024 (Ki) |
| mega | M | 10⁶ = 1 000 000 | 2²⁰ = 1 048 576 (Mi) |
| giga | G | 10⁹ = 1 000 000 000 | 2³⁰ = 1 073 741 824 (Gi) |
Example: 2 MiB = 2 × 2²⁰ bytes = 2 097 152 bytes, whereas 2 MB = 2 × 10⁶ bytes = 2 000 000 bytes.
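The arithmetic in this example can be checked with a minimal sketch. The dictionary names and the `to_bytes` helper are assumptions made for illustration only.

```python
# Decimal (SI) and binary (IEC) multipliers, expressed in bytes.
DECIMAL = {"kB": 10**3, "MB": 10**6, "GB": 10**9}
BINARY = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30}

def to_bytes(amount, unit):
    """Convert an amount in the given unit to a number of bytes."""
    factors = {**DECIMAL, **BINARY}
    return amount * factors[unit]

print(to_bytes(2, "MiB"))  # 2097152
print(to_bytes(2, "MB"))   # 2000000
```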
| System | Base | Typical use | Example conversion (45) |
|---|---|---|---|
| Binary | 2 | Low‑level data, logic circuits | 45₁₀ = 0010 1101₂ |
| Octal | 8 | Legacy Unix permissions, some low‑level debugging | 45₁₀ = 55₈ |
| Decimal | 10 | Human‑readable numbers | 45₁₀ = 45₁₀ |
| Hexadecimal | 16 | Memory addresses, colour codes, binary‑hex shortcuts | 45₁₀ = 2D₁₆ |
| BCD (Binary‑Coded Decimal) | – | Financial & embedded systems | 45₁₀ = 0100 0101₂ |
| One’s complement | – | Historical signed‑number representation | –5 → invert 0000 0101 → 1111 1010 |
| Two’s complement | – | Standard signed‑number representation | –5 → invert 0000 0101 → 1111 1010 + 1 = 1111 1011 |
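The conversions in the table can be reproduced with Python's built-in `format` function plus three small helpers. The helper names (`to_bcd`, `ones_complement`, `twos_complement`) and the 8-bit default width are assumptions for this sketch.

```python
def to_bcd(n):
    """Encode each decimal digit of a non-negative integer as its own 4-bit group."""
    return " ".join(format(int(digit), "04b") for digit in str(n))

def ones_complement(n, bits=8):
    """Bit pattern of -n in one's complement: invert every bit of +n."""
    mask = (1 << bits) - 1
    return format(~n & mask, f"0{bits}b")

def twos_complement(n, bits=8):
    """Bit pattern of -n in two's complement: invert every bit of +n, then add 1."""
    mask = (1 << bits) - 1
    return format((~n + 1) & mask, f"0{bits}b")

print(format(45, "08b"), format(45, "o"), format(45, "X"))  # 00101101 55 2D
print(to_bcd(45))            # 0100 0101
print(ones_complement(5))    # 11111010
print(twos_complement(5))    # 11111011
```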
Addition example (4‑bit)
1011₂
+ 0110₂
--------
10001₂ (5‑bit result)
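If the result must fit in 4 bits, the left‑most carry is discarded → 0001₂. The discarded carry sets the carry flag, which signals overflow when the operands are unsigned (11 + 6 = 17 does not fit in 4 bits). Read as signed two’s‑complement values (–5 + 6 = 1), the result 0001₂ is correct and the overflow flag stays clear.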
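Fixed-width addition can be sketched as follows. The function name `add_bits` is hypothetical. It reports the carry out of the top bit and, separately, whether the result is wrong when the operands are read as signed two's-complement values.

```python
def add_bits(a, b, bits=4):
    """Add two unsigned bit patterns of the given width.

    Returns (result, carry, signed_overflow).
    """
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    total = a + b
    result = total & mask          # keep only the low `bits` bits
    carry = total >> bits          # 1 if a bit was carried out of the top position
    # Signed overflow: both operands share a sign bit and the result's sign differs.
    signed_overflow = bool(~(a ^ b) & (a ^ result) & sign)
    return result, carry, signed_overflow

result, carry, overflow = add_bits(0b1011, 0b0110)
print(format(result, "04b"), carry, overflow)  # 0001 1 False
```

Keeping the two flags separate matters because the same bit pattern can be correct in one interpretation and wrong in the other.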
Subtraction example (4‑bit, using two’s complement)
0110₂ (6)
– 0011₂ (3)
--------
0011₂ (3)
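To subtract, form the two’s complement of the subtrahend (invert its bits and add 1) and add it to the minuend: 0011₂ → 1100₂ + 1 = 1101₂, then 0110₂ + 1101₂ = 1 0011₂. Discarding the carry out of the fourth bit leaves 0011₂. If the minuend is smaller than the subtrahend, a borrow is needed and the hardware sets the borrow flag.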
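Subtraction by adding the two's complement can be sketched in the same way. The function name `subtract_bits` is an assumption. The borrow is reported here as `True` when the unsigned minuend is smaller than the subtrahend; real processors differ in how they expose this flag.

```python
def subtract_bits(a, b, bits=4):
    """Compute a - b on unsigned bit patterns by adding the two's complement of b.

    Returns (result, borrow).
    """
    mask = (1 << bits) - 1
    inverted = ~b & mask            # step 1: invert the subtrahend
    total = a + inverted + 1        # step 2: add 1 and add to the minuend
    result = total & mask           # discard any carry out of the top bit
    carry = total >> bits
    borrow = carry == 0             # no carry out means a borrow was needed
    return result, borrow

result, borrow = subtract_bits(0b0110, 0b0011)
print(format(result, "04b"), borrow)  # 0011 False
```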
ASCII defines 128 characters (code points 0–127). Each code needs only 7 bits, so when a character is stored in an 8‑bit byte the most‑significant bit is 0.
| Decimal | Hex | 7‑bit binary | Character |
|---|---|---|---|
| 0 | 00 | 000 0000 | NUL (null) |
| 9 | 09 | 000 1001 | HT (tab) |
| 10 | 0A | 000 1010 | LF (line‑feed) |
| 13 | 0D | 000 1101 | CR (carriage‑return) |
| 48 | 30 | 011 0000 | 0 |
| 49 | 31 | 011 0001 | 1 |
| 57 | 39 | 011 1001 | 9 |
| 65 | 41 | 100 0001 | A |
| 66 | 42 | 100 0010 | B |
| 90 | 5A | 101 1010 | Z |
| 97 | 61 | 110 0001 | a |
| 98 | 62 | 110 0010 | b |
| 122 | 7A | 111 1010 | z |
| 32 | 20 | 010 0000 | Space |
| 33 | 21 | 010 0001 | ! |
| 126 | 7E | 111 1110 | ~ |
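Any row of the table can be regenerated from the character itself. This is a minimal sketch; the helper name `ascii_row` is hypothetical.

```python
def ascii_row(ch):
    """Return (decimal, hex, 7-bit binary) for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return code, format(code, "02X"), format(code, "07b")

for ch in ["\t", "\n", "\r", "0", "A", "a", "~"]:
    print(repr(ch), ascii_row(ch))
```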
Example – the word “Hi!” in ASCII
H (72), i (105) and ! (33) are stored as the three‑byte sequence 01001000 01101001 00100001.
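The stored byte sequence can be produced, and reversed, with a short sketch. The function names are assumptions chosen for illustration.

```python
def ascii_to_bits(text):
    """Return the 8-bit binary form of each ASCII character, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))

def bits_to_ascii(bits):
    """Inverse of ascii_to_bits: turn space-separated 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")

print(ascii_to_bits("Hi!"))                               # 01001000 01101001 00100001
print(bits_to_ascii("01001000 01101001 00100001"))        # Hi!
```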
Many languages need more than 128 symbols. Various 8‑bit extensions (often called extended ASCII) use the range 128–255 for additional characters.
Because the same byte can represent different characters on different systems, these extensions are not universal – a key reason for the adoption of Unicode.
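The ambiguity can be demonstrated by decoding one byte with several 8-bit code pages. The three codecs below ship with Python's standard library; the choice of byte and code pages, and the helper name `decode_with`, are assumptions made for illustration.

```python
def decode_with(raw, codecs):
    """Decode the same bytes under each named 8-bit code page."""
    return {name: raw.decode(name) for name in codecs}

raw = bytes([0xE9])   # a single byte with value 233, outside the ASCII range
decoded = decode_with(raw, ["latin-1", "cp437", "iso8859_5"])
for name, text in decoded.items():
    # latin-1 yields "é"; the other code pages map the same byte to other characters
    print(name, text)
```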
Unicode assigns a unique code point to every character, symbol, emoji and control code. Code points are written as U+ followed by four to six hexadecimal digits, for example U+0041 or U+1F600.
| Character | Unicode name | Code point | UTF‑8 (hex) | UTF‑16 (hex) |
|---|---|---|---|---|
| A | LATIN CAPITAL LETTER A | U+0041 | 41 | 0041 |
| Ω | GREEK CAPITAL LETTER OMEGA | U+03A9 | CE A9 | 03A9 |
| अ | DEVANAGARI LETTER A | U+0905 | E0 A4 85 | 0905 |
| 😀 | GRINNING FACE | U+1F600 | F0 9F 98 80 | D83D DE00 |
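The UTF‑8 and UTF‑16 columns can be reproduced by hand with bit operations. The following is a sketch, not a full implementation: the function names are hypothetical, and it does not reject the surrogate range U+D800–U+DFFF, which a real encoder would.

```python
def utf8_encode(cp):
    """Encode one Unicode code point as UTF-8 using the standard bit patterns."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def utf16_encode(cp):
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if cp < 0x10000:
        return [cp]                       # one unit, equal to the code point
    offset = cp - 0x10000                 # 20-bit value split across two units
    return [0xD800 | (offset >> 10),      # high surrogate: top 10 bits
            0xDC00 | (offset & 0x3FF)]    # low surrogate: bottom 10 bits

for ch in "AΩअ😀":
    cp = ord(ch)
    units = " ".join(f"{u:04X}" for u in utf16_encode(cp))
    print(f"U+{cp:04X}", utf8_encode(cp).hex(" ").upper(), units)
```

Comparing the output with Python's own `str.encode` is a quick way to confirm a hand-worked answer.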
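UTF‑8 is a variable‑length encoding that uses one to four bytes per code point:

| Code point range | Bytes | Byte pattern |
|---|---|---|
| U+0000–U+007F | 1 | 0xxxxxxx (identical to ASCII for 0–127) |
| U+0080–U+07FF | 2 | 110xxxxx 10xxxxxx |
| U+0800–U+FFFF | 3 | 1110xxxx 10xxxxxx 10xxxxxx |
| U+10000–U+10FFFF | 4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |

Example – the character # in ASCII
Code point 35₁₀ = 0100011. Stored in an 8‑bit byte as 0010 0011.

Example – é (U+00E9) in UTF‑8
1. U+00E9 = 233₁₀ = 1110 1001₂.
2. 233 lies in the range 128–2047, so the two‑byte pattern 110xxxxx 10xxxxxx applies. The pattern holds 11 payload bits, so pad the value to 11 bits: 000 1110 1001.
3. Split the 11 bits into the upper 5 bits 00011 and the lower 6 bits 101001.
4. Byte 1: 110 followed by 00011 gives 11000011 = 0xC3.
5. Byte 2: 10 followed by 101001 gives 10101001 = 0xA9.
6. Result: C3 A9 (binary 11000011 10101001).

Example – binary to hexadecimal to decimal
1010 1101₂ → AD₁₆ → 173₁₀.

Practice questions
1. Give the ASCII code for the character #.
2. Encode Ω (U+03A9) in UTF‑8 and show the binary result.
3. Convert the UTF‑16 byte sequence 0xD8 0x3D 0xDE 0x00 to the corresponding Unicode character (show the code point).
4. Add 0101 0011₂ + 1010 1101₂ and indicate whether overflow occurs.
5. Convert 73₈ to binary and hexadecimal.
6. Subtract 0011 0101₂ from 1010 1100₂ using two’s complement and state the borrow flag.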