Cambridge A-Level Computer Science 9618 – 1.1 Data Representation
1.1 Data Representation – Character Data
Learning Objective
Show understanding of, and be able to represent, character data in its internal binary form, depending on the character set used.
Why Character Representation Matters
Computers store all information as binary digits (bits). To manipulate text, a mapping between characters and binary patterns is required. This mapping is defined by a character set (or code set) and an encoding scheme.
Key Terminology
Character set – The collection of symbols that can be represented (e.g., letters, digits, punctuation).
Code point – The numerical value assigned to a character in a character set.
Encoding – The method used to translate code points into binary patterns for storage or transmission.
ASCII – A 7‑bit character set originally designed for English.
Unicode – A universal character set that can represent over a million different symbols.
UTF‑8, UTF‑16, UTF‑32 – Common Unicode encodings with different byte‑length strategies.
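To make these terms concrete, here is a minimal sketch (assuming Python 3, where ord() returns a character's code point and str.encode() applies an encoding):

```python
# Character -> code point (the number assigned by the character set)
print(ord("A"))             # 65
print(hex(ord("Ω")))        # 0x3a9  (code point U+03A9)

# Code point -> binary pattern (what the encoding actually stores)
print("A".encode("utf-8"))  # b'A'        (1 byte)
print("Ω".encode("utf-8"))  # b'\xce\xa9' (2 bytes)
```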
1. ASCII – The Foundation
ASCII (American Standard Code for Information Interchange) defines 128 characters (0–127). Each character is represented by a 7‑bit binary number; in practice it is stored in an 8‑bit byte with the most‑significant bit set to 0.
| Decimal | Hex | Binary (7‑bit) | Character |
|---------|-----|----------------|-----------|
| 32      | 20  | 010 0000       | Space     |
| 48      | 30  | 011 0000       | 0         |
| 65      | 41  | 100 0001       | A         |
| 97      | 61  | 110 0001       | a         |
| 126     | 7E  | 111 1110       | ~         |
Example: The word Hi! in ASCII
H → 72₁₀ = 01001000₂
i → 105₁₀ = 01101001₂
! → 33₁₀ = 00100001₂
Stored as the byte sequence 01001000 01101001 00100001.
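This example can be checked with a short Python sketch (ord() and format() are built-ins):

```python
# Print each character of "Hi!" as an 8-bit ASCII byte
for ch in "Hi!":
    print(ch, "->", ord(ch), "=", format(ord(ch), "08b"))
# H -> 72 = 01001000
# i -> 105 = 01101001
# ! -> 33 = 00100001
```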
2. Extended ASCII (8‑bit)
Many systems needed more than 128 symbols (e.g., accented letters, graphical symbols). Various 8‑bit extensions added characters in the range 128–255. These extensions are not universal; they differ by region (e.g., ISO‑8859‑1 for Western Europe, Windows‑1252).
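A small Python sketch illustrating this (assuming Python 3; 'latin-1' and 'cp1252' are the standard-library names for ISO‑8859‑1 and Windows‑1252):

```python
# é happens to sit at the same position (233) in both Western European code pages
print("é".encode("latin-1"))  # b'\xe9'
print("é".encode("cp1252"))   # b'\xe9'

# € exists in Windows-1252 but not in ISO-8859-1
print("€".encode("cp1252"))   # b'\x80'
try:
    "€".encode("latin-1")
except UnicodeEncodeError as e:
    print("not in latin-1:", e)
```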
3. Unicode – A Global Standard
Unicode assigns a unique code point to every character in virtually all writing systems, emojis, and technical symbols. Code points are written as U+XXXX where XXXX is a hexadecimal number.
| Character | Unicode name | Code point | UTF‑8 (hex) | UTF‑16 (hex) |
|-----------|--------------|------------|-------------|--------------|
| A  | LATIN CAPITAL LETTER A     | U+0041  | 41          | 0041      |
| Ω  | GREEK CAPITAL LETTER OMEGA | U+03A9  | CE A9       | 03A9      |
| अ  | DEVANAGARI LETTER A        | U+0905  | E0 A4 85    | 0905      |
| 😀 | GRINNING FACE              | U+1F600 | F0 9F 98 80 | D83D DE00 |
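The table rows can be reproduced with Python's standard unicodedata module (bytes.hex() with a separator needs Python 3.8 or later):

```python
import unicodedata

for ch in ["A", "Ω", "अ", "😀"]:
    print(f"U+{ord(ch):04X}",
          unicodedata.name(ch),
          ch.encode("utf-8").hex(" ").upper())
# U+0041 LATIN CAPITAL LETTER A 41
# U+03A9 GREEK CAPITAL LETTER OMEGA CE A9
# U+0905 DEVANAGARI LETTER A E0 A4 85
# U+1F600 GRINNING FACE F0 9F 98 80
```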
4. Unicode Encoding Schemes
UTF‑8 – Variable‑length (1 to 4 bytes). Compatible with ASCII for code points 0–127.
UTF‑16 – Uses 2 bytes for most common characters; supplementary characters (U+10000 and above) use 4 bytes (surrogate pairs).
UTF‑32 – Fixed 4‑byte representation for every code point (simple but space‑inefficient).
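A quick way to compare the three schemes is to encode the same characters and count the bytes (a sketch assuming Python 3.8+; the '-be' codec variants are used so no byte-order mark is prepended):

```python
for ch in ["A", "Ω", "😀"]:
    u8  = ch.encode("utf-8")
    u16 = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    u32 = ch.encode("utf-32-be")
    print(ch, len(u8), len(u16), len(u32), u16.hex(" ").upper())
# A  1 2 4  00 41
# Ω  2 2 4  03 A9
# 😀 4 4 4  D8 3D DE 00   (surrogate pair)
```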
5. Converting a Character to Binary – Step‑by‑Step
Example: Encode the character é (Latin small letter e with acute) using UTF‑8.
Find the Unicode code point: U+00E9 (hex) = 233₁₀.
Since 233 is greater than 127, UTF‑8 uses a 2‑byte pattern: 110xxxxx 10xxxxxx.
Write 233 in binary, padded to the 11 bits a 2‑byte sequence can hold: 000 1110 1001.
Split the bits to fit the pattern:
First 5 bits → 00011 → fill the xxxxx of 110xxxxx, giving 11000011.
Last 6 bits → 101001 → fill the xxxxxx of 10xxxxxx, giving 10101001.
Result: é is encoded as 11000011 10101001 = C3 A9 (hex).
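Python confirms the result (assuming Python 3.8+ for bytes.hex() with a separator):

```python
b = "é".encode("utf-8")
print(b.hex(" ").upper())                     # C3 A9
print(" ".join(format(x, "08b") for x in b))  # 11000011 10101001
print(b.decode("utf-8"))                      # é  (round-trip back to the character)
```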