Normalise floating-point numbers

13.3 Floating‑Point Numbers – Representation, Normalisation & Arithmetic

1. Quick Reference – Syllabus Coverage Matrix

Topic (Paper 3) | Notes included? | Assessment objective(s)
--- | --- | ---
Information representation (binary, octal, hexadecimal, BCD, ASCII/Unicode) | ✓ (new section) | AO1 – recall formats; AO2 – convert between bases
Hardware basics – CPU, registers, logic gates, Boolean algebra | ✓ (new section) | AO1, AO2
Communication basics – LAN/WAN, topologies, protocols, IP, DNS | ✓ (new section) | AO1, AO2
Floating‑point representation (IEEE‑754 single precision) | ✓ | AO1 – recall format; AO2 – analyse precision limits
Normalisation of binary numbers | ✓ (expanded) | AO1, AO2
Rounding modes & precision loss | ✓ | AO2, AO3 – design code that selects a rounding mode
Arithmetic operations (add, subtract, multiply, divide) | ✓ (step‑by‑step) | AO2, AO3 – implement normalisation after an operation
Special values (zero, denormals, infinities, NaN) | ✓ | AO1, AO2
Overflow & underflow handling | ✓ | AO2, AO3 – devise a safe‑guard routine

2. Information Representation (AS 1)

  • Binary, Octal, Hexadecimal – base‑2, base‑8 and base‑16 systems used for low‑level programming.
  • BCD (Binary‑Coded Decimal) – each decimal digit stored in a 4‑bit nibble; useful for financial calculations.
  • ASCII & Unicode

    • ASCII: 7‑bit code for 128 characters (0‑127).
    • Unicode (UTF‑8): variable‑length encoding covering world‑wide characters; the first 128 code points are identical to ASCII.

  • Overflow in integer arithmetic – occurs when the result exceeds the range representable by the chosen number of bits (e.g., 8‑bit two’s complement range –128 … +127).
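The conversions and the 8‑bit overflow rule above can be tried out with a short Python sketch. The function names (to_bases, to_bcd, utf8_length, add_int8) are illustrative helpers, not part of any syllabus API; add_int8 masks the sum to 8 bits to imitate an 8‑bit register.

```python
def to_bases(n: int) -> dict[str, str]:
    """Write a non-negative integer in base 2, 8 and 16."""
    return {"bin": format(n, "b"), "oct": format(n, "o"), "hex": format(n, "X")}


def to_bcd(n: int) -> str:
    """BCD: encode each decimal digit separately as a 4-bit nibble."""
    return " ".join(format(int(digit), "04b") for digit in str(n))


def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for one character (1 for ASCII, up to 4)."""
    return len(ch.encode("utf-8"))


def add_int8(a: int, b: int) -> tuple[int, bool]:
    """Add two 8-bit two's-complement values the way 8-bit hardware would.

    Returns the wrapped 8-bit result and whether overflow occurred."""
    raw = (a + b) & 0xFF                         # keep only the low 8 bits
    wrapped = raw - 256 if raw >= 128 else raw   # reinterpret bit 7 as the sign bit
    overflow = not (-128 <= a + b <= 127)        # true sum outside the 8-bit range
    return wrapped, overflow


print(to_bases(202))      # {'bin': '11001010', 'oct': '312', 'hex': 'CA'}
print(to_bcd(95))         # 1001 0101
print(utf8_length("A"), utf8_length("é"), utf8_length("€"))  # 1 2 3
print(add_int8(100, 50))  # (-106, True)
```

Note how 100 + 50 = 150 does not fit in the range –128 … +127, so the stored bits are read back as –106.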

3. Basic Hardware (AS 2)

  • CPU components – ALU, control unit, registers, cache.
  • Memory hierarchy – registers → L1/L2 cache → main RAM → secondary storage.
  • Logic gates & Boolean algebra

    Gate | Symbol | Truth table (output)
    --- | --- | ---
    NOT | ¬ | 0 → 1, 1 → 0
    AND | ∧ | 1 only when both inputs are 1
    OR | ∨ | 1 when at least one input is 1
    XOR | ⊕ | 1 when the inputs differ

  • Registers – small, fast storage inside the CPU; typical sizes: 8, 16, 32, 64 bits.
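A minimal Python sketch of the four gates as functions on the bits 0 and 1; the gate function names are hypothetical helpers. The final loop checks two Boolean‑algebra identities (XOR built from AND, OR and NOT, and De Morgan’s law) over every input combination.

```python
from itertools import product


def not_gate(a: int) -> int:
    return 1 - a


def and_gate(a: int, b: int) -> int:
    return a & b


def or_gate(a: int, b: int) -> int:
    return a | b


def xor_gate(a: int, b: int) -> int:
    return a ^ b


def truth_table(gate) -> list[tuple[int, int, int]]:
    """Rows (A, B, output) for all input combinations, in binary counting order."""
    return [(a, b, gate(a, b)) for a, b in product((0, 1), repeat=2)]


for name, gate in (("AND", and_gate), ("OR", or_gate), ("XOR", xor_gate)):
    print(name, truth_table(gate))

# Boolean algebra checks over all four input combinations
for a, b in product((0, 1), repeat=2):
    # XOR = (A AND NOT B) OR (NOT A AND B)
    assert xor_gate(a, b) == or_gate(and_gate(a, not_gate(b)), and_gate(not_gate(a), b))
    # De Morgan: NOT (A AND B) = (NOT A) OR (NOT B)
    assert not_gate(and_gate(a, b)) == or_gate(not_gate(a), not_gate(b))
```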

4. Communication Fundamentals (AS 3)

  • Network topologies – star, bus, ring, mesh; choice affects reliability and cost.
  • LAN vs WAN – Local Area Network (confined to a building or campus) vs Wide Area Network (covers large geographic area).
  • Key protocols

    • TCP – reliable, connection‑oriented transport (part of the TCP/IP protocol suite).
    • UDP – connectionless, low‑latency, with no delivery guarantee.
    • HTTP/HTTPS – web traffic.

  • IP addressing

    • IPv4: 32‑bit address, dotted‑decimal notation (e.g., 192.168.1.10).
    • Subnet mask determines network & host portions.
    • IPv6: 128‑bit address, hexadecimal groups (e.g., 2001:0db8::1).

  • DNS (Domain Name System) – translates human‑readable domain names to IP addresses.
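The network and host portions are found by ANDing the address with the subnet mask. This sketch assumes a /24 prefix (mask 255.255.255.0) for the example address, which the notes do not specify; split_ipv4 is a hypothetical helper, and the standard ipaddress module is used only to convert between dotted‑decimal text and 32‑bit integers and to cross‑check the result.

```python
import ipaddress


def split_ipv4(address: str, prefix_len: int) -> tuple[str, str, str]:
    """Return (subnet mask, network part, host part) for an IPv4 address."""
    addr = int(ipaddress.IPv4Address(address))              # dotted decimal -> 32-bit int
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF   # prefix_len leading 1 bits
    network = addr & mask                                   # bits selected by the mask
    host = addr & ~mask & 0xFFFFFFFF                        # the remaining bits
    return (str(ipaddress.IPv4Address(mask)),
            str(ipaddress.IPv4Address(network)),
            str(ipaddress.IPv4Address(host)))


print(split_ipv4("192.168.1.10", 24))  # ('255.255.255.0', '192.168.1.0', '0.0.0.10')

# Cross-check with the standard library's own network calculation
print(ipaddress.ip_interface("192.168.1.10/24").network)  # 192.168.1.0/24
```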

5. Binary Floating‑Point Word (IEEE‑754 Single Precision)

Field | Bits | Stored value | Purpose
--- | --- | --- | ---
Sign (S) | 1 | 0 = positive, 1 = negative | Indicates the sign of the number
Exponent (E) | 8 | Unsigned integer with bias = 127 | Encodes the power of two
Fraction / Mantissa (F) | 23 | Bits after the leading 1 (the “hidden bit”, which is not stored) | Provides the significant digits

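For a normalised number (1 ≤ E ≤ 254) the numerical value is

\[
\text{value}=(-1)^{S}\times 1.F \times 2^{E-\text{bias}}
\]

where 1.F means the hidden leading 1 followed by the fraction bits. The stored exponents E = 0 and E = 255 are reserved for the special values in section 13.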
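As a check on the formula, here is a sketch that evaluates it exactly (using Python’s fractions module) from the three stored fields. decode_single is a hypothetical name, and it deliberately rejects E = 0 and E = 255, which do not follow this formula.

```python
from fractions import Fraction

BIAS = 127
FRAC_BITS = 23


def decode_single(s: int, e: int, f: int) -> Fraction:
    """Exact value (-1)^S x 1.F x 2^(E - bias) of a normalised single-precision number."""
    if not 1 <= e <= 254:
        raise ValueError("E = 0 and E = 255 are special cases (zero, denormals, infinity, NaN)")
    significand = 1 + Fraction(f, 2 ** FRAC_BITS)  # hidden 1 followed by the fraction bits
    return (-1) ** s * significand * Fraction(2) ** (e - BIAS)


print(float(decode_single(0, 130, 0b101101 << 17)))  # 13.625
print(float(decode_single(1, 124, 1 << 21)))         # -0.15625
```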

6. Why Normalise?

  • Creates a unique representation for every non‑zero value.
  • Maximises the number of significant bits that can be stored.
  • Allows the exponent to shift the binary point, giving a very wide dynamic range.
  • Essential for arithmetic – operands are aligned to a common exponent before their mantissas are added or subtracted, and the result must then be renormalised before it is stored.

7. Step‑by‑Step Normalisation (Binary)

  1. Convert the decimal number to binary, converting the integer and fractional parts separately.
  2. Locate the most significant (left‑most) ‘1’. Its position relative to the binary point determines the direction of the shift: the point moves left when that ‘1’ lies further left than the units position, and right when the number is less than 1.
  3. Shift the binary point so that exactly one ‘1’ appears to its left, giving the form 1.xxx… × 2^k.
  4. Determine the exponent k – the number of positions the point moved (left = positive, right = negative).
  5. Apply the bias: storedExponent = k + bias (bias = 127 for single precision).
  6. Fill the fraction field with the bits after the leading ‘1’. Pad with zeros on the right if there are fewer than 23 bits; if there are more, round (section 10).
  7. Set the sign bit according to the original number’s sign.
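The seven steps can be sketched in Python using exact fractions, so no rounding happens by accident. normalise is a hypothetical helper that only handles values whose significand fits in 24 bits; anything that would need rounding, or lies outside the normalised range, raises an error instead of being handled.

```python
from fractions import Fraction

BIAS = 127
FRAC_BITS = 23


def normalise(x) -> str:
    """Apply steps 1-7 to a non-zero value and return 'S EEEEEEEE FFF...F'."""
    q = Fraction(x)                     # step 1: exact value, e.g. from "13.625"
    if q == 0:
        raise ValueError("zero is a special case (E = 0, F = 0)")
    sign = 1 if q < 0 else 0            # step 7
    m, k = abs(q), 0
    while m >= 2:                       # steps 2-4: point moves left, k positive
        m /= 2
        k += 1
    while m < 1:                        # point moves right, k negative
        m *= 2
        k -= 1
    stored = k + BIAS                   # step 5
    if not 1 <= stored <= 254:
        raise ValueError("outside the normalised range")
    frac = (m - 1) * 2 ** FRAC_BITS     # step 6: bits after the hidden 1
    if frac.denominator != 1:
        raise ValueError("needs more than 23 fraction bits: rounding required")
    return f"{sign} {stored:08b} {int(frac):023b}"


print(normalise("13.625"))    # 0 10000010 10110100000000000000000
print(normalise("-0.15625"))  # 1 01111100 01000000000000000000000
```

Decimal strings are passed on purpose: a Python float such as 0.1 already carries the rounding error of a 53‑bit double, so Fraction(0.1) would (correctly) be reported as needing rounding.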

8. Worked Example – Positive Decimal

Normalise 13.625 (single precision, bias = 127).

  1. Binary: 1101.101.
  2. Shift three places left → 1.101101 × 2³.
  3. Exponent k = 3 → stored exponent = 3 + 127 = 130 → 10000010.
  4. Fraction bits = 101101 followed by 17 zeros → 10110100000000000000000.
  5. Sign = 0.

Resulting 32‑bit pattern:

0 10000010 10110100000000000000000

9. Worked Example – Negative Decimal (< 1)

Normalise -0.15625.

  1. Binary of 0.15625: 0.00101.
  2. Shift right three places → 1.01 × 2⁻³.
  3. Exponent k = ‑3 → stored exponent = -3 + 127 = 124 → 01111100.
  4. Fraction bits = 010 followed by 20 zeros → 01000000000000000000000.
  5. Sign = 1.

Resulting 32‑bit pattern:

1 01111100 01000000000000000000000
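Both worked examples can be compared with what the machine actually stores. This sketch uses the standard struct module: packing with the ">f" format writes the IEEE‑754 single‑precision bytes, which are then reread as an unsigned 32‑bit integer; single_pattern is a hypothetical helper name.

```python
import struct


def single_pattern(x: float) -> str:
    """Bit pattern stored for x in single precision, grouped as S EEEEEEEE F...F."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # same 4 bytes read as an integer
    b = format(bits, "032b")
    return f"{b[0]} {b[1:9]} {b[9:]}"


print(single_pattern(13.625))    # 0 10000010 10110100000000000000000
print(single_pattern(-0.15625))  # 1 01111100 01000000000000000000000
```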

10. Rounding Modes (IEEE‑754)

  • Round‑to‑nearest, ties‑to‑even – default; minimises overall error.
  • Round‑toward‑zero – truncates extra bits.
  • Round‑down (toward ‑∞) and Round‑up (toward +∞) – useful for interval arithmetic.

During normalisation, the bits that fall beyond the 23‑bit fraction are examined, and the active rounding mode decides whether the retained fraction is incremented or left unchanged (truncated).
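A sketch of how the four modes treat the bits that fall off the end of the fraction. The significand is handled as an integer magnitude with extra low‑order bits plus a separate sign flag; the function name and the mode strings are assumptions made for illustration. An increment can carry into a new leading bit, so a caller would need to renormalise afterwards.

```python
def round_significand(sig: int, extra: int, mode: str, negative: bool = False) -> int:
    """Drop the lowest `extra` bits (extra >= 1) of a significand magnitude.

    negative gives the sign of the number, which the directed modes need."""
    kept = sig >> extra                 # bits that fit in the fraction field
    rest = sig & ((1 << extra) - 1)     # bits that fall off the end
    half = 1 << (extra - 1)             # exactly half a unit in the last kept place
    if rest == 0:
        return kept                     # exact: every mode agrees
    if mode == "nearest_even":
        if rest > half or (rest == half and kept & 1):
            kept += 1                   # above half, or a tie with an odd last bit
    elif mode == "toward_zero":
        pass                            # truncate
    elif mode == "toward_pos_inf":
        if not negative:
            kept += 1                   # positive numbers grow toward +inf
    elif mode == "toward_neg_inf":
        if negative:
            kept += 1                   # negative numbers grow in magnitude toward -inf
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return kept


print(round_significand(0b1010, 2, "nearest_even"))  # 2 (tie, 0b10 already even)
print(round_significand(0b1110, 2, "nearest_even"))  # 4 (tie, 0b11 rounds up to even)
```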

11. Normalising the Result of an Arithmetic Operation

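Example: add A = 1.101 × 2³ (13) and B = 1.001 × 2¹ (2.25).

  1. Align exponents: shift the mantissa of the operand with the smaller exponent (B) right by 2 positions → 0.01001 × 2³.
  2. Add mantissas: 1.101 + 0.01001 = 1.11101.
  3. Normalise: 1.11101 × 2³ (= 15.25) already has a single ‘1’ before the binary point, so no shift is needed. Had the sum been 2 or more (10.xxx…), the mantissa would be shifted right one place and the exponent increased by 1.
  4. Apply bias: stored exponent = 3 + 127 = 130 → 10000010.
  5. Round (if needed) to 23 fraction bits; here 11101 is simply padded with 18 zeros.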

Pseudocode – Normalise After Addition (single precision)

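constants: FRAC_BITS = 23 (single precision), GUARD_BITS = 2
// mantissa holds the hidden bit, 23 fraction bits and 2 guard bits,
// so a normalised mantissa satisfies 2^25 <= mantissa < 2^26

function normalize(mantissa, exponent):
    if mantissa == 0:
        return ZERO                                // special case: no leading 1 to find
    TOP = 1 << (FRAC_BITS + GUARD_BITS)
    while mantissa >= (TOP << 1):                  // carry past the leading bit → shift right
        mantissa >>= 1
        exponent += 1
    while mantissa < TOP:                          // leading 0 → shift left
        mantissa <<= 1
        exponent -= 1
    // ---- rounding to nearest, ties-to-even ----
    guard = mantissa & 0b11                        // the two guard bits
    mantissa >>= GUARD_BITS                        // drop guard bits
    if guard == 0b10 and (mantissa & 1):           // exact tie → make even
        mantissa += 1
    elif guard > 0b10:                             // more than half → round up
        mantissa += 1
    // rounding can carry into a new leading bit
    if mantissa == (1 << (FRAC_BITS + 1)):
        mantissa >>= 1
        exponent += 1
    return mantissa & ((1 << FRAC_BITS) - 1), exponent   // strip the hidden bit

Bits shifted out by the first loop are simply discarded and there is no sticky bit, so a pattern that looks like an exact tie may really be slightly above half; IEEE‑754 hardware keeps a sticky bit for this case.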
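A runnable Python version of the same routine, applied to the addition A = 1.101 × 2³, B aligned to 0.01001 × 2³. The integer mantissa is assumed to have 23 + 2 bits after the binary point, and fixed is a hypothetical helper that turns a binary string such as "1.101" into that scaled integer. Like the pseudocode, it discards bits shifted out to the right (no sticky bit), so it is a teaching sketch rather than a full IEEE‑754 adder.

```python
FRAC_BITS = 23   # single precision
GUARD_BITS = 2   # extra low-order bits kept for rounding

# The integer mantissa has FRAC_BITS + GUARD_BITS bits after the binary point,
# so a normalised value 1.xxx satisfies TOP <= mantissa < 2 * TOP.
TOP = 1 << (FRAC_BITS + GUARD_BITS)


def normalize(mantissa: int, exponent: int) -> tuple[int, int]:
    """Return (23-bit fraction field, unbiased exponent) after normalising and rounding."""
    if mantissa <= 0:
        raise ValueError("expects a positive magnitude; zero is a special case")
    while mantissa >= 2 * TOP:          # sum of 2 or more: shift right
        mantissa >>= 1                  # shifted-out bit is discarded (no sticky bit)
        exponent += 1
    while mantissa < TOP:               # leading 0 (e.g. after subtraction): shift left
        mantissa <<= 1
        exponent -= 1
    guard = mantissa & 0b11             # the two guard bits
    mantissa >>= GUARD_BITS
    # round to nearest, ties to even
    if guard == 0b11 or (guard == 0b10 and mantissa & 1):
        mantissa += 1
    if mantissa == 1 << (FRAC_BITS + 1):  # rounding carried into a new leading bit
        mantissa >>= 1
        exponent += 1
    return mantissa & ((1 << FRAC_BITS) - 1), exponent


def fixed(bits: str) -> int:
    """Hypothetical helper: '1.101' -> scaled integer (at most 25 bits after the point)."""
    whole, frac = bits.split(".")
    return int(whole + frac, 2) << (FRAC_BITS + GUARD_BITS - len(frac))


total = fixed("1.101") + fixed("0.01001")  # A + aligned B, both x 2^3
frac, exp = normalize(total, 3)
print(format(frac, "023b"), exp)           # 11101000000000000000000 3
```

Adding A to itself instead (11.010 × 2³) exercises the right‑shift branch and gives fraction 101 with exponent 4.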

12. Precision Loss & Catastrophic Cancellation

When two nearly equal numbers are subtracted, the leading significant bits cancel, leaving only the less‑significant bits and dramatically reducing accuracy.

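Operation | Exact decimal | IEEE‑754 (single) result | Relative error
--- | --- | --- | ---
13.625 − 13.624 | 0.001 | ≈ 0.0010004044 (= 1049 × 2⁻²⁰) | ≈ 0.04 %
1.0000001 − 1.0000000 | 0.0000001 | ≈ 0.00000011920929 (= 2⁻²³) | ≈ 19 %
1.00000001 − 1.0 | 0.00000001 | 0 (1.00000001 is stored as 1.0) | 100 % loss

Each operand on its own is stored with a relative error below 2⁻²⁴ ≈ 6 × 10⁻⁸, so in the first row the subtraction magnifies the error thousands of times.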

Mitigation strategies (AO3):

  • Re‑order calculations to avoid subtracting nearly equal values.
  • Use a higher‑precision format (double) when the algorithm is sensitive.
  • Apply compensated summation (e.g., Kahan algorithm) for large accumulations.
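The table and the benefit of Kahan summation can be reproduced by emulating single precision in Python: f32 (a hypothetical helper) rounds a double to the nearest single‑precision value through the struct module, and every intermediate result is passed through it. The choice of 20 000 copies of 0.1 is arbitrary.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Operands are rounded to single precision first; these subtractions are then exact.
print(f32(13.625) - f32(13.624))   # about 0.0010004, not 0.001
print(f32(1.0000001) - 1.0)        # 2**-23, about 1.19e-07
print(f32(1.00000001) - 1.0)       # 0.0: the operand already rounded to 1.0


def naive_sum32(values):
    total = 0.0
    for v in values:
        total = f32(total + v)      # every addition rounded to single precision
    return total


def kahan_sum32(values):
    total = 0.0
    comp = 0.0                      # compensation: low-order bits lost so far
    for v in values:
        y = f32(v - comp)           # correct the next term by the previous loss
        t = f32(total + y)
        comp = f32(f32(t - total) - y)  # what was actually lost in total + y
        total = t
    return total


values = [f32(0.1)] * 20_000
exact = math.fsum(values)           # accurate reference sum
print(abs(naive_sum32(values) - exact), abs(kahan_sum32(values) - exact))
```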

13. Special Cases

  • Zero: E = 0, F = 0. Sign distinguishes +0 and –0.
  • Denormalised numbers: E = 0, F ≠ 0. The implicit leading bit is 0 and the exponent is fixed at −126, so value = (−1)^S × 0.F × 2⁻¹²⁶; this allows values smaller than the smallest normalised number (2⁻¹²⁶).
  • Infinity: E = 255, F = 0. Represents overflow or division by zero.
  • NaN (Not a Number): E = 255, F ≠ 0. Result of undefined operations (e.g., 0/0, √‑1).
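A sketch that applies these rules to the stored bit pattern; classify_single is a hypothetical helper, and struct is used only to obtain the 32 bits of a single‑precision value.

```python
import math
import struct


def classify_single(x: float) -> str:
    """Name the IEEE-754 single-precision category of x from its E and F fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = "-" if bits >> 31 else "+"
    e = (bits >> 23) & 0xFF        # 8-bit stored exponent
    f = bits & 0x7FFFFF            # 23-bit fraction
    if e == 0:
        return sign + ("zero" if f == 0 else "denormal")
    if e == 255:
        return "NaN" if f else sign + "infinity"
    return sign + "normal"


for x in (1.0, -0.0, 1e-40, -math.inf, math.nan):
    print(x, classify_single(x))
```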

14. Overflow & Underflow Handling (AO3)

  • Overflow: if the exponent after normalisation and rounding exceeds the maximum (unbiased +127, stored 254 for single precision), set the result to infinity (∞) with the appropriate sign.
  • Underflow: if the unbiased exponent drops below the minimum normalised value (−126), the number becomes denormalised. The smallest denormal is 2⁻¹⁴⁹; under round‑to‑nearest, magnitudes of 2⁻¹⁵⁰ (half of it) or less round to zero.

Typical safe‑guard routine (pseudocode):

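// exponent is unbiased; mantissa holds the hidden bit + 23 fraction bits
MAX_EXP = 127              // stored exponent 254
MIN_NORMAL_EXP = -126      // stored exponent 1

if exponent > MAX_EXP:
    result = (sign ? -INF : INF)                 // overflow
elif exponent < MIN_NORMAL_EXP:
    shift = MIN_NORMAL_EXP - exponent
    mantissa >>= shift                           // create denormal (truncates; IEEE-754 would round)
    storedExponent = 0
    if mantissa == 0:
        result = (sign ? -0.0 : 0.0)             // total underflow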
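A Python sketch of the same safeguards that assembles a complete 32‑bit pattern, assuming an unbiased exponent and a 24‑bit mantissa with the hidden 1 included. pack_single and as_float are hypothetical helpers; as in the pseudocode, the denormalising shift truncates instead of rounding.

```python
import math
import struct

BIAS = 127
MAX_EXP = 127          # largest unbiased exponent (stored 254)
MIN_NORMAL_EXP = -126  # smallest unbiased exponent of a normalised number (stored 1)


def pack_single(sign: int, exponent: int, mantissa: int) -> int:
    """Build a 32-bit pattern from sign, unbiased exponent and 24-bit mantissa."""
    if exponent > MAX_EXP:                        # overflow -> signed infinity
        return (sign << 31) | (0xFF << 23)
    if exponent < MIN_NORMAL_EXP:                 # underflow -> denormal or zero
        mantissa >>= MIN_NORMAL_EXP - exponent    # hidden 1 moves into the fraction
        return (sign << 31) | mantissa            # stored exponent 0; 0 gives signed zero
    return (sign << 31) | ((exponent + BIAS) << 23) | (mantissa & 0x7FFFFF)


def as_float(bits: int) -> float:
    """Read a 32-bit pattern back as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


ONE = 1 << 23                                  # mantissa 1.000...0
print(as_float(pack_single(0, 128, ONE)))      # inf
print(as_float(pack_single(0, -149, ONE)))     # 2**-149, the smallest denormal
z = as_float(pack_single(1, -200, ONE))
print(z, math.copysign(1.0, z))                # -0.0 -1.0
```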

15. Mapping to Assessment Objectives

AO | What the student must do
--- | ---
AO1 | Recall the binary, octal, hexadecimal, BCD, ASCII/Unicode formats; the IEEE‑754 layout; bias, hidden bit and special values.
AO2 | Convert between bases, normalise a given decimal number, analyse precision loss, identify overflow/underflow and explain hardware/communication relevance.
AO3 | Design short algorithms (pseudocode) that normalise a floating‑point result, choose a rounding mode, and handle overflow/underflow; propose a network‑topology choice for a given scenario.

16. Key Points to Remember

  • Normalisation forces the mantissa to start with a hidden 1 (except for denormals).
  • The exponent is stored with a bias (127 for single, 1023 for double) to allow both positive and negative powers.
  • Rounding after normalisation can change the exponent again – always re‑check for overflow.
  • Precision is limited by the number of fraction bits; catastrophic cancellation is a common source of error.
  • Understanding normalisation is essential for debugging arithmetic, implementing custom floating‑point routines, and for AO3‑type programming questions.
  • Basic hardware concepts (logic gates, registers) and communication fundamentals (topologies, IP addressing) are required background for the whole syllabus.

Suggested diagram: Flowchart of the normalisation process – conversion → shift → exponent bias → fraction field → rounding → special‑case handling.