Normalise floating-point numbers

Published by Patrick Mutisya · 14 days ago

Cambridge A-Level Computer Science 9618 – Floating‑Point Numbers: Normalisation

13.3 Floating‑point Numbers – Representation and Manipulation

Objective: Normalise Floating‑point Numbers

In binary floating‑point representation a number is stored as

\$\text{value}=(-1)^{\text{sign}} \times 1.\text{fraction} \times 2^{\text{exponent}-\text{bias}}\$

Normalisation ensures that the mantissa (significand) has the form 1.xxx…. Because the leading 1 is then implicit (the hidden bit), only the fraction bits after it need to be stored, so every stored bit carries useful precision.
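The formula above can be evaluated directly. The following sketch uses illustrative field values (a stored exponent of 130 and fraction bits 101101, which happen to encode 13.625) rather than any particular standard layout:

```python
# Evaluate value = (-1)^sign * 1.fraction * 2^(exponent - bias)
# for sample field values (illustrative, single-precision bias of 127).
sign = 0
stored_exponent = 130        # exponent field with the bias already applied
fraction_bits = "101101"     # bits stored after the hidden leading 1
bias = 127

fraction = sum(int(b) * 2 ** -(i + 1) for i, b in enumerate(fraction_bits))
value = (-1) ** sign * (1 + fraction) * 2 ** (stored_exponent - bias)
print(value)  # 13.625
```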

Why Normalise?

  • Provides a unique representation for each non‑zero value.
  • Allows the exponent to be used to shift the binary point, giving a wide dynamic range.
  • Facilitates comparison, addition and subtraction by aligning exponents.

Structure of a Binary Floating‑point Word

Field                   | Number of Bits             | Purpose
Sign (S)                | 1                          | 0 for positive, 1 for negative
Exponent (E)            | 8 (single) or 11 (double)  | Stores the exponent with a bias (127 for single, 1023 for double)
Fraction / Mantissa (F) | 23 (single) or 52 (double) | Stores the bits after the leading 1 (the hidden bit)
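These three fields can be pulled out of a 32-bit word with shifts and masks. A minimal sketch, using Python's standard `struct` module to obtain the single-precision bit pattern of 13.625:

```python
import struct

# Pack 13.625 as a big-endian single-precision float, then split the
# 32-bit word into its sign, exponent, and fraction fields.
word = int.from_bytes(struct.pack(">f", 13.625), "big")
sign = word >> 31                 # top bit
exponent = (word >> 23) & 0xFF    # next 8 bits
fraction = word & 0x7FFFFF        # low 23 bits
print(sign, exponent, f"{fraction:023b}")
```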

Steps to Normalise a Binary Floating‑point Number

  1. Convert the decimal number to binary.
  2. Identify the most significant ‘1’ bit (it may lie to the left or the right of the binary point).
  3. Shift the binary point so that exactly one ‘1’ appears to the left of it, producing the form 1.xxx… × 2^k.
  4. Calculate the exponent k (the number of positions the point was moved: positive for a left shift, negative for a right shift).
  5. Apply the bias: storedExponent = k + bias.
  6. Place the bits after the leading ‘1’ into the fraction field (pad with zeros if necessary).
  7. Set the sign bit according to the original number’s sign.

Example 1 – Normalising a Positive Decimal

Normalise \$13.625\$ in single‑precision (bias = 127).

  1. Binary of \$13.625\$: \$1101.101\$.
  2. Shift point three places left: \$1.101101 × 2^{3}\$.
  3. Exponent \$k = 3\$, so storedExponent = 3 + 127 = 130 → binary 10000010.
  4. Fraction bits are the bits after the leading 1: 10110100000000000000000 (padded to 23 bits).
  5. Sign bit = 0.

Resulting 32‑bit pattern:

0 10000010 10110100000000000000000

Example 2 – Normalising a Negative Decimal

Normalise \$-0.15625\$ in single‑precision.

  1. Binary of \$0.15625\$: \$0.00101\$.
  2. Shift point right three places: \$1.01 × 2^{-3}\$.
  3. Exponent \$k = -3\$, so storedExponent = -3 + 127 = 124 → binary 01111100.
  4. Fraction bits: 01000000000000000000000.
  5. Sign bit = 1.

Resulting 32‑bit pattern:

1 01111100 01000000000000000000000
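Both worked examples can be checked by reassembling the bit patterns and letting Python's `struct` module decode them as IEEE 754 single-precision values:

```python
import struct

def decode(bits):
    """Interpret a 32-character bit string as a single-precision float."""
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

# Example 1: sign 0, exponent 10000010, fraction 10110100...
print(decode("0" "10000010" "10110100000000000000000"))  # 13.625
# Example 2: sign 1, exponent 01111100, fraction 01000000...
print(decode("1" "01111100" "01000000000000000000000"))  # -0.15625
```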

Special Cases

  • Zero: All exponent and fraction bits are 0; sign bit distinguishes +0 and –0.
  • Denormalised numbers: Exponent bits are 0, leading hidden bit is 0, allowing representation of values smaller than the smallest normalised number.
  • Infinity: Exponent bits all 1, fraction bits all 0.
  • NaN (Not a Number): Exponent bits all 1, fraction non‑zero.
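Each special case corresponds to a recognisable bit pattern, which can be confirmed by decoding the patterns directly (the helper below mirrors the single-precision layout described above):

```python
import math
import struct

def decode(bits):
    """Interpret a 32-character bit string as a single-precision float."""
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

print(decode("0" + "00000000" + "0" * 23))              # 0.0
print(decode("1" + "00000000" + "0" * 23))              # -0.0
print(decode("0" + "00000000" + "0" * 22 + "1"))        # smallest denormal
print(decode("0" + "11111111" + "0" * 23))              # inf
print(math.isnan(decode("0" + "11111111" + "1" * 23)))  # True (a NaN)
```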

Normalisation in Arithmetic Operations

When adding or subtracting floating‑point numbers, the following steps are performed:

  1. Align exponents by shifting the mantissa of the smaller‑exponent operand.
  2. Add or subtract the mantissas.
  3. Normalise the result (may require left or right shift).
  4. Round to fit the mantissa field.
  5. Adjust exponent and handle overflow/underflow.
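The align/add/renormalise sequence can be sketched with integer mantissas. This toy model handles only positive operands and truncates shifted-out bits instead of rounding to nearest, so it illustrates the steps rather than real IEEE 754 behaviour:

```python
P = 23  # fraction width, as in single precision

def fp_add(m1, e1, m2, e2):
    """Add two positive values m * 2**(e - P), where the integer
    mantissa m holds 1.fraction scaled by 2**P (toy model: shifted-out
    bits are dropped, i.e. the result is truncated, not rounded)."""
    if e1 < e2:                        # ensure operand 1 has the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 >>= e1 - e2                     # step 1: align exponents
    m, e = m1 + m2, e1                 # step 2: add mantissas
    while m >= 2 ** (P + 1):           # step 3: renormalise the result
        m >>= 1
        e += 1
    return m, e

# 13.625 + 0.15625: mantissas 1.101101 and 1.01, exponents 3 and -3
m, e = fp_add(int(1.703125 * 2 ** P), 3, int(1.25 * 2 ** P), -3)
print(m * 2.0 ** (e - P))  # 13.78125
```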

Suggested diagram: Flowchart of the normalisation process for a binary floating‑point number, showing conversion, shifting, exponent biasing, and field placement.

Key Points to Remember

  • Normalisation forces the mantissa to start with a hidden 1 (except for denormals).
  • The exponent is stored with a bias to allow both positive and negative powers.
  • Precision is limited by the number of fraction bits; rounding errors can accumulate.
  • Understanding normalisation is essential for debugging floating‑point arithmetic errors.