Convert binary floating-point real numbers into denary and vice versa

Cambridge International AS & A Level Computer Science (9618) – Floating‑Point Numbers

AS Level (syllabus sections 1‑12) – key topics:
  • Information representation (binary, hexadecimal, character codes, two’s‑/one’s‑complement, overflow)
  • Communication, hardware fundamentals, processor architecture
  • System software, security, ethics, databases
  • Algorithm design, data types & structures, programming concepts
  • Software development life‑cycle
A Level (syllabus sections 13‑20) – key topics:
  • Floating‑point representation, file organisation, protocols & switching
  • Advanced hardware, virtual machines, operating‑system concepts
  • Artificial intelligence, advanced algorithms, recursion, sorting & searching
  • Further programming paradigms, exception handling, optimisation

Each unit supports the three Assessment Objectives (AO):

  • AO1 – Knowledge & Understanding: recall terminology, describe principles.
  • AO2 – Analysis: evaluate alternative solutions, justify design choices.
  • AO3 – Design & Development: produce algorithms, write code or pseudocode, test and debug.

13 Floating‑Point Numbers – Representation and Manipulation

Learning Objectives (AO1‑AO3)

  • Explain why integer representation is insufficient for many real‑world values.
  • Convert between binary floating‑point (IEEE 754) and decimal (denary) forms.
  • Analyse the impact of rounding, precision loss and special patterns on program correctness (AO2).
  • Design and implement a routine (pseudocode or a chosen language) that performs the conversion in both directions, handling normalised, denormalised, zero, infinity and NaN cases (AO3).

Why Use Floating‑Point?

Integer formats store whole numbers only. Real‑world applications—scientific calculations, graphics, finance—require:

  • Fractional values (e.g. 0.125, –3.14).
  • A very large dynamic range (≈10⁻³⁸ … 10³⁸ for single precision).
  • Predictable storage size (fixed number of bits).

Floating‑point notation satisfies these needs by separating a significand (the “mantissa”) from an exponent, analogous to scientific notation in decimal.

IEEE 754 Single‑Precision (32 bits) – Quick Reference

Field | Bits | Bias | Purpose
Sign (s) | 1 | – | 0 = positive, 1 = negative
Exponent (e) | 8 | 127 | Stores exponent + bias; all‑zero and all‑one patterns are reserved.
Fraction / Mantissa (f) | 23 | – | Fractional part of the significand; an implicit leading 1 is assumed for normalised numbers.

The numeric value of a normalised pattern is

\[ V = (-1)^{s}\times (1.f)_{2}\times 2^{\,e-127} \]

Special Patterns

  • Zero: e = 0 and f = 0 → +0 (s = 0) or –0 (s = 1).
  • Denormalised numbers: e = 0 and f ≠ 0 → exponent = –126, no implicit leading 1 (value = (–1)ˢ × 0.f × 2⁻¹²⁶).
  • Infinity: e = 255 and f = 0 → +∞ (s = 0) or –∞ (s = 1).
  • NaN (Not a Number): e = 255 and f ≠ 0 → signalling or quiet NaN.
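
The four special cases can be told apart from the exponent and fraction fields alone. The following is a minimal sketch of that decision, assuming the pattern is supplied as a 32-bit unsigned integer; the function name `classify` is a hypothetical helper for illustration, not part of any library.

```python
def classify(bits: int) -> str:
    """Classify a 32-bit IEEE 754 single-precision pattern given as an integer."""
    e = (bits >> 23) & 0xFF   # 8-bit biased exponent field
    f = bits & 0x7FFFFF       # 23-bit fraction field
    if e == 0:
        return "zero" if f == 0 else "denormalised"
    if e == 255:
        return "infinity" if f == 0 else "NaN"
    return "normalised"


print(classify(0x00000000))  # zero
print(classify(0x80000000))  # zero (the sign bit is set, so this is -0)
print(classify(0x00000001))  # denormalised
print(classify(0x7F800000))  # infinity
print(classify(0x7FC00000))  # NaN
print(classify(0x40D00000))  # normalised
```

The sign bit plays no part in the classification; it only selects between the positive and negative form of each category.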

Conversion – Binary → Decimal (Normalised Numbers)

  1. Extract fields – split the 32‑bit pattern into s | e | f.
  2. Exponent – convert e from binary to decimal, then subtract the bias (127) to obtain the true exponent E.
  3. Significand – prepend an implicit 1 to the fraction bits: 1.f.
  4. Binary‑to‑decimal fraction – evaluate 1.f as a sum of powers of 2: \[ 1.f = 1 + \sum_{i=1}^{23} \frac{f_i}{2^{i}} \]
  5. Apply the formula: \[ V = (-1)^{s}\times (1.f) \times 2^{E} \]
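
The five steps above can be sketched in Python as follows. This is an illustrative sketch rather than a definitive implementation: the function name `decode_normalised` and the choice of a spaced bit string as input are assumptions made for this example, and reserved exponents (0 and 255) are rejected instead of decoded.

```python
def decode_normalised(pattern: str) -> float:
    """Decode a normalised single-precision pattern such as '0 10000001 1010...'."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected 32 binary digits")

    # Step 1: extract the fields s | e | f
    s = int(bits[0])
    e = int(bits[1:9], 2)
    f = bits[9:]
    if e in (0, 255):
        raise ValueError("reserved exponent: not a normalised number")

    # Step 2: subtract the bias to obtain the true exponent E
    E = e - 127

    # Steps 3 and 4: 1.f = 1 + sum of f_i / 2^i for i = 1..23
    significand = 1 + sum(int(bit) / 2**i for i, bit in enumerate(f, start=1))

    # Step 5: apply the formula V = (-1)^s * 1.f * 2^E
    return (-1) ** s * significand * 2.0**E


print(decode_normalised("0 10000001 10100000000000000000000"))  # 6.5
```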

Example 1 – Binary to Decimal

0 10000001 10100000000000000000000
  • Sign = 0 → positive.
  • Exponent = 10000001₂ = 129₁₀ → E = 129 − 127 = 2.
  • Fraction = 101000… → 1.f = 1.101₂.
  • 1.101₂ = 1 + 1/2 + 0/4 + 1/8 = 1.625.
  • Value = (+1) × 1.625 × 2² = 6.5.
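
As a cross-check on the hand calculation, the same pattern can be reinterpreted as a float by Python's standard `struct` module. This is a sketch that assumes big-endian byte order (`">f"`) so that the bytes follow the left-to-right bit order written above.

```python
import struct

pattern = "0 10000001 10100000000000000000000"

# Pack the 32 bits into 4 bytes, most significant byte first
raw = int(pattern.replace(" ", ""), 2).to_bytes(4, "big")

# Reinterpret those bytes as a big-endian single-precision float
(value,) = struct.unpack(">f", raw)
print(value)  # 6.5
```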

Conversion – Decimal → Binary (IEEE 754 Single‑Precision)

  1. Sign bit – 0 for non‑negative, 1 for negative.
  2. Convert the absolute value to binary scientific notation:
    Find 1.f × 2ᴱ where 1 ≤ 1.f < 2.
  3. Biased exponent – e = E + 127; write e as an 8‑bit binary number.
  4. Fraction field – take the bits after the binary point of 1.f and fill the 23‑bit mantissa (pad with zeros if fewer bits).
  5. Combine – s | e | f gives the 32‑bit pattern.

Rounding

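If more than 23 fraction bits are required, round to the nearest representable value; when the discarded bits lie exactly halfway between two candidates, choose the one whose last fraction bit is 0 (round to nearest, ties to even, the default IEEE 754 rounding mode). Other modes (toward 0, toward +∞, toward –∞) may be required for specialised applications.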
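
The effect of the default rounding mode can be observed with a short experiment. The sketch below assumes that `struct.pack(">f", x)` rounds a Python float (a double) to single precision using the platform's default round-to-nearest, ties-to-even mode; `to_single` is a hypothetical helper name. Note that the literal `0.1` is already a double-precision approximation before it is rounded again.

```python
import struct
from fractions import Fraction


def to_single(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


# 0.1 needs infinitely many binary digits, so the stored value is rounded
bits = struct.unpack(">I", struct.pack(">f", 0.1))[0]
s = f"{bits:032b}"
print(s[0], s[1:9], s[9:])  # 0 01111011 10011001100110011001101

# Exact rounding error of the stored value relative to one tenth
print(Fraction(to_single(0.1)) - Fraction(1, 10))  # 1/671088640

# Exact ties: the result is the neighbour whose last fraction bit is 0
print(to_single(1 + 2**-24) == 1.0)              # True (rounds down)
print(to_single(1 + 3 * 2**-24) == 1 + 2**-22)   # True (rounds up)
```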

Example 2 – Decimal to Binary

Convert -0.15625 to IEEE 754 single‑precision.

  1. Sign = 1.
  2. Binary fraction of 0.15625:
    • 0.15625 × 2 = 0.3125 → 0
    • 0.3125 × 2 = 0.625 → 0
    • 0.625 × 2 = 1.25 → 1 (subtract 1 → 0.25)
    • 0.25 × 2 = 0.5 → 0
    • 0.5 × 2 = 1.0 → 1 (terminates)
    Result: 0.00101₂.
  3. Normalise: 0.00101₂ = 1.01₂ × 2⁻³ → E = –3.
  4. Biased exponent: e = –3 + 127 = 124 = 01111100₂.
  5. Fraction bits after the leading 1: 01 → pad to 23 bits: 01000000000000000000000.
  6. Combined pattern:
    1 01111100 01000000000000000000000
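
The repeated multiply-by-2 method used in step 2 can be sketched as a small routine. The name `fraction_to_binary` and the `max_bits` limit are assumptions for this example; the limit only matters for fractions that do not terminate in binary.

```python
def fraction_to_binary(frac: float, max_bits: int = 32) -> str:
    """Return the binary digits after the point for 0 <= frac < 1."""
    if not 0 <= frac < 1:
        raise ValueError("frac must satisfy 0 <= frac < 1")
    digits = []
    while frac != 0 and len(digits) < max_bits:
        frac *= 2
        if frac >= 1:
            digits.append("1")   # the integer part is the next binary digit
            frac -= 1
        else:
            digits.append("0")
    return "".join(digits)


print("0." + fraction_to_binary(0.15625))  # 0.00101
```

Doubling a float is exact, so for a terminating fraction such as 0.15625 the digits match the hand calculation.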

Pseudocode for Decimal → IEEE 754 (Single‑Precision)

FUNCTION floatToIEEE754(x):
    IF x = 0:
        RETURN "0 00000000 00000000000000000000000"
    ENDIF

    sign ← 0 IF x >= 0 ELSE 1
    a ← ABS(x)

    // 1. Convert integer part to binary
    intPart ← FLOOR(a)
    intBin  ← binaryString(intPart)          // e.g. 13 → "1101"

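    // 2. Convert fractional part to binary.
    //    150 bits reach the leading 1 of the smallest normalised value (2⁻¹²⁶)
    //    and still leave 23 fraction bits plus a guard bit for rounding.
    frac ← a - intPart
    fracBin ← ""
    REPEAT 150 TIMES: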
        frac ← frac * 2
        IF frac >= 1:
            fracBin ← fracBin + "1"
            frac ← frac - 1
        ELSE
            fracBin ← fracBin + "0"
        ENDIF
    END REPEAT

    // 3. Normalise (string positions are 1‑based)
    IF intPart ≥ 1          // number ≥ 1
        E ← LENGTH(intBin) - 1
        mantissaBits ← SUBSTRING(intBin, 2) + fracBin   // drop leading 1
    ELSE                     // 0 < a < 1
        firstOne ← POSITION("1", fracBin)
        E ← -firstOne
        mantissaBits ← SUBSTRING(fracBin, firstOne + 1)   // bits after first 1
    ENDIF

    // 4. Biased exponent
    e ← E + 127
    expBits ← toBinary(e, 8)          // pad to 8 bits

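    // 5. Fraction field (23 bits, round to nearest, ties to even).
    //    If rounding up carries out of an all‑ones fraction, the fraction
    //    becomes all zeros and e must be increased by 1 (recompute expBits).
    fraction ← ROUND_TO_EVEN(mantissaBits, 23)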

    RETURN sign + " " + expBits + " " + fraction
END FUNCTION
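
For comparison, here is one possible Python rendering of the same conversion. It is a sketch under stated assumptions, not a transcription of the pseudocode: it swaps the string-based normalisation for exact arithmetic with `fractions.Fraction` and `math.frexp`, it also covers denormalised values, infinity and NaN, and the name `float_to_ieee754` and the particular quiet-NaN pattern are choices made for this example.

```python
import math
from fractions import Fraction


def float_to_ieee754(x: float) -> str:
    """Return 's eeeeeeee fff...f' for x rounded to single precision."""
    sign = "1" if math.copysign(1.0, x) < 0 else "0"
    if math.isnan(x):
        return f"{sign} 11111111 1{'0' * 22}"   # one possible quiet NaN
    if math.isinf(x):
        return f"{sign} 11111111 {'0' * 23}"
    if x == 0:
        return f"{sign} 00000000 {'0' * 23}"

    a = Fraction(abs(x))             # exact value of the input
    E = math.frexp(abs(x))[1] - 1    # 2**E <= a < 2**(E + 1)
    E = max(E, -126)                 # denormalised values all use E = -126

    # Significand in units of 2**-23; round() on a Fraction rounds half to even
    n = round(a / Fraction(2) ** E * 2**23)
    if n == 2**24:                   # rounding carried into the next power of two
        n, E = 2**23, E + 1

    if n >= 2**23:                   # normalised: the leading 1 is implicit
        e, fraction = E + 127, n - 2**23
    else:                            # denormalised: no implicit leading 1
        e, fraction = 0, n
    if e >= 255:                     # too large for single precision: infinity
        return f"{sign} 11111111 {'0' * 23}"
    return f"{sign} {e:08b} {fraction:023b}"


print(float_to_ieee754(-0.15625))  # 1 01111100 01000000000000000000000
print(float_to_ieee754(6.5))       # 0 10000001 10100000000000000000000
```

Working with exact fractions avoids choosing a fixed number of fraction bits in advance, at the cost of being less close to the hand method.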

Precision, Rounding Errors and Algorithmic Implications (AO2)

  • Relative error – \(\displaystyle \frac{|V_{\text{exact}}-V_{\text{float}}|}{|V_{\text{exact}}|}\). Errors accumulate when many operations are chained.
  • Catastrophic cancellation – subtracting two nearly equal numbers can erase significant digits, producing a large relative error.
  • Best practice – avoid unnecessary subtraction of close values; use scaling, higher precision (double‑precision) or arbitrary‑precision libraries when required.
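
Catastrophic cancellation can be demonstrated in a few lines. The sketch below simulates single precision by rounding through `struct` (the helper `to_single` is hypothetical, and the values 1.0000001 and 1.0 are chosen only for illustration). Each input is accurate to roughly 1 part in 10⁷, yet the difference is wrong by about 19%.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


a = to_single(1.0000001)   # stored as 1 + 2**-23, the nearest single-precision value
b = to_single(1.0)         # stored exactly
diff = to_single(a - b)    # the subtraction itself is exact here

exact = 1e-7               # the mathematically intended difference
rel_err = abs(diff - exact) / exact

print(diff)     # about 1.19e-07
print(rel_err)  # about 0.19
```

The subtraction introduces no new error; it exposes the rounding error already present in `a`, which is small relative to `a` but large relative to the result.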

Extension: IEEE 754 Double‑Precision (64 bits)

For A‑Level work that demands greater accuracy, the double‑precision format uses:

Field | Bits | Bias
Sign | 1 | –
Exponent | 11 | 1023
Fraction | 52 | –

The conversion steps are identical; only the exponent width, bias and mantissa length change.
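
To illustrate that only the field widths and bias change, the sketch below splits a double into its three fields using the standard `struct` module. The function name `double_fields` is an assumption for this example; 6.5 is reused so the result can be compared with Example 1.

```python
import struct


def double_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x as a 64-bit double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> 63                  # 1 sign bit
    e = (bits >> 52) & 0x7FF        # 11 exponent bits
    f = bits & ((1 << 52) - 1)      # 52 fraction bits
    return s, e, f


s, e, f = double_fields(6.5)
print(s, e - 1023, f"{f:052b}"[:8])  # 0 2 10100000
```

The true exponent (2) and the leading fraction bits (101) match the single-precision pattern for 6.5; only the biased exponent (1025 instead of 129) and the amount of zero padding differ.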

Practice Questions (AO1‑AO3)

  1. Convert the binary pattern 0 10000010 01000000000000000000000 to decimal. Show each step and comment on any rounding.
  2. Express the decimal number 12.75 as a 32‑bit IEEE 754 binary pattern. Write pseudocode that performs the conversion and test it with this value.
  3. What decimal value does the pattern 1 11111111 00000000000000000000000 represent? Explain why this pattern is special.
  4. Design a simple calculator routine that adds two single‑precision numbers entered by a user. Discuss how rounding might affect the result and propose a method to detect overflow.

Suggested Diagram

Figure: layout of a 32‑bit IEEE 754 single‑precision word showing the sign, exponent and mantissa fields.

Further Reading & Resources

  • Cambridge International AS & A Level Computer Science (9618) – Specification, Sections 13.1‑13.4.
  • IEEE 754‑2008 Standard – summary of rounding modes and special values.
  • “Numerical Analysis” (chapter on floating‑point arithmetic) – for deeper insight into error propagation.
  • Online visualiser: IEEE‑754 Converter.

Quick‑Look Checklist (use while comparing lecture notes to the syllabus)

1 Information representation (binary, BCD, hexadecimal, ASCII/Unicode, two’s‑/one’s‑complement, overflow)
  What to verify in the notes:
    • Binary‑base conversions shown step‑by‑step.
    • Clear distinction between kibi/kilo, mebi/mega.
    • Example of BCD ↔ decimal (e.g. digital clock).
    • Unicode code‑point illustration.
    • Overflow example (e.g. 8‑bit addition 200 + 100).
  Typical gaps & how to fix them:
    • Students often see only decimal↔binary – add a short “BCD ↔ decimal” table.
    • Include a diagram of overflow with carry‑out discarded.
1.2 Multimedia graphics (bitmap vs. vector, colour depth, resolution)
  What to verify in the notes:
    • Bitmap‑size formula with a worked example (e.g. 1920 × 1080 × 24‑bit).
    • Vector‑vs‑bitmap decision matrix.
    • Simple SVG‑like example to show scaling without quality loss.
  Typical gaps & how to fix them:
    • Many notes omit vector graphics – insert a tiny SVG illustration and explain why it scales.
1.3 Compression (lossy vs. loss‑less, RLE, JPEG, MP3)
  What to verify in the notes:
    • One loss‑less (RLE) and one lossy (JPEG) case study.
    • Table of typical compression ratios.
  Typical gaps & how to fix them:
    • Add a short activity: compress a 10 KB text file with RLE and compare sizes.
2 Communication (LAN/WAN, topologies, client‑server vs. peer‑to‑peer, thin/thick client, Ethernet/CSMA‑CD, IP addressing, DNS, IPv4/IPv6, subnetting, wireless vs. wired, cloud basics)
  What to verify in the notes:
    • Diagram of each topology with pros/cons.
    • Example IP address breakdown (e.g. 192.168.1.10/24) and a subnet‑mask exercise.
    • Definition of cloud computing with a real‑world example (e.g. Google Drive).
  Typical gaps & how to fix them:
    • Subnetting often omitted – add a “calculate the number of hosts” worksheet.
    • Clarify difference between IP address and URL with a simple DNS lookup illustration.
3 Hardware (components, RAM/ROM types, SRAM/DRAM, PROM/EPROM/EEPROM, buffers, embedded systems, sensors/actuators)
  What to verify in the notes:
    • Table contrasting SRAM/DRAM (speed, volatility, cost).
    • Block diagram of a simple embedded system (e.g. Arduino + temperature sensor).
  Typical gaps & how to fix them:
    • Include a short “what is a buffer?” graphic showing producer‑consumer flow.
3.2 Logic gates & circuits (NOT, AND, OR, NAND, NOR, XOR, truth tables, Boolean expressions, circuit construction)
  What to verify in the notes:
    • Truth tables for each basic gate.
    • Examples of converting a Boolean expression to a circuit and vice‑versa.
  Typical gaps & how to fix them:
    • Provide a step‑by‑step simplification example using De Morgan’s laws.
