Floating Point Representation
This section explains the 32-bit IEEE-754 single-precision floating point format used by most computers. The format encodes a wide range of real numbers in a fixed 32-bit pattern using three fields: sign, exponent and fraction (the stored part of the mantissa). The stored fraction bits and an implicit leading bit together form the significand, which is the standard's term for what is traditionally called the mantissa.
Structure of a 32-bit (single-precision) floating point word
- Sign - 1 bit (most significant bit). Value 0 means positive, value 1 means negative.
- Exponent - 8 bits. Encoded with a bias. The bias for an exponent field of k bits is 2^(k-1) - 1. For k = 8, bias = 2^7 - 1 = 127.
- Fraction (stored mantissa) - 23 bits. These bits store the fractional part of the significand. For normalized numbers an implicit leading 1 is assumed, so the effective significand has 24 bits (1.fraction). For subnormal numbers the implicit leading bit is 0 and the significand is 0.fraction.
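The three fields can be pulled out of a real value with a few shifts and masks. The following is a minimal sketch, assuming Python's standard struct module is used to obtain the 32-bit pattern; the helper name float_to_fields is hypothetical and chosen only for this illustration.

```python
import struct

def float_to_fields(x):
    """Split x, stored as IEEE-754 single precision, into (sign, exponent, fraction)."""
    # Pack as a big-endian float32, then reinterpret the same 4 bytes
    # as an unsigned 32-bit integer so the bits can be masked.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit (most significant)
    exponent = (bits >> 23) & 0xFF  # next 8 bits, still biased
    fraction = bits & 0x7FFFFF      # low 23 bits
    return sign, exponent, fraction

sign, exponent, fraction = float_to_fields(1.0)
print(sign, exponent, format(fraction, "023b"))
# 0 127 00000000000000000000000
```

The value 1.0 has true exponent 0, so its exponent field holds exactly the bias, 127.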
Reconstructing the numeric value from the bit fields
Let S be the sign bit, e the unsigned integer value of the exponent field, and f the fractional value represented by the 23 fraction bits (f = b1/2 + b2/4 + b3/8 + ... where b1,b2,... are fraction bits). The value represented is:
For normalized numbers (1 ≤ e ≤ 254):
(-1)^S × (1 + f) × 2^(e - bias)
For subnormal numbers (e = 0 and f ≠ 0):
(-1)^S × (0 + f) × 2^(1 - bias)
Special encodings (e = 255, or e = 0 with fraction = 0):
- If e = 255 and fraction = 0: value is signed infinity (±∞ depending on S).
- If e = 255 and fraction ≠ 0: value is NaN (Not a Number).
- If e = 0 and fraction = 0: value is signed zero (+0 or -0).
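The reconstruction rules above can be sketched as a small decoder. This is an illustrative sketch, not a reference implementation: the name decode_single is hypothetical, the input is assumed to be a 32-bit pattern given as a non-negative integer, and the result is returned as an ordinary Python float (a double, which represents every single-precision value exactly).

```python
import math

BIAS = 127  # 2^(8-1) - 1 for the 8-bit exponent field

def decode_single(bits):
    """Return the value encoded by a 32-bit single-precision pattern."""
    sign = bits >> 31
    e = (bits >> 23) & 0xFF
    f = (bits & 0x7FFFFF) / 2**23        # fraction bits as a value in [0, 1)

    if e == 255:                          # special encodings
        if f == 0:
            return -math.inf if sign else math.inf
        return math.nan
    if e == 0:                            # zero or subnormal: no implicit 1
        value = f * 2.0 ** (1 - BIAS)
    else:                                 # normalized: implicit leading 1
        value = (1 + f) * 2.0 ** (e - BIAS)
    return -value if sign else value

print(decode_single(0x3F800000))  # 1.0
print(decode_single(0x00000001))  # smallest positive subnormal, 2^-149
```

Signed zero falls out of the e = 0 branch naturally: f is 0, so the magnitude is 0.0 and the sign bit decides between +0.0 and -0.0.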
Example 1: Binary (32 bits) → Decimal
Given IEEE-754 word: 11000001110100000000000000000000
Stepwise reconstruction (each line is one step of reasoning):
Sign bit S = 1, so the number is negative.
Exponent bits = 10000011 (the 8 bits after the sign).
Exponent field value e = 10000011₂ = 131₁₀.
Bias = 127, so exponent E = e - bias = 131 - 127 = 4.
Fraction bits (23 bits) = 10100000000000000000000₂.
Fractional value f = 1×(1/2) + 0×(1/4) + 1×(1/8) + 0×(1/16) + ... = 0.5 + 0 + 0.125 = 0.625.
Significand = 1 + f = 1.625 (implicit leading 1 for normalized numbers).
Numeric value = (-1)^1 × 1.625 × 2^4.
Evaluate 2^4 = 16 and 1.625 × 16 = 26.
Final value = -26.
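The hand calculation can be cross-checked by letting the machine reinterpret the same 32 bits. This is a short sketch assuming Python's standard struct module; the variable names are chosen for illustration.

```python
import struct

word = "11000001110100000000000000000000"

# Turn the bit string into 4 big-endian bytes, then read them as a float32.
raw = int(word, 2).to_bytes(4, "big")
(value,) = struct.unpack(">f", raw)

print(value)  # -26.0
```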
Conversion procedure: Decimal → IEEE-754 (single precision)
General steps to convert a real decimal number to 32-bit IEEE-754 single precision:
- Determine the sign bit: 0 if the number is positive (or +0), 1 if it is negative (or -0).
- Work with the absolute value of the number and convert it to binary (integer and fractional parts separately).
- Normalize the binary number so it is in the form 1.xxxxx × 2^E for a non-zero normalized value. Count E (the exponent) accordingly.
- Compute the biased exponent e = E + bias (bias = 127 for single precision). If the biased exponent fits in 1..254, write it as the 8-bit exponent field.
- Take the fractional part after the leading 1 (the bits after the binary point in 1.xxxxx) and either pad it with trailing zeros or round it to 23 bits to form the fraction field. When bits must be discarded, apply the chosen rounding mode (the IEEE default is round to nearest, ties to even). Rounding up can carry all the way into the leading bit, in which case the exponent increases by 1.
- Handle special cases: if the true exponent E is too small to be represented as a normal number, produce a subnormal encoding if possible; if E is too large, produce ±∞ or raise overflow according to implementation.
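The steps above can be sketched as a small encoder. This is a hedged illustration rather than a production routine: the name encode_single is hypothetical, the input is assumed to be a finite Python float, exact rational arithmetic from the standard fractions module is used so that rounding is round to nearest with ties to even, and overflow is mapped to ±∞ without raising any exception flag.

```python
import math
from fractions import Fraction

BIAS = 127

def encode_single(x):
    """Encode a finite Python float as a 32-bit single-precision pattern (an int)."""
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    if x == 0:
        return sign << 31                # signed zero
    mag = Fraction(abs(x))               # exact rational value of |x|
    E = math.frexp(abs(x))[1] - 1        # 2^E <= |x| < 2^(E+1)
    E = max(E, 1 - BIAS)                 # below -126 the value is subnormal
    # Significand in units of 2^-23, rounded to nearest, ties to even.
    sig = round(mag * Fraction(2) ** (23 - E))
    if sig == 1 << 24:                   # rounding carried into the leading bit
        sig >>= 1
        E += 1
    if sig < 1 << 23:                    # no leading 1: subnormal (or rounds to zero)
        e, fraction = 0, sig
    else:                                # normalized: drop the implicit leading 1
        e, fraction = E + BIAS, sig - (1 << 23)
    if e >= 255:                         # exponent too large: overflow to infinity
        e, fraction = 255, 0
    return (sign << 31) | (e << 23) | fraction

print(f"{encode_single(-17.0):032b}")
# 11000001100010000000000000000000
```

Working in units of 2^-23 keeps the normalized and subnormal cases on one code path: the only difference is whether the rounded significand still has its leading 1.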
Example 2: Convert -17 to 32-bit IEEE-754
Target value: -17
Stepwise conversion (each line is one step of reasoning):
Sign bit S = 1 because the number is negative.
Absolute value 17 in binary is 10001₂.
Normalize: 10001₂ = 1.0001₂ × 2^4, so E = 4.
Bias = 127, so biased exponent e = E + bias = 4 + 127 = 131.
Exponent field (8 bits) = 131₁₀ = 10000011₂.
The fractional part after the leading 1 is 0001; append 19 zeros to fill the 23-bit fraction field: 00010000000000000000000.
Putting fields together: sign = 1, exponent = 10000011, fraction = 00010000000000000000000.
Final 32-bit IEEE-754 representation: 1 10000011 00010000000000000000000.
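The result can be confirmed by asking the machine for the single-precision bit pattern of -17 and printing it with the same field grouping. This is a minimal sketch assuming Python's standard struct module; the variable names are illustrative.

```python
import struct

# Pack -17 as a big-endian float32 and view the 4 bytes as one 32-bit integer.
(bits,) = struct.unpack(">I", struct.pack(">f", -17.0))
word = format(bits, "032b")

# Group as sign | exponent | fraction.
grouped = f"{word[0]} {word[1:9]} {word[9:]}"
print(grouped)  # 1 10000011 00010000000000000000000
```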
Important notes, limits and special values
- Hidden (implicit) bit: For normalized numbers the leading bit of the significand is implicitly 1 and is not stored; that is why the stored 23 bits plus implicit 1 give 24 significant bits of precision.
- Precision: Single precision provides 24 bits of binary precision, which corresponds to roughly 7 significant decimal digits.
- Range: Normalized exponent E ranges from -126 to +127 (that is, e = 1 to 254). The largest finite single precision value is (2 - 2^-23) × 2^127 ≈ 3.4028235 × 10^38. The smallest positive normalized value is 2^-126 ≈ 1.17549435 × 10^-38. Subnormal numbers allow smaller magnitudes down to 2^-149 ≈ 1.40129846 × 10^-45.
- Subnormal (denormal) numbers: When the exponent field e = 0 and the fraction ≠ 0, the number is subnormal and the significand does not have an implicit leading 1. Subnormals fill the gap between zero and the smallest normalized number, at reduced precision.
- Zeros, infinities and NaNs: Exponent e = 0 and fraction = 0 encodes ±0. Exponent e = 255 and fraction = 0 encodes ±∞. Exponent e = 255 and fraction ≠ 0 encodes NaN (signalling or quiet NaN depending on fraction bits).
- Rounding and exceptions: When the exact value cannot be represented in 23 fraction bits the value is rounded. The default IEEE mode is round to nearest, ties to even. Overflow, underflow, inexact and invalid operations are handled according to IEEE-754 rules and may set floating-point status flags in hardware or software.
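The limits and special values listed above can be checked directly from their bit patterns. The sketch below assumes Python's standard struct module; from_bits is a hypothetical helper name, and the hexadecimal constants are simply the field layouts described in this section written out as 32-bit integers.

```python
import math
import struct

def from_bits(bits):
    """Reinterpret a 32-bit integer pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

largest_finite = from_bits(0x7F7FFFFF)      # e = 254, fraction all ones
smallest_normal = from_bits(0x00800000)     # e = 1,   fraction = 0
smallest_subnormal = from_bits(0x00000001)  # e = 0,   fraction = 1

print(largest_finite == (2 - 2**-23) * 2.0**127)     # True
print(smallest_normal == 2.0**-126)                  # True
print(smallest_subnormal == 2.0**-149)               # True
print(from_bits(0x7F800000), from_bits(0xFF800000))  # inf -inf
print(math.isnan(from_bits(0x7FC00000)))             # True
print(from_bits(0x80000000))                         # -0.0
```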
Common pitfalls for students
- Confusing the number of stored fraction bits (23) with the effective significant bits (24 including the implicit leading 1 for normalized numbers).
- Forgetting the special exponent patterns: e = 0 means zero or a subnormal number (no implicit 1), and e = 255 means ∞ or NaN. Neither follows the normalized formula.
- When converting decimal fractions to binary, repeating binary fractions occur frequently; determine enough bits and then round according to IEEE rules rather than truncating without care.
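The repeating-fraction pitfall is easy to see with 0.1, whose binary expansion 0.000110011001100... never terminates. The sketch below assumes Python's standard struct module and a hypothetical helper to_single that rounds a Python float (a double) to the nearest single-precision value and converts it back.

```python
import struct

def to_single(x):
    """Round x to the nearest single-precision value, returned as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

stored = to_single(0.1)
print(stored)              # 0.10000000149011612
print(stored == 0.1)       # False: 0.1 is not exactly representable
print(to_single(0.5))      # 0.5 (a power of two is stored exactly)
```

The stored value is 13421773 × 2^-27: the 24-bit significand was rounded up in its last place, which is why the result is slightly larger than 0.1.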
Summary: A 32-bit IEEE-754 floating point number encodes sign, biased exponent and a stored fraction. The implicit leading 1 for normalized values gives 24 bits of significand precision. Always handle the special exponent patterns e = 0 and e = 255 separately. Follow the normalization, biasing and rounding rules when converting between decimal and binary floating point representations.