Precision loss in integer to float conversion

Least significant bits of integer lost during conversion to floating-point type

Description

This defect occurs when you cast an integer value to a floating-point type that cannot represent the original integer value.

For instance, the long int value 1234567890L is too large for a variable of type float.

Risk

If the floating-point type cannot represent the integer value, the behavior is undefined (see C11 standard, 6.3.1.4, paragraph 2). For instance, least significant bits of the variable value can be dropped leading to unexpected results.

Fix

Convert to a floating-point type that can represent the integer value.

For instance, if the float data type cannot represent the integer value, use the double data type instead.

When writing a function that converts an integer to floating point type, before the conversion, check if the integer value can be represented in the floating-point type. For instance, DBL_MANT_DIG * log2(FLT_RADIX) represents the number of base-2 digits in the type double. Before conversion to the type double, check if this number is greater than or equal to the precision of the integer that you are converting. To determine the precision of an integer num, use this code:

 size_t precision = 0;
 while (num != 0) {
    if (num % 2 == 1) {
      precision++;
    }
    num >>= 1;
 }

Some implementations provide a builtin function to determine the precision of an integer. For instance, GCC provides the function __builtin_popcount.

Examples

expand all

Conversion of Large Integer to Floating-Point Type

#include <stdio.h>

int main(void) {
  long int big = 1234567890L;
  float approx = big;
  printf("%ld\n", (big - (long int)approx));
  return 0;
}

In this C code, the long int variable big is converted to float.

Correction — Use a Wider Floating-Point Type

One possible correction is to convert to the double data type instead of float.

#include <stdio.h>

int main(void) {
  long int big = 1234567890L;
  double approx = big;
  printf("%ld\n", (big - (long int)approx));
  return 0;
}

Result Information

Group: Numerical

Language: C | C++

Default: Off

Command-Line Syntax: INT_TO_FLOAT_PRECISION_LOSS

Impact: Low

Version History

Introduced in R2018b