How to convert an unsigned int to a float?

Question

I need to build a function that returns the bit-level equivalent of (float)x without using any floating-point data types, operations or constants. I think I have it, but when I run the test file, it reports an infinite loop. Any debugging help would be appreciated.

I'm allowed to use any integer/unsigned operations including ||, &&, if, while. Also, I can only use 30 operations.

unsigned float_i2f(int x) {
    printf("\n%i", x);
    if (!x) {return x;}
    int mask1 = (x >> 31);
    int mask2 = (1 << 31);
    int sign = x & mask2;
    int complement = ~x + 1;
    //int abs = (~mask1 & x) + (mask1 & complement);
    int abs = x;
    int i = 0, temp = 0;
    while (!(temp & mask2)){
        temp = (abs <<i);
        i = i + 1;
    }
    int E = 32 - i;
    int exp = 127 + E;
    abs = abs & (-1 ^ (1 << E));
    int frac;
    if ((23 - E)>0)
        frac = (abs << (23 - E));
    else
        frac = (abs >> (E - 23));
    int rep = sign + (exp << 23) + frac;
    return rep;
}

In response to the very helpful comments and answers, here is the updated code, now only failing for 0x80000000:

unsigned float_i2f(int x) {
    int sign;
    int absX;
    int E = -1;
    int shift;
    int exp;
    int frac;
    // zero is the same in int and float:
    if (!x) {return x;}

    // sign is bit 31: that bit should just be transferred to the float:
    sign = x & 0x80000000;

    // if number is < 0, take two's complement:
    if (sign != 0) {
        absX = ~x + 1;
    }
    else
        absX = x;

    shift = absX;
    while ((!!shift) && (shift != -1)) {
        //std::cout << std::bitset<32>(shift) << "\n";
        E++;
        shift = (shift >> 1);
    }
    if (E == 30) { E++;}
    exp = E + 127+24;
    exp = (exp << 23);
    frac = (absX << (23 - E)) & 0x007FFFFF;
    return sign + exp + frac;
}

Anyone have any idea where the bug is in the revised code? Thank you all again!

Answer 1:

There is quite a lot you can do to improve your code and clean it up. For starters, add comments! Secondly (and to reduce the number of operations), you can combine certain things. Thirdly, distinguish "integers that can be represented exactly" from "those that cannot".

Here is some sample code to put some of these things into practice; I could not actually compile and test this, so it's possible there are some bugs - I am trying to show an approach, not do your assignment for you...

#include <stdio.h>

unsigned float_i2f(int x) {
    // convert integer to its bit-equivalent floating point representation
    // but return it as an unsigned integer
    // format:
    // 1 sign bit
    // 8 exponent bits
    // 23 mantissa bits (plus the 'most significant bit', which is always 1)
    unsigned sign;
    unsigned absX;
    // Take at most 24 bits:
    unsigned bits23 = 0xFF800000;  // bit 23 and everything above it
    unsigned bits24 = 0xFF000000;  // bit 24 and everything above it
    unsigned E = 127 + 23;         // biased exponent when the top bit is bit 23

    printf("\n%i", x);

    // zero is the same in int and float:
    if (x == 0) {return x;}

    // sign is bit 31: that bit should just be transferred to the float:
    sign = x & 0x80000000;

    // if number is < 0, take two's complement (in unsigned arithmetic,
    // so that 0x80000000 does not overflow):
    if (sign != 0) {
        absX = ~(unsigned)x + 1;
    }
    else {
        absX = x;
    }

    // shift right if there are bits above bit 23:
    while (absX & bits24) {
        E++;
        absX >>= 1;   // the bits shifted out are dropped, not rounded
    }
    // shift left until bit 23 is set:
    while (!(absX & bits23)) {
        E--;
        absX <<= 1;
    }

    // now put the numbers we have together in the return value;
    // masking with ~bits23 removes the implicit leading 1:
    return sign | (E << 23) | (absX & ~bits23);
}
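
For comparison, here is a minimal sketch of the same normalize-and-pack approach with round-to-nearest-even added, which is what (float)x does under the default rounding mode. It assumes a 32-bit two's complement int and IEEE 754 single precision, and it makes no attempt to stay within the 30-operation limit from the question. The local names (mag, e, dropped) are my own, not from the original code.

```c
#include <stdint.h>

/* Sketch: bit pattern of (float)x using integer operations only.
   Assumes 32-bit int, two's complement, IEEE 754 binary32. */
unsigned float_i2f(int x) {
    unsigned sign, mag, frac, dropped;
    int e = 31;                      /* index of the highest set bit of |x| */

    if (x == 0) return 0u;           /* zero has the same bits in both formats */

    sign = (unsigned)x & 0x80000000u;
    /* Negate in unsigned arithmetic so 0x80000000 (INT_MIN) does not overflow. */
    mag = sign ? 0u - (unsigned)x : (unsigned)x;

    /* Normalize: shift left until the leading 1 sits in bit 31. */
    while (!(mag & 0x80000000u)) {
        mag <<= 1;
        e--;
    }

    frac = (mag >> 8) & 0x007FFFFFu; /* the 23 bits below the leading 1 */
    dropped = mag & 0xFFu;           /* the 8 bits that do not fit */

    /* Round to nearest, ties to even. */
    if (dropped > 0x80u || (dropped == 0x80u && (frac & 1u)))
        frac++;

    /* Using + rather than |: if frac overflowed to 0x00800000,
       the carry correctly bumps the exponent by one. */
    return sign + ((unsigned)(e + 127) << 23) + frac;
}
```

With this sketch, float_i2f(1) gives 0x3F800000, float_i2f(16777217) gives 0x4B800000 (the tie rounds down to even), and float_i2f(0x80000000) gives 0xCF000000.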

Answer 2:

Tried a solution that works for any size int.
Does not depend on 2's complement.
Works with INT_MIN.
Learned much from @Floris

[Edit] Adjusted to do rounding and other improvements

#include <stdio.h>
#include <stdint.h>  // uint32_t, int32_t, int64_t

int Round(uint32_t Odd, unsigned RoundBit, unsigned StickyBit, uint32_t Result);
int Inexact;

// Select your signed integer type: works with any one
//typedef int8_t integer;
//typedef int16_t integer;
//typedef int32_t integer;
typedef int64_t integer;
//typedef intmax_t integer;

uint32_t int_to_IEEEfloat(integer x) {
  uint32_t Result;
  if (x < 0) {  // Note 1
    Result = 0x80000000;
  } else {
    Result = 0;
    x = -x;  // Use negative absolute value. Note 2
  }
  if (x) {
    uint32_t Expo = 127 + 24 - 1;
    static const int32_t m2Power23 = -0x00800000;
    static const int32_t m2Power24 = -0x01000000;
    unsigned RoundBit = 0;
    unsigned StickyBit = 0;
    while (x <= m2Power24) {  // Note 3
      StickyBit |= RoundBit;
      RoundBit = x&1;
      x /= 2;
      Expo++;
    }
    // Round. Note 4
    if (Round(x&1, RoundBit, StickyBit, Result) && (--x <= m2Power24)) {
      x /= 2;
      Expo++;
    }
    if (RoundBit | StickyBit) {  // Note 5
      Inexact = 1; // TBD: Set FP inexact flag
    }
    int32_t i32 = x;  // Note 6
    while (i32 > m2Power23) {
      i32 *= 2;
      Expo--;
    }
    if (Expo >= 0xFF) {
      Result |= 0x7F800000; // Infinity  Note 7
    } else {
      Result |=  (Expo << 23) | ((-i32) & 0x007FFFFF);
    }
  }
  return Result;
}

/*
Note 1  If `integer` were signed-magnitude or 1s complement, then +0 and -0 would exist.
Rather than `x<0`, this should be a test of whether the sign bit is set.
The following `if (x)` will not be taken on +0 and -0.
This lets the corresponding float +0.0 and -0.0 be returned.

Note 2 Overflow will _not_ occur using 2s complement, 1s complement or sign magnitude.
We are ensuring x at this point is <= 0.

Note 3 Right shifting may shift out a 1.  Use RoundBit and StickyBit to keep
track of bits shifted out for later rounding determination.

Note 4 Round as needed here.  It is possible to need to shift once more after rounding.

Note 5 If either RoundBit or StickyBit is set, the floating point inexact flag may be set.

Note 6 Since the `integer` type may be less than 32 bits, we need to convert
to a 32 bit integer as IEEE float is 32 bits.

Note 7 Infinity is only expected if `integer` was 129 bits or larger.
*/

int Round(uint32_t Odd, unsigned RoundBit, unsigned StickyBit, uint32_t Result) {
  // Round to nearest, ties to even
  return (RoundBit) && (Odd || StickyBit);

  // Round toward 0 (truncate)
  // return 0;

  // Round away from 0
  // return RoundBit | StickyBit;

  // Round toward -Infinity (Result is non-zero only for negative values)
  // return (RoundBit | StickyBit) && Result;
}

// For testing
float int_to_IEEEfloatf(integer x) {
  union {
    float f;
    uint32_t u;
  } xx;  // Overlay a float with a 32-bit unsigned integer
  Inexact = 0;
  printf("%20lld ", (long long) x);
  xx.u = int_to_IEEEfloat(x);
  printf("%08lX ", (unsigned long) xx.u);
  printf("%d : ", Inexact);
  printf("%.8e\n", xx.f);
  return xx.f;
}

int main() {
  int_to_IEEEfloatf(0x0);
  int_to_IEEEfloatf(0x1);
  int_to_IEEEfloatf(-0x1);
  int_to_IEEEfloatf(127);
  int_to_IEEEfloatf(-128);
  int_to_IEEEfloatf(12345);
  int_to_IEEEfloatf(32767);
  int_to_IEEEfloatf(-32768);
  int_to_IEEEfloatf(16777215);
  int_to_IEEEfloatf(16777216);
  int_to_IEEEfloatf(16777217);
  int_to_IEEEfloatf(2147483647L);
  int_to_IEEEfloatf(-2147483648L);
  int_to_IEEEfloatf( 9223372036854775807LL);
  int_to_IEEEfloatf(-9223372036854775807LL - 1);  // INT64_MIN; the literal 9223372036854775808 does not fit in long long
  return 0;
}
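
To check the printed bit patterns against something, one option is to compare them with the compiler's own conversion. This is only a sketch for testing (it uses float, so it is not a solution to the original exercise), and it assumes float is IEEE 754 single precision with the default round-to-nearest mode. reference_bits is a hypothetical helper name; it uses memcpy rather than a union to read the bits.

```c
#include <stdint.h>
#include <string.h>

/* Test helper: bit pattern of the compiler's (float)x.
   Assumes float is a 32-bit IEEE 754 binary32 value. */
uint32_t reference_bits(int32_t x) {
    float f = (float)x;
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copy the bytes without reinterpreting pointers */
    return u;
}
```

Under those assumptions, reference_bits(16777216) and reference_bits(16777217) both give 0x4B800000, since 16777217 is not representable and the tie rounds to even.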

Answer 3:

When you say 30 operations, do you count the iterations of the loops?

if (!x) {return x;}

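only handles positive zero. Why not mask off the sign bit, so that it works for both zeros? (Caveat: in a two's complement int the pattern 0x80000000 is INT_MIN rather than a negative zero, so this variant only fits sign-magnitude inputs.)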

if (!(x & 0x7FFFFFFF)) {return x;}

Besides, many instructions are not needed, for example:

complement = ~x + 1;

Just x = -x is enough, because x isn't used again later; absX or complement is just redundant. And one negation instruction is faster than two operations, right?

!!shift is also slower than shift != 0. It's only useful when you need an expression that evaluates to exactly 0 or 1; otherwise it's redundant.

Another problem is that signed operations may sometimes be slower than unsigned ones, so you shouldn't declare a variable as int when it isn't necessary. For example, shift = (shift >> 1) will do an arithmetic shift (in most compiler implementations), which may cause unexpected results.
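
To illustrate the point about shifts, here is a small sketch that keeps the magnitude in an unsigned variable, so that >> is a logical shift and the loop ends at zero for every input, including 0x80000000. It assumes a 32-bit two's complement int; magnitude and top_bit_by_shifting are hypothetical names chosen for this example.

```c
/* Magnitude of x as an unsigned value. Negating in unsigned
   arithmetic is well defined even for the most negative int. */
unsigned magnitude(int x) {
    unsigned ux = (unsigned)x;
    return (x < 0) ? 0u - ux : ux;
}

/* Index of the highest set bit, found by shifting right until
   nothing is left. Returns -1 for v == 0. Because v is unsigned,
   zeros are shifted in from the left and the loop always ends. */
int top_bit_by_shifting(unsigned v) {
    int e = -1;
    while (v != 0) {
        e++;
        v >>= 1;
    }
    return e;
}
```

For the most negative 32-bit int, magnitude gives 0x80000000 and top_bit_by_shifting gives 31, with no special case for a shift that turns into -1.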

And to find the first set bit there are dedicated instructions, so there's no need to shift and test. Just find the bit position and shift the value once. If you're not allowed to use intrinsics, there are many fast ways to do that on Bit Twiddling Hacks too.
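
As a rough sketch of the loop-free idea, the highest set bit can be located with a five-step binary search over a 32-bit value. This is one common portable variant, not necessarily the exact one listed on Bit Twiddling Hacks; highest_set_bit is a hypothetical name, and the input is assumed to be non-zero. With GCC or Clang, 31 - __builtin_clz(v) is a compiler-specific alternative for non-zero v (assuming a 32-bit unsigned int).

```c
#include <stdint.h>

/* Index (0..31) of the highest set bit of v, by binary search.
   Assumes v != 0; for v == 0 this returns 0, which is not meaningful. */
int highest_set_bit(uint32_t v) {
    int pos = 0;
    if (v & 0xFFFF0000u) { v >>= 16; pos += 16; }  /* anything in the top half? */
    if (v & 0x0000FF00u) { v >>= 8;  pos += 8;  }
    if (v & 0x000000F0u) { v >>= 4;  pos += 4;  }
    if (v & 0x0000000Cu) { v >>= 2;  pos += 2;  }
    if (v & 0x00000002u) { pos += 1; }
    return pos;
}
```

Once the position E is known, the value can be shifted in a single step: left by 23 - E when E is below 23, right by E - 23 otherwise.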

Source: https://stackoverflow.com/questions/19529356/how-to-convert-an-unsigned-int-to-a-float

Tags

binary

type-conversion

bit

unsigned-integer