How do I save a floating-point number in 2 bytes?

Question

Yes, I'm aware of the IEEE 754 half-precision standard, and yes, I'm aware of the work done in the field. Put simply, I'm trying to store a simple floating-point number (like 52.1 or 1.25) in just 2 bytes.

I've tried some implementations in Java and C#, but they corrupt the input value: decoding gives back a different number. You feed in 32.1, and after encoding and decoding you get 32.0985.

Is there ANY way I can store floating-point numbers in just 16 bits without ruining the input value?

Thanks very much.

Answer 1:

You could store three digits in BCD and use the remaining four bits for the decimal point position:

52.1 = 521 * 10 ^ -1 => 0x1521
1.25 = 125 * 10 ^ -2 => 0x2125

This would give you a range from 0.000000000000001 (1 × 10^-15) to 999. You can of course add an offset to the decimal point position to shift that range, for example to 0.000000001 through 999000000 (exponents -9 to +6).
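The answer's own code stores the 12-bit part as a plain binary integer, not BCD. As a contrast, here is a minimal sketch in Java (one of the languages the question mentions) that packs the three digits as true BCD, one decimal digit per nibble, matching the 52.1 => 0x1521 layout above. The class name `BcdShort` and the split into a digits/places pair are illustrative assumptions; the result is kept in the low 16 bits of an `int` because Java has no unsigned 16-bit integer type apart from `char`.

```java
// Hypothetical helper: three BCD digits in the low 12 bits, number of
// decimal places (0-15) in the top 4 bits, so value = digits * 10^-places.
final class BcdShort {
    static int encode(int digits, int places) {
        if (digits < 0 || digits > 999 || places < 0 || places > 15) {
            throw new IllegalArgumentException("out of range");
        }
        int bcd = 0;
        for (int shift = 0; shift < 12; shift += 4) {
            bcd |= (digits % 10) << shift; // one decimal digit per nibble
            digits /= 10;
        }
        return (places << 12) | bcd; // only the low 16 bits are used
    }

    static double decode(int packed) {
        int places = (packed >> 12) & 0xF;
        int digits = 0;
        for (int shift = 8; shift >= 0; shift -= 4) {
            digits = digits * 10 + ((packed >> shift) & 0xF);
        }
        return digits / Math.pow(10, places);
    }

    public static void main(String[] args) {
        System.out.printf("0x%04X%n", encode(521, 1)); // 0x1521 (52.1)
        System.out.printf("0x%04X%n", encode(125, 2)); // 0x2125 (1.25)
        System.out.println(decode(0x1521));            // 52.1
        System.out.println(decode(0x2125));            // 1.25
    }
}
```

BCD caps the digits at 999 but keeps the hex dump human-readable; the binary layout used in the code below reaches 4095 with the same 12 bits.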

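Here is a simple implementation that uses four bits for the decimal point position and the remaining 12 bits for the value. Note that it stores the value as a plain binary integer (0 to 4095) rather than as BCD, which is why 52.1 encodes to 4617 (0x1209) rather than 0x1521. Error checking is minimal and it has not been thoroughly tested; because it compares doubles with !=, it may also have precision issues with some values.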

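// Top 4 bits: number of decimal places (0-15).
// Low 12 bits: the scaled value as a plain binary integer (0-4095).
public static ushort Encode(double value) {
  if (value < 0)
    throw new ArgumentOutOfRangeException(nameof(value));
  int cnt = 0;
  // Shift the decimal point right until the fractional part is gone.
  while (value != Math.Floor(value)) {
    if (cnt == 15)
      throw new ArgumentException("More than 15 decimal places.", nameof(value));
    value *= 10.0;
    cnt++;
  }
  if (value > 0xfff)
    throw new ArgumentOutOfRangeException(nameof(value), "Too many significant digits.");
  return (ushort)((cnt << 12) | (int)value);
}

public static double Decode(ushort value) {
  int cnt = value >> 12;          // ushort, so no sign extension for cnt >= 8
  double result = value & 0xfff;  // the 12-bit scaled value
  while (cnt > 0) {
    result /= 10.0;
    cnt--;
  }
  return result;
}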

Example:

Console.WriteLine(Encode(52.1));
Console.WriteLine(Decode(4617));

Output:

4617
52.1
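One way to sidestep the `!=` precision concern is to get the decimal digits from the shortest round-tripping decimal string instead of multiplying by 10 in a loop. The sketch below assumes Java's `BigDecimal.valueOf(double)`, which is documented to use `Double.toString`, and keeps the same 4-bit places + 12-bit binary layout as the C# code; the class name `DecimalShort` is hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Same layout as the C# example: top 4 bits = decimal places (0-15),
// low 12 bits = unscaled value (0-4095), stored in the low 16 bits of an int.
final class DecimalShort {
    static int encode(double value) {
        // valueOf goes through Double.toString, so 52.1 becomes exactly 52.1
        BigDecimal d = BigDecimal.valueOf(value).stripTrailingZeros();
        if (d.scale() < 0) {
            d = d.setScale(0); // e.g. 1E+2 -> 100
        }
        int places = d.scale();
        BigInteger unscaled = d.unscaledValue();
        if (unscaled.signum() < 0 || unscaled.bitLength() > 12 || places > 15) {
            throw new IllegalArgumentException("cannot encode " + value);
        }
        return (places << 12) | unscaled.intValue();
    }

    static double decode(int packed) {
        int places = (packed >> 12) & 0xF;
        int unscaled = packed & 0xFFF;
        // Nearest double to unscaled * 10^-places
        return BigDecimal.valueOf(unscaled, places).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(encode(52.1));  // 4617
        System.out.println(decode(4617));  // 52.1
    }
}
```

Decoding through `BigDecimal.doubleValue()` also avoids the error that repeated `/ 10.0` can accumulate.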

Answer 2:

C# has no built-in functionality for that, but you could try a fixed-point approach.

Example of 8.8 fixed point (8 bits before the binary point, 8 after):

float value = 123.45f;  // the f suffix is required for a float literal
ushort fixedIntValue = (ushort)Math.Round(value * 256);  // round instead of truncating

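This stores the number as XXXXXXXX.XXXXXXXX: 8 integer bits and 8 fractional bits, covering 0 to 255.99609375 in steps of 1/256 (about 0.0039). It is still lossy: 123.45 is stored as 31603, which decodes to 123.44921875.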

You can retrieve the float again with:

float decoded = fixedIntValue / 256f;
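For comparison, a minimal Java sketch of the same unsigned 8.8 scheme, assuming round-to-nearest on encode and rejecting anything outside 0 to 65535 after scaling. The class name `Fixed88` is illustrative, and it uses `double` rather than `float` only to keep the printed values exact.

```java
// Unsigned 8.8 fixed point: stored = round(x * 256), kept in the low 16 bits.
final class Fixed88 {
    static int encode(double value) {
        long scaled = Math.round(value * 256.0); // round to nearest, not truncate
        if (Double.isNaN(value) || scaled < 0 || scaled > 0xFFFF) {
            throw new IllegalArgumentException("outside 8.8 range: " + value);
        }
        return (int) scaled;
    }

    static double decode(int fixed) {
        return (fixed & 0xFFFF) / 256.0; // exact: dividing by a power of two
    }

    public static void main(String[] args) {
        int fixed = encode(123.45);
        System.out.println(fixed);         // 31603
        System.out.println(decode(fixed)); // 123.44921875
    }
}
```

Values that are multiples of 1/256 (such as 1.25) survive exactly; decimal fractions like 123.45 do not, which is the same limitation the question describes.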

Answer 3:

Are you sure you need such a micro-optimization, rather than simply using a float or double?

Would you be better served by storing a short and understanding that it must be divided by, say, 100 to get the actual number? (For example, your values 52.1 and 1.25 could be stored as 5210 and 125.) I think this might be the best solution for you.
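A minimal Java sketch of the "store hundredths in a short" idea, assuming a signed `short` and round-to-nearest, which gives a range of -327.68 to 327.67. The class name `Hundredths` is hypothetical.

```java
// Stores round(x * 100) in a signed 16-bit short.
final class Hundredths {
    static short encode(double value) {
        long scaled = Math.round(value * 100.0);
        if (Double.isNaN(value) || scaled < Short.MIN_VALUE || scaled > Short.MAX_VALUE) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        return (short) scaled;
    }

    static double decode(short stored) {
        // Correctly rounded division, so 5210 / 100.0 gives the same double as 52.1
        return stored / 100.0;
    }

    public static void main(String[] args) {
        short s = encode(52.1);
        System.out.println(s);         // 5210
        System.out.println(decode(s)); // 52.1
    }
}
```

Any value with at most two decimal places in range round-trips to the same `double` you would get from typing the literal, which is the behavior the question asks for.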

If you're set on using an actual floating-point number, you can take the decoded number and round it to x significant digits (3, from your examples), which should usually give you back the number you started with. That is intentionally vague: you can't guarantee getting the original back unless you store the original.
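A sketch of that rounding step in Java, assuming 3 significant digits and `BigDecimal` with a `MathContext` (round-half-even). The input 32.09375 is the half-precision value nearest to 32.1; the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Rounds a lossily decoded value to a fixed number of significant digits.
final class SigDigits {
    static double round(double decoded, int digits) {
        return new BigDecimal(decoded) // exact binary value of the double
                .round(new MathContext(digits, RoundingMode.HALF_EVEN))
                .doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(round(32.09375, 3)); // 32.1
    }
}
```

This only works when the 16-bit format's error is smaller than half a unit in the last kept digit; for half precision near 32 the spacing is 0.03125, so 3 significant digits are safe there but 4 would not be.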

Answer 4:

The problem is that you can't precisely represent 32.1 in any binary floating-point type.

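In single precision, the closest representable value is 32.099998 (exactly 32.09999847412109375). In half precision, it is 32.09375: between 32 and 64, representable half-precision values are spaced 2^-5 = 0.03125 apart, so 32.1 falls between 32.09375 and 32.125.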
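To check those numbers, here is a minimal Java sketch that rounds a double to the nearest half-precision value by hand. It assumes the input lies in the normal binary16 range (roughly 6.1e-5 to 65504) and does not handle subnormals or overflow to infinity; `HalfPrecision` and `nearestHalf` are hypothetical names.

```java
import java.math.BigDecimal;

final class HalfPrecision {
    // Valid only for normal binary16 magnitudes; no subnormal/overflow handling.
    static double nearestHalf(double x) {
        int exp = Math.getExponent(x);          // x in [2^exp, 2^(exp+1))
        double ulp = Math.scalb(1.0, exp - 10); // binary16 stores 10 fraction bits
        return Math.rint(x / ulp) * ulp;        // round half to even, as IEEE 754 does
    }

    public static void main(String[] args) {
        System.out.println(nearestHalf(32.1));           // 32.09375
        System.out.println(new BigDecimal((float) 32.1)); // 32.09999847412109375
    }
}
```

`new BigDecimal(double)` prints the exact binary value, which is why the single-precision result shows more digits than the usual 32.099998.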

You could consider a decimal floating-point type instead. Note that the problem is not unique to half precision: every binary floating-point format, whatever its size, has it.

Answer 5:

There are 4,278,190,080 32-bit floating-point values, not including NaNs and infinities. Two bytes (16 bits) have only 65,536 distinct values. Clearly, it is impossible to uniquely encode all the floating-point values in two bytes.

Which ones do you want to encode?

Even for a single value of the sign and exponent (e.g., all floating-point values from 4 to 8, not including 8), there are 8,388,608 floating-point values, so you cannot even encode those in two bytes.
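Those counts follow directly from the binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits). A small Java sketch that reproduces them; `FloatCounts` is a hypothetical name, and the second method relies on positive floats having consecutive bit patterns in value order.

```java
final class FloatCounts {
    static long finiteFloats() {
        long all = 1L << 32;             // every 32-bit pattern
        long nanOrInf = 2L * (1L << 23); // exponent field all ones, either sign
        return all - nanOrInf;           // includes both +0 and -0
    }

    // Positive floats are ordered like their bit patterns, so the count in
    // [4, 8) is the difference between the raw encodings of 8 and 4.
    static long floatsFromFourToEight() {
        return Float.floatToRawIntBits(8f) - Float.floatToRawIntBits(4f);
    }

    public static void main(String[] args) {
        System.out.println(finiteFloats());          // 4278190080
        System.out.println(floatsFromFourToEight()); // 8388608
        System.out.println(1 << 16);                 // 65536
    }
}
```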

You have to restrict yourself to a small subset of values to encode. Once you have done that, people may have suggestions about how to encode them. What is the actual problem you are trying to solve?

Answer 6:

From your examples, you want to store 3 digits and a decimal point. You could simply encode your 'alphabet' of 11 symbols (the digits 0-9 and the point) as 4-bit codes, and store 4 × 4 bits in 2 bytes.
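A minimal Java sketch of that 4-bit alphabet, assuming codes 0-9 for the digits, 10 for '.', and (an added assumption, not part of the answer) code 15 as padding for strings shorter than four symbols. The class name `NibbleText` is hypothetical.

```java
// Four 4-bit symbol codes per 16-bit value, first symbol in the top nibble.
final class NibbleText {
    private static final int POINT = 10; // '.'
    private static final int PAD = 15;  // assumed filler for short strings

    static int encode(String s) {
        if (s.length() > 4) throw new IllegalArgumentException("more than 4 symbols: " + s);
        int packed = 0;
        for (int i = 0; i < 4; i++) {
            int code;
            if (i >= s.length()) {
                code = PAD;
            } else {
                char c = s.charAt(i);
                if (c == '.') code = POINT;
                else if (c >= '0' && c <= '9') code = c - '0';
                else throw new IllegalArgumentException("bad symbol: " + c);
            }
            packed = (packed << 4) | code;
        }
        return packed;
    }

    static String decode(int packed) {
        StringBuilder sb = new StringBuilder();
        for (int shift = 12; shift >= 0; shift -= 4) {
            int code = (packed >> shift) & 0xF;
            if (code == PAD) break;
            if (code > POINT) throw new IllegalArgumentException("bad code: " + code);
            sb.append(code == POINT ? '.' : (char) ('0' + code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.printf("%04X%n", encode("52.1")); // 52A1
        System.out.println(decode(0x52A1));          // 52.1
    }
}
```

Because the text itself is stored, "52.1" comes back exactly as typed; the trade-off is that only four symbols fit, so values like 123.45 are out of reach.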

Source: https://stackoverflow.com/questions/10414889/how-do-i-save-a-floating-point-number-in-2-bytes

Tags

binary

floating-point

ieee-754

numerical