Floating point versus fixed point: what are the pros/cons?

前端 未结 8 2115
别跟我提以往
别跟我提以往 2021-02-04 02:40

Floating point type represents a number by storing its significant digits and its exponent separately on separate binary words so it fits in 16, 32, 64 or 128 bits.

Fixe

相关标签:
8条回答
  • 2021-02-04 02:43

    The diferrence between floating point and integer math depends on the CPU you have in mind. On Intel chips the difference is not big in clockticks. Int math is still faster because there are multiple integer ALU's that can work in parallel. Compilers are also smart to use special adress calculation instructions to optimize add/multiply in a single instruction. Conversion counts as an operation too, so just choose your type and stick with it.

    In C++ you can build your own type for fixed point math. You just define as struct with one int and override the appropriate overloads, and make them do what they normally do plus a shift to put the comma back to the right position.

    0 讨论(0)
  • 2021-02-04 02:47

    That definition covers a very limited subset of fixed point implementations.

    It would be more correct to say that in fixed point only the mantissa is stored and the exponent is a constant determined a-priori. There is no requirement for the binary point to fall inside the mantissa, and definitely no requirement that it fall on a word boundary. For example, all of the following are "fixed point":

    • 64 bit mantissa, scaled by 2-32 (this fits the definition listed in the question)
    • 64 bit mantissa, scaled by 2-33 (now the integer and fractional parts cannot be separated by an octet boundary)
    • 32 bit mantissa, scaled by 24 (now there is no fractional part)
    • 32 bit mantissa, scaled by 2-40 (now there is no integer part)

    GPUs tend to use fixed point with no integer part (typically 32-bit mantissa scaled by 2-32). Therefore APIs such as OpenGL and Direct3D often use floating-point types which are capable of holding these values. However, manipulating the integer mantissa is often more efficient so these APIs allow specifying coordinates (in texture space, color space, etc) this way as well.

    As for your claim that C++ doesn't have a fixed point type, I disagree. All integer types in C++ are fixed point types. The exponent is often assumed to be zero, but this isn't required and I have quite a bit of fixed-point DSP code implemented in C++ this way.

    0 讨论(0)
  • 2021-02-04 02:53

    You need to be careful when discussing "precision" in this context.

    For the same number of bits in representation the maximum fixed point value has more significant bits than any floating point value (because the floating point format has to give some bits away to the exponent), but the minimum fixed point value has fewer than any non-denormalized floating point value (because the fixed point value wastes most of its mantissa in leading zeros).

    Also depending on the way you divide the fixed point number up, the floating point value may be able to represent smaller numbers meaning that it has a more precise representation of "tiny but non-zero".

    And so on.

    0 讨论(0)
  • 2021-02-04 02:54

    It depends on what you're working on. If you're using fixed point then you lose precision; you have to select the number of places after the decimal place (which may not always be good enough). In floating point you don't need to worry about this as the precision offered is nearly always good enough for the task in hand - uses a standard form implementation to represent the number.

    The pros and cons come down to speed and resources. On modern 32bit and 64bit platforms there is really no need to use fixed point. Most systems come with built in FPUs that are hardwired to be optimised for fixed point operations. Furthermore, most modern CPU intrinsics come with operations such as the SIMD set which help optimise vector based methods via vectorisation and unrolling. So fixed point only comes with a down side.

    On embedded systems and small microcontrollers (8bit and 16bit) you may not have an FPU nor extended instruction sets. In which case you may be forced to use fixed point methods or the limited floating point instruction sets that are not very fast. So in these circumstances fixed point will be a better - or even your only - choice.

    0 讨论(0)
  • 2021-02-04 02:55

    One issue not covered is the answers is a power consumption. Though it highly depends on specific hardware architecture, usually FPU consumes much more energy than ALU in CPU thus if you target mobile applications where power consumption is important it's worth consider fixed point impelementation of the algorithm.

    0 讨论(0)
  • 2021-02-04 03:02

    At the code level, fixed-point arithmetic is simply integer arithmetic with an implied denominator.

    For many simple arithmetic operations, fixed-point and integer operations are essentially the same. However, there are some operations which the intermediate values must be represented with a higher number of bits and then rounded off. For example, to multiply two 16-bit fixed-point numbers, the result must be temporarily stored in 32-bit before renormalizing (or saturating) back to 16-bit fixed-point.

    When the software does not take advantage of vectorization (such as CPU-based SIMD or GPGPU), integer and fixed-point arithmeric is faster than FPU. When vectorization is used, the efficiency of vectorization matters a lot more, such that the performance differences between fixed-point and floating-point is moot.

    Some architectures provide hardware implementations for certain math functions, such as sin, cos, atan, sqrt, for floating-point types only. Some architectures do not provide any hardware implementation at all. In both cases, specialized math software libraries may provide those functions by using only integer or fixed-point arithmetic. Often, such libraries will provide multiple level of precisions, for example, answers which are only accurate up to N-bits of precision, which is less than the full precision of the representation. The limited-precision versions may be faster than the highest-precision version.

    0 讨论(0)
提交回复
热议问题