Question
I am developing some code that can get its data from the hardware in either floating point or fixed point. Currently we get it as floating point.
The low-level APIs are all in fixed point, so we must pass data back as fixed point. The algorithm we are using is Cholesky. I am wondering why we must use floating point for Cholesky rather than just getting the data as fixed point. Are there any advantages in doing that?
I would have thought that using floating point would cause more rounding error.
Answer 1:
The main advantages of fixed point over floating point are:
1. It is much simpler to implement in hardware.
2. Certain operations are exact (i.e. incur no rounding error): namely addition, subtraction and multiplication by integers, assuming the result doesn't overflow.
3. If all your numbers are of the same magnitude, you can get a few extra bits of precision in the same width by not needing to store an exponent: e.g. 32 bits in a 32-bit fixed-point format vs the 24-bit significand of binary32 single precision.
However, point 3 is unlikely to hold for all the numbers at every stage of your computation, particularly for linear algebra operations such as Cholesky factorisations.
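The precision trade-off described above can be checked numerically. The following is a minimal sketch, not taken from the answer: it assumes a hypothetical 32-bit fixed-point layout with 31 fractional bits (`FRAC_BITS`), and the helper names `to_fixed`, `from_fixed`, `to_float32` and `rel_error` are made up for illustration. It compares the relative rounding error of that format against binary32 for one value near full scale and one value six orders of magnitude smaller.

```python
import struct

FRAC_BITS = 31  # assumed layout: 32-bit signed word, 31 fractional bits


def to_fixed(x, frac_bits=FRAC_BITS):
    """Round a real number to the nearest fixed-point integer code."""
    return round(x * (1 << frac_bits))


def from_fixed(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer code back to a Python float."""
    return q / (1 << frac_bits)


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rel_error(approx, exact):
    """Relative error of approx with respect to exact."""
    return abs(approx - exact) / abs(exact)


large = 0.7      # close to the top of the fixed-point range
small = 0.7e-6   # same digits, six orders of magnitude smaller

fixed_err_large = rel_error(from_fixed(to_fixed(large)), large)
float_err_large = rel_error(to_float32(large), large)
fixed_err_small = rel_error(from_fixed(to_fixed(small)), small)
float_err_small = rel_error(to_float32(small), small)

print(f"near full scale: fixed {fixed_err_large:.2e}, binary32 {float_err_large:.2e}")
print(f"small value:     fixed {fixed_err_small:.2e}, binary32 {float_err_small:.2e}")
```

Under these assumptions the fixed-point format is the more accurate one near full scale, while binary32 keeps roughly the same relative error for the small value and the fixed-point error grows by several orders of magnitude.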
On the other hand, floating point has many other advantages:
1. You can store numbers across a much wider range of magnitudes (e.g. ~10^-38 to 10^+38 for binary32).
2. You don't lose relative accuracy when working with small numbers: this is particularly important for multiplication and division, which are used throughout Cholesky computations.
3. Underflow and overflow are less of a problem: they are both less likely to occur (due to 1), and are also handled more gracefully when they do occur, via Inf and subnormals rather than exceptions or erroneous results.
4. A floating point format encompasses a slightly smaller fixed point format: i.e. binary32 includes all numbers in a 24-bit fixed point format, but has all the above advantages.
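To make the points about small numbers and overflow concrete, here is a small sketch. It is an illustration under assumptions, not part of the original answer: it assumes a hypothetical Q15.16 layout (32-bit signed, 16 fractional bits), the helpers `to_fixed`, `fixed_mul` and `wrap_int32` are made-up names, and the floating point side uses Python's native binary64 floats, which behave like binary32 in this respect but with different thresholds.

```python
import math

FRAC_BITS = 16            # assumed Q15.16 layout: 32-bit signed, 16 fractional bits
SCALE = 1 << FRAC_BITS


def to_fixed(x):
    """Round a real number to the nearest Q15.16 integer code."""
    return round(x * SCALE)


def fixed_mul(a, b):
    """Multiply two Q15.16 codes and round the product back to Q15.16."""
    return (a * b + (SCALE >> 1)) >> FRAC_BITS


def wrap_int32(v):
    """Emulate what a 32-bit two's complement register keeps of v."""
    return (v + 2**31) % 2**32 - 2**31


# Small operands: the true product 1e-6 is below half a step of Q15.16,
# so the fixed-point result rounds to zero.
small = to_fixed(0.001)
small_product = fixed_mul(small, small)

# Large operands: the true product 40000 exceeds the Q15.16 maximum
# (just under 32768), so a 32-bit register silently wraps around.
big = to_fixed(200.0)
wrapped = wrap_int32(fixed_mul(big, big)) / SCALE

# Floating point: the small product keeps its relative accuracy,
# overflow yields inf, and underflow degrades gradually into subnormals.
float_small = 0.001 * 0.001
float_overflow = 1e200 * 1e200
float_tiny = 1e-160 * 1e-160

print(small_product, wrapped, float_small, float_overflow, float_tiny)
```

In this sketch the fixed-point small product is lost entirely and the overflowing product comes back with the wrong sign, whereas the floating point results are either accurate or clearly flagged as `inf`.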
Answer 2:
An advantage of floating-point over fixed-point is the range of numbers you can represent. I am not familiar with the Cholesky algorithm, but if it has to deal with very large and very small numbers internally, floating-point will provide more accurate results.
If you use fixed-point arithmetic, you need to make sure that the input cannot cause saturation or overflow inside the algorithm, which means restricting it to a specific range. It can also be difficult to define this range, especially if you have more than one input.
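The range restriction described above shows up quickly in a fixed-point Cholesky factorisation. The sketch below is one possible illustration, not the asker's implementation: it assumes a hypothetical Q15.16 layout, uses Python's arbitrary-precision integers as a stand-in for a wide accumulator, and the names `to_fixed`, `check_range`, `cholesky_fixed` and `cholesky_float` are invented for this example. A plain floating point version is included for comparison.

```python
import math

FRAC_BITS = 16                      # assumed Q15.16 layout
ONE = 1 << FRAC_BITS
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def check_range(code):
    """Raise instead of silently wrapping when a code leaves the 32-bit range."""
    if not INT32_MIN <= code <= INT32_MAX:
        raise OverflowError(f"fixed-point code {code} does not fit in 32 bits")
    return code


def to_fixed(x):
    """Round a real number to the nearest Q15.16 code, checking the range."""
    return check_range(round(x * ONE))


def cholesky_fixed(a):
    """Lower-triangular factor of a symmetric positive definite matrix of Q15.16 codes."""
    n = len(a)
    l = [[0] * n for _ in range(n)]
    for j in range(n):
        # Accumulate with 2 * FRAC_BITS fractional bits (wide accumulator).
        s = a[j][j] * ONE - sum(l[j][k] * l[j][k] for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite at this precision")
        l[j][j] = check_range(math.isqrt(s))    # sqrt halves the fractional bits
        for i in range(j + 1, n):
            s = a[i][j] * ONE - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = check_range(s // l[j][j])  # division removes FRAC_BITS bits
    return l


def cholesky_float(a):
    """Reference Cholesky factorisation using Python floats."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = math.sqrt(a[j][j] - sum(l[j][k] ** 2 for k in range(j)))
        l[j][j] = d
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / d
    return l


A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
codes = cholesky_fixed([[to_fixed(v) for v in row] for row in A])
L_fixed = [[c / ONE for c in row] for row in codes]
L_float = cholesky_float(A)
print(L_fixed)
print(L_float)
```

For this well-scaled input both versions agree. Scaling the same matrix by 1e6 makes `to_fixed` raise `OverflowError`, and scaling it by 1e-8 makes every entry round to zero so `cholesky_fixed` raises `ValueError`, while `cholesky_float` handles both cases without any change.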
Source: https://stackoverflow.com/questions/40780974/fixed-point-cholesky-algorithm-advantages