What is the best method for comparing IEEE floats and doubles for equality? I have heard of several methods, but I wanted to see what the community thought.
If you have floating point errors, you have bigger problems than this one. Although I guess that is a matter of personal perspective.
Even if we do the numerical analysis to minimize the accumulation of error, we can't eliminate it, and we can be left with results that ought to be identical (if we were calculating with reals) but differ (because we cannot calculate with reals).
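A minimal demonstration of the point, using the classic 0.1 + 0.2 example (any decimal fractions that are not exactly representable in binary would do):

#include <cstdio>

int main()
{
    double a = 0.1 + 0.2;  // each term is rounded, and so is the sum
    double b = 0.3;        // also not exactly representable, rounded differently

    // Mathematically a == b, but the two roundings differ by ~5.5e-17.
    std::printf("a == b: %s\n", a == b ? "true" : "false");  // prints "false"
    std::printf("a - b = %.17g\n", a - b);
    return 0;
}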
In numerical software you often want to test whether two floating point numbers are exactly equal. LAPACK is full of examples of such cases. Sure, the most common case is where you want to test whether a floating point number equals "Zero", "One", "Two", "Half". If anyone is interested I can pick some algorithms and go into more detail.
Also in BLAS you often want to check whether a floating point number is exactly Zero or One. For example, the routine dgemv can compute operations of the form y = alpha*A*x + beta*y.
So if beta equals One you have a "plus assignment", and for beta equals Zero a "simple assignment". You can certainly cut the computational cost if you give these (common) cases special treatment, as in the sketch below.
Sure, you could design the BLAS routines in such a way that you can avoid exact comparisons (e.g. using some flags). However, LAPACK is full of examples where that is not possible.
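To make that concrete, here is a sketch of the idea (a simplified, row-major gemv-like routine of my own, not the reference BLAS code, which is Fortran and more careful; the exact comparisons against Zero and One are the point):

// Simplified y = alpha*A*x + beta*y for a row-major m-by-n matrix A.
void gemv(int m, int n, double alpha, const double* A,
          const double* x, double beta, double* y)
{
    // Exact comparison: beta == 0.0 means "overwrite y", so the old
    // contents of y must not be read at all (they may be uninitialized).
    if (beta == 0.0) {
        for (int i = 0; i < m; ++i)
            y[i] = 0.0;
    } else if (beta != 1.0) {
        // beta == 1.0 is the plain "plus assignment": skip the scaling pass.
        for (int i = 0; i < m; ++i)
            y[i] *= beta;
    }

    if (alpha == 0.0)  // nothing to accumulate
        return;

    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
            sum += A[i * n + j] * x[j];
        y[i] += alpha * sum;
    }
}

An approximate comparison would be wrong here: beta being merely close to zero does not license skipping the read of y.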
P.S.:
There are certainly many cases where you don't want to check for "is exactly equal". For many people this might even be the only case they ever have to deal with. All I want to point out is that there are other cases too.
Although LAPACK is written in Fortran, the logic is the same if you are using other programming languages for numerical software.
If you are looking for two floats to be equal, then they should be identically equal in my opinion. If you are facing a floating point rounding problem, perhaps a fixed point representation would suit your problem better.
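To illustrate the fixed-point suggestion, a minimal sketch using money, the classic case (the Money type and its cent scaling are my own illustrative choices, not a library API):

#include <cstdint>
#include <cstdio>

// Illustrative fixed-point money type: amounts stored as whole cents.
// With integers, addition is exact and equality is meaningful.
struct Money {
    std::int64_t cents;
};

Money operator+(Money a, Money b) { return Money{a.cents + b.cents}; }
bool  operator==(Money a, Money b) { return a.cents == b.cents; }

int main()
{
    Money a{10};  // $0.10
    Money b{20};  // $0.20
    Money c{30};  // $0.30

    // Unlike 0.1 + 0.2 == 0.3 with doubles, this is reliably true.
    std::printf("%s\n", (a + b == c) ? "equal" : "not equal");
    return 0;
}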
Perhaps we cannot afford the loss of range or performance that such an approach would inflict.
@DrPizza: I am no performance guru but I would expect fixed point operations to be quicker than floating point operations (in most cases).
@Craig H: Sure. I'm totally okay with it printing that. If a or b store money then they should be represented in fixed point. I'm struggling to think of a real-world example where such logic ought to be applied to floats. Things suitable for floats: 3D graphics, physics/engineering, simulation, climate simulation....
For all these things, either you crunch the numbers and simply present the results to the user for human interpretation, or you make a comparative statement (even if such a statement is, "this thing is within 0.001 of this other thing"). A comparative statement like mine is only useful in the context of the algorithm: the "within 0.001" part depends on what physical question you're asking. That's my 0.02. Or should I say 2/100ths?
Whether fixed point would be quicker rather depends on what you are doing with it. A fixed-point type with the same range as an IEEE float would be many, many times slower (and many times larger).
The current version I am using is this:
#include <cmath>

bool is_equals(float A, float B,
               float maxRelativeError, float maxAbsoluteError)
{
    // The absolute-error check catches comparisons near zero, where a
    // relative comparison breaks down (and could divide by zero).
    if (std::fabs(A - B) < maxAbsoluteError)
        return true;

    // Otherwise compare the relative error, dividing by whichever of
    // the two operands has the larger magnitude so the test is symmetric.
    float relativeError;
    if (std::fabs(B) > std::fabs(A))
        relativeError = std::fabs((A - B) / B);
    else
        relativeError = std::fabs((A - B) / A);

    return relativeError <= maxRelativeError;
}
This seems to take care of most problems by combining relative and absolute error tolerance. Is the ULP approach better? If so, why?
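For comparison, here is a sketch of the ULP idea (in the spirit of Bruce Dawson's "Comparing floating point numbers"): reinterpret the floats' bit patterns as integers so that adjacent representable floats differ by exactly 1. This simplified version assumes 32-bit IEEE floats and ignores the NaN/infinity handling a production version needs:

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Remap a float's bit pattern to an unsigned integer that is monotonic
// across the whole number line: negative values get all bits flipped,
// positive values get the sign bit set. Adjacent representable floats
// then differ by exactly 1.
static std::uint32_t ordered_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // bit-exact copy, avoids aliasing UB
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

// True if a and b are within maxUlps representable floats of each other.
static bool almost_equal_ulps(float a, float b, std::uint32_t maxUlps)
{
    std::uint32_t ua = ordered_bits(a);
    std::uint32_t ub = ordered_bits(b);
    std::uint32_t diff = (ua > ub) ? ua - ub : ub - ua;
    return diff <= maxUlps;
}

int main()
{
    float a = 0.3f;
    float b = std::nextafterf(a, 1.0f);  // exactly 1 ulp above a

    std::printf("a == b:        %s\n", a == b ? "true" : "false");
    std::printf("within 4 ulps: %s\n",
                almost_equal_ulps(a, b, 4) ? "true" : "false");
    return 0;
}

The usual argument for ULPs is that the tolerance scales automatically with the magnitude of the operands, which a hand-picked maxRelativeError only approximates; the price is the extra care needed around zero, NaN, and infinity.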