I just ran into a nasty bug as a result of the following behavior:
scala> List(1.0, 2.0, 3.0, Double.NaN).min
res1: Double = NaN
scala> List(1.0, 2.0, 3.0, Double.NaN).max
res2: Double = NaN
Disclaimer: I'll add my own answer to the question just in case anyone else is still interested in more details on the matter.
It looks like this issue is more complex than I expected. As Alexey Romanov has already pointed out, the notion of incomparability would require the max/min functions to take a partial order. Unfortunately, Alexey is also right that a general max/min function based on a partial order does not make sense: think of the case where the partial ordering only defines relations within certain groups, but the groups themselves are completely independent of each other (for instance, the elements {a, b, c, d} with just the two relations a < b and c < d; we would end up with two maxima and two minima). In that regard one may even argue that formally max/min should always return two values, NaN and the respective valid min/max, since NaN itself is also an extremal value in its own relation group.
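To make the two-maxima case concrete, here is a minimal sketch of the {a, b, c, d} example (the encoding via a relation set is mine, purely for illustration):

val elems = Set('a', 'b', 'c', 'd')
val less  = Set(('a', 'b'), ('c', 'd')) // the only two relations of the partial order

// An element is maximal if nothing is strictly greater than it.
val maximal = elems.filter(m => !elems.exists(x => less((m, x))))
// maximal == Set('b', 'd'): two maximal elements, so a single "max" is not well defined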
So, as a result of partial orders being too general/complex, the min/max functions take an Ordering, i.e. a total order. Unfortunately a total order does not allow for the notion of incomparability: reviewing the three defining properties of a total order (totality, antisymmetry, transitivity) makes it pretty obvious that "ignoring NaNs" is formally impossible.
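For example, totality demands that for any x and y at least one of x <= y or y <= x holds. The native IEEE comparisons already break this, which is exactly why any Ordering built on them has to bend one of the rules:

val x = 1.0
val y = Double.NaN
(x <= y) || (y <= x) // false: neither relation holds, so totality is violated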
So when trying to come up with an implementation of an Ordering that fulfills our desired min/max behavior, it is clear that we have to violate something (and bear the consequences). The implementation of min/max/minBy/maxBy in TraversableOnce follows this pattern (shown for min):
reduceLeft((x, y) => if (cmp.lteq(x, y)) x else y)
and uses gteq for the max variants. This gave me the idea of "left biasing" the comparison, i.e.:
x <comparison_operator> NaN is always true to keep x in the reduction
NaN <comparison_operator> x is always false to inject x into the reduction
The resulting implementation of such a "left biased" ordering would look like this:
object BiasedOrdering extends Ordering[Double] {
  def compare(x: Double, y: Double) = java.lang.Double.compare(x, y) // this is inconsistent with the members below, but the same goes for Ordering.Double
  override def lteq(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true else compare(x, y) <= 0
  override def gteq(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true else compare(x, y) >= 0
  override def lt(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) < 0
  override def gt(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) > 0
  override def equiv(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true else compare(x, y) == 0
}
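Passed explicitly, this ordering keeps the valid values in the reduction regardless of where the NaN sits (the same results are tabulated further below):

List(1.0, 2.0, 3.0, Double.NaN).min(BiasedOrdering) // 1.0
List(Double.NaN, 1.0, 2.0, 3.0).min(BiasedOrdering) // 1.0
List(1.0, 2.0, 3.0, Double.NaN).max(BiasedOrdering) // 3.0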
To find out how this ordering behaves in practice, I'm comparing it to Scala's default order Ordering.Double and to the following ordering, which is derived directly from java.lang.Double.compare:
object OrderingDerivedFromCompare extends Ordering[Double] {
  def compare(x: Double, y: Double) = java.lang.Double.compare(x, y)
}
One interesting property of Scala's default order Ordering.Double is that it overrides all comparison member functions with the language's native numerical comparison operators (<, <=, ==, >=, >), so the comparison results are identical to comparing directly with these operators. The following shows all possible relations between a NaN and a valid number for the three orderings:
Ordering.Double 0.0 > NaN = false
Ordering.Double 0.0 >= NaN = false
Ordering.Double 0.0 == NaN = false
Ordering.Double 0.0 <= NaN = false
Ordering.Double 0.0 < NaN = false
OrderingDerivedFromCompare 0.0 > NaN = false
OrderingDerivedFromCompare 0.0 >= NaN = false
OrderingDerivedFromCompare 0.0 == NaN = false
OrderingDerivedFromCompare 0.0 <= NaN = true
OrderingDerivedFromCompare 0.0 < NaN = true
BiasedOrdering 0.0 > NaN = true
BiasedOrdering 0.0 >= NaN = true
BiasedOrdering 0.0 == NaN = true
BiasedOrdering 0.0 <= NaN = true
BiasedOrdering 0.0 < NaN = true
Ordering.Double NaN > 0.0 = false
Ordering.Double NaN >= 0.0 = false
Ordering.Double NaN == 0.0 = false
Ordering.Double NaN <= 0.0 = false
Ordering.Double NaN < 0.0 = false
OrderingDerivedFromCompare NaN > 0.0 = true
OrderingDerivedFromCompare NaN >= 0.0 = true
OrderingDerivedFromCompare NaN == 0.0 = false
OrderingDerivedFromCompare NaN <= 0.0 = false
OrderingDerivedFromCompare NaN < 0.0 = false
BiasedOrdering NaN > 0.0 = false
BiasedOrdering NaN >= 0.0 = false
BiasedOrdering NaN == 0.0 = false
BiasedOrdering NaN <= 0.0 = false
BiasedOrdering NaN < 0.0 = false
Ordering.Double NaN > NaN = false
Ordering.Double NaN >= NaN = false
Ordering.Double NaN == NaN = false
Ordering.Double NaN <= NaN = false
Ordering.Double NaN < NaN = false
OrderingDerivedFromCompare NaN > NaN = false
OrderingDerivedFromCompare NaN >= NaN = true
OrderingDerivedFromCompare NaN == NaN = true
OrderingDerivedFromCompare NaN <= NaN = true
OrderingDerivedFromCompare NaN < NaN = false
BiasedOrdering NaN > NaN = false
BiasedOrdering NaN >= NaN = true
BiasedOrdering NaN == NaN = true
BiasedOrdering NaN <= NaN = true
BiasedOrdering NaN < NaN = false
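For reference, a listing like the one above can be produced with a small loop along these lines (the showRelations helper is mine):

def showRelations(name: String, ord: Ordering[Double], x: Double, y: Double): Unit = {
  println(s"$name $x > $y = ${ord.gt(x, y)}")
  println(s"$name $x >= $y = ${ord.gteq(x, y)}")
  println(s"$name $x == $y = ${ord.equiv(x, y)}")
  println(s"$name $x <= $y = ${ord.lteq(x, y)}")
  println(s"$name $x < $y = ${ord.lt(x, y)}")
}

for {
  (x, y) <- Seq((0.0, Double.NaN), (Double.NaN, 0.0), (Double.NaN, Double.NaN))
  (name, ord) <- Seq[(String, Ordering[Double])](
                   "Ordering.Double" -> Ordering.Double,
                   "OrderingDerivedFromCompare" -> OrderingDerivedFromCompare,
                   "BiasedOrdering" -> BiasedOrdering)
} showRelations(name, ord, x, y)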
We can see that:
Only OrderingDerivedFromCompare fulfills the total order properties. Based on this result the reasoning behind java.lang.Double.compare becomes much clearer: placing NaN at the upper end of the total order simply avoids any contradiction!
Both Ordering.Double and BiasedOrdering violate the total order properties: with the default ordering every comparison involving a NaN is false, while for the biased order the result depends on the position of the NaN. Since both lead to contradictions it is difficult to say which may cause more severe issues.
Now to our actual problem at hand, the min/max functions. For OrderingDerivedFromCompare it is now clear what to expect: NaN is simply the largest value, so we obtain it as max, irrespective of how the elements in the list are arranged:
OrderingDerivedFromCompare List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
OrderingDerivedFromCompare List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
OrderingDerivedFromCompare List(1.0, 2.0, 3.0, Double.NaN).max = NaN
OrderingDerivedFromCompare List(Double.NaN, 1.0, 2.0, 3.0).max = NaN
Now to Scala's default ordering. I was deeply shocked to see that the situation is actually even more intricate than mentioned in my question:
Ordering.Double List(1.0, 2.0, 3.0, Double.NaN).min = NaN
Ordering.Double List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
Ordering.Double List(1.0, 2.0, 3.0, Double.NaN).max = NaN
Ordering.Double List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0
In fact the order of the elements becomes relevant (as a result of every comparison in the reduceLeft returning false). "Left biasing" obviously solves this issue, leading to consistent results:
BiasedOrdering List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
BiasedOrdering List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
BiasedOrdering List(1.0, 2.0, 3.0, Double.NaN).max = 3.0
BiasedOrdering List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0
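The order dependence of the default ordering can be reproduced by hand with the reduceLeft pattern shown above; a minimal re-enactment using the native <= operator, which, like Ordering.Double.lteq, answers false to any comparison involving NaN:

def minByReduce(xs: List[Double]) = xs.reduceLeft((x, y) => if (x <= y) x else y)

// NaN at the end: the final step evaluates 1.0 <= NaN, which is false, so NaN replaces 1.0
minByReduce(List(1.0, 2.0, 3.0, Double.NaN)) // NaN
// NaN at the front: the first step evaluates NaN <= 1.0, which is false, so 1.0 replaces NaN
minByReduce(List(Double.NaN, 1.0, 2.0, 3.0)) // 1.0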
Unfortunately, I'm still not able to fully answer all questions here. Some remaining points are:
Why is Scala's default ordering defined the way it is? Currently the handling of NaNs seems pretty flawed. A very dangerous detail of Ordering.Double is that the compare function actually delegates to java.lang.Double.compare, while the comparison member functions are implemented with the language's native comparisons. This obviously leads to inconsistent results, for instance:
Ordering.Double.compare(0.0, Double.NaN) == -1 // indicating 0.0 < NaN
Ordering.Double.lt (0.0, Double.NaN) == false // contradiction
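The same mismatch shows up with the arguments flipped:

Ordering.Double.compare(Double.NaN, 0.0) == 1     // indicating NaN > 0.0
Ordering.Double.gt(Double.NaN, 0.0)      == false // contradiction again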
What are the potential drawbacks of the BiasedOrdering, apart from directly evaluating one of the contradicting comparisons? A quick check on sorted gave the following results, which did not reveal any trouble (all three orderings agree here, since sorted only uses compare, and all three compare implementations delegate to java.lang.Double.compare):
Ordering.Double List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
OrderingDerivedFromCompare List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
BiasedOrdering List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
Ordering.Double List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
OrderingDerivedFromCompare List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
BiasedOrdering List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
For the time being I'll have a go with this left biased ordering. But since the nature of the problem does not allow a flawless general solution: use with care!
Update
As for solutions based on an implicit class, as monkjack suggested, I like the following a lot (since it does not mess with the (flawed?) total orders at all, but internally maps to a clean, totally ordered domain):
implicit class MinMaxNanAware(t: TraversableOnce[Double]) {
  def nanAwareMin = t.minBy(x => if (x.isNaN) Double.PositiveInfinity else x)
  def nanAwareMax = t.maxBy(x => if (x.isNaN) Double.NegativeInfinity else x)
}
// and now we can simply use
val goodMin = list.nanAwareMin
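One caveat worth noting: minBy returns the original element, not the mapped one, so a collection that contains nothing but NaNs still yields NaN (there simply is no valid value to fall back to):

List(Double.NaN, Double.NaN).nanAwareMin // NaN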
For
val a = List(1.0, 2.0, 3.0, Double.NaN)
sort it,
a.sortWith {_ >_ }
res: List[Double] = List(3.0, 2.0, 1.0, NaN)
and so NaN values are relegated to the end; thus, for max,
a.sortWith {_ >_ }.head
res: Double = 3.0
Likewise
a.sortWith {_ < _ }
res: List[Double] = List(1.0, 2.0, 3.0, NaN)
and so for min,
a.sortWith {_ < _ }.head
res: Double = 1.0
This answer is simply for explaining the issue; @monkjack's answer probably provides the best practical solution.
Now since Scala offers the possibility to implicitly pass such an ordering, isn't it a natural desire to pass an ordering which can handle "incomparability" according to our demands?
Ordering in Scala represents only total orderings, i.e. ones where all elements are comparable. There is a PartialOrdering[T] (http://www.scala-lang.org/api/2.10.3/index.html#scala.math.PartialOrdering), but there are a couple of problems:
It is not actually used anywhere in the standard library.
If you try to implement max/maxBy/etc. taking a PartialOrdering, you'll quickly see that it isn't generally possible, except in cases like Float/Double where you have some elements which aren't comparable with anything while all the rest are comparable with each other (and you can then decide to just ignore the incomparable elements); a sketch of this Double special case follows below.
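To illustrate that special case, here is a sketch (mine, not from the standard library) of a PartialOrdering[Double] in which NaN is incomparable to everything, including itself, while all other values remain totally ordered:

object NaNPartialOrdering extends PartialOrdering[Double] {
  def tryCompare(x: Double, y: Double): Option[Int] =
    if (x.isNaN || y.isNaN) None // NaN is incomparable
    else Some(java.lang.Double.compare(x, y))
  def lteq(x: Double, y: Double): Boolean = tryCompare(x, y).exists(_ <= 0)
}

A min/max that ignores the incomparable elements is possible here, but only because everything apart from NaN is still totally ordered; it cannot be written for an arbitrary PartialOrdering (see the {a, b, c, d} example above).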
What about bringing an implicit into scope that would allow you to have new min/max methods on the list?
Something like:
object NanAwareMinOrdering extends Ordering[Double] {
  def compare(x: Double, y: Double) = {
    if (x.isNaN()) {
      +1 // without checking y, declare x the larger value, so min never picks the NaN
    } else if (y.isNaN()) {
      -1 // without checking x, declare x the smaller value, so min never picks the NaN
    } else {
      java.lang.Double.compare(x, y)
    }
  }
}
object NanAwareMaxOrdering extends Ordering[Double] {
....
}
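Presumably the max variant simply flips the bias so that NaN counts as the smallest value; a possible completion (my guess, not spelled out in the original answer):

object NanAwareMaxOrdering extends Ordering[Double] {
  def compare(x: Double, y: Double) = {
    if (x.isNaN()) {
      -1 // treat NaN as smaller than anything, so max never picks it
    } else if (y.isNaN()) {
      +1
    } else {
      java.lang.Double.compare(x, y)
    }
  }
}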
implicit class MinMaxList(list: List[Double]) {
  def min2 = list.min(NanAwareMinOrdering)
  def max2 = list.max(NanAwareMaxOrdering)
}
List(1.0, 2.0, 3.0, Double.NaN).min2
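With the orderings above this should yield 1.0, keeping the NaN out of the result; the max variant behaves analogously:

List(Double.NaN, 1.0, 2.0, 3.0).max2 // 3.0 (assuming the max ordering sketched above)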