min/max of collections containing NaN (handling incomparability in ordering)

别跟我提以往 2021-02-14 00:38

I just ran into a nasty bug as a result of the following behavior:

scala> List(1.0, 2.0, 3.0, Double.NaN).min
res1: Double = NaN

scala> List(1.0, 2.0, 3.0         


        
4 answers
  •  栀梦 (OP)
    2021-02-14 01:11

    Disclaimer: I'll add my own answer to the question just in case anyone else is still interested in more details on the matter.

    Some theory ...

    It looks like this issue is more complex than I expected. As Alexey Romanov has already pointed out, the notion of incomparability would require the max/min functions to take a partial order. Unfortunately, Alexey is also right that a general max/min function based on a partial order does not make sense: think of the case where the partial order only defines relations within certain groups, while the groups themselves are completely independent of each other (for instance, the elements {a, b, c, d} with just the two relations a < b and c < d; we would have two maxima, b and d, and two minima, a and c). In that regard one may even argue that, formally, max/min should always return two values: NaN and the respective valid min/max, since NaN is incomparable to every number and is therefore the extremal value of its own one-element group.
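    To make the {a, b, c, d} argument concrete, here is a small sketch that computes the maximal and minimal elements of a finite set under a strict relation. The helpers `maximal` and `minimal` are hypothetical, not library functions. Applied to the native IEEE 754 `<`, which behaves like a strict partial order once NaN is involved, it returns two maxima and two minima, with NaN in both results:

```scala
// Maximal / minimal elements of a finite set under a strict relation `lt`:
// x is maximal if nothing is strictly greater, minimal if nothing is strictly smaller.
def maximal[A](xs: Seq[A])(lt: (A, A) => Boolean): Seq[A] =
  xs.filter(x => !xs.exists(y => lt(x, y)))

def minimal[A](xs: Seq[A])(lt: (A, A) => Boolean): Seq[A] =
  xs.filter(x => !xs.exists(y => lt(y, x)))

// The {a, b, c, d} example: only a < b and c < d are defined.
val letters  = Seq("a", "b", "c", "d")
val relation = Set(("a", "b"), ("c", "d"))
val letterLt = (x: String, y: String) => relation((x, y))

println(maximal(letters)(letterLt)) // List(b, d)
println(minimal(letters)(letterLt)) // List(a, c)

// Native IEEE 754 `<` on doubles: NaN is incomparable to everything, so it is
// both maximal and minimal in its own one-element group.
val nums   = Seq(1.0, 2.0, 3.0, Double.NaN)
val ieeeLt = (x: Double, y: Double) => x < y

println(maximal(nums)(ieeeLt)) // List(3.0, NaN)
println(minimal(nums)(ieeeLt)) // List(1.0, NaN)
```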

    Because partial orders are too general, the min/max functions take an Ordering, i.e. a total order. Unfortunately, a total order has no notion of incomparability. Reviewing the three defining properties of a total order makes it pretty obvious that "ignoring NaNs" is formally impossible:

    1. If a ≤ b and b ≤ a then a = b (antisymmetry)
    2. If a ≤ b and b ≤ c then a ≤ c (transitivity)
    3. a ≤ b or b ≤ a (totality)
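    These three properties can be checked by brute force on a small sample. In the sketch below, the helper `violations` and the sample are assumptions for illustration, and "a = b" in antisymmetry is taken to mean `java.lang.Double.compare(a, b) == 0` so that NaN counts as equal to itself. With the native `<=`, only totality fails, and only for pairs involving NaN; the comparison derived from `java.lang.Double.compare` passes all three:

```scala
// Reports every pair/triple of the sample that breaks one of the total-order axioms
// for the given `lteq` predicate.
def violations(sample: Seq[Double])(lteq: (Double, Double) => Boolean): Seq[String] = {
  // Assumption of this sketch: equality means compare == 0 (so NaN equals NaN)
  val same = (a: Double, b: Double) => java.lang.Double.compare(a, b) == 0
  val antisym =
    for (a <- sample; b <- sample if lteq(a, b) && lteq(b, a) && !same(a, b))
      yield s"antisymmetry: $a, $b"
  val trans =
    for (a <- sample; b <- sample; c <- sample if lteq(a, b) && lteq(b, c) && !lteq(a, c))
      yield s"transitivity: $a, $b, $c"
  val total =
    for (a <- sample; b <- sample if !lteq(a, b) && !lteq(b, a))
      yield s"totality: $a, $b"
  antisym ++ trans ++ total
}

val sample = Seq(0.0, 1.0, Double.NaN)

// Native IEEE 754 comparison: every comparison involving NaN is false.
val ieee = violations(sample)((a, b) => a <= b)
// Comparison derived from java.lang.Double.compare: NaN sorts above everything.
val javaCmp = violations(sample)((a, b) => java.lang.Double.compare(a, b) <= 0)

ieee.foreach(println)     // five totality violations, all involving NaN
println(javaCmp.isEmpty)  // true
```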

    ... and practice ...

    So any Ordering that gives us the desired min/max behavior has to violate one of these properties (and we have to bear the consequences). The implementation of min/max/minBy/maxBy in TraversableOnce follows this pattern (for min):

    reduceLeft((x, y) => if (cmp.lteq(x, y)) x else y)
    

    and gteq for the max variants. This gave me the idea of "left biasing" the comparison, i.e.:

    x    <op> NaN    is always true, to keep x in the reduction
    NaN  <op> x      is always false, to inject x into the reduction
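    A minimal sketch of the left-bias rule, restricted to the two predicates the reduction actually calls (`lteq` for min, `gteq` for max). The helper names are hypothetical, and the reduction only mirrors the pattern quoted above rather than the library's own code:

```scala
// Left-biased "x <= y": a number against NaN is true (keep x),
// NaN against a number is false (inject y); NaN vs NaN is true.
def biasedLteq(x: Double, y: Double): Boolean =
  if (y.isNaN) true
  else if (x.isNaN) false
  else x <= y

// Left-biased "x >= y" with the same NaN rules.
def biasedGteq(x: Double, y: Double): Boolean =
  if (y.isNaN) true
  else if (x.isNaN) false
  else x >= y

// Same reduction shape as the min/max pattern quoted above.
def reduceMin(xs: Seq[Double]): Double = xs.reduceLeft((x, y) => if (biasedLteq(x, y)) x else y)
def reduceMax(xs: Seq[Double]): Double = xs.reduceLeft((x, y) => if (biasedGteq(x, y)) x else y)

val nanLast  = Seq(1.0, 2.0, 3.0, Double.NaN)
val nanFirst = Seq(Double.NaN, 1.0, 2.0, 3.0)
println((reduceMin(nanLast), reduceMin(nanFirst))) // (1.0,1.0)
println((reduceMax(nanLast), reduceMax(nanFirst))) // (3.0,3.0)
```

    A collection that contains only NaNs still reduces to NaN, since there is no valid number to inject.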
    

    The resulting implementation of such a "left biased" ordering would look like this:

    object BiasedOrdering extends Ordering[Double] {
      // inconsistent with the overrides below, but the same goes for Ordering.Double
      def compare(x: Double, y: Double): Int = java.lang.Double.compare(x, y)

      // Left bias: (number, NaN) is always true, (NaN, number) is always false,
      // (NaN, NaN) yields `bothNaN`; two numbers go through `compare`.
      private def biased(x: Double, y: Double, bothNaN: Boolean)(test: Int => Boolean): Boolean =
        (x.isNaN, y.isNaN) match {
          case (true, false)  => false
          case (false, true)  => true
          case (true, true)   => bothNaN
          case (false, false) => test(compare(x, y))
        }

      override def lteq(x: Double, y: Double): Boolean  = biased(x, y, bothNaN = true)(_ <= 0)
      override def gteq(x: Double, y: Double): Boolean  = biased(x, y, bothNaN = true)(_ >= 0)
      override def lt(x: Double, y: Double): Boolean    = biased(x, y, bothNaN = false)(_ < 0)
      override def gt(x: Double, y: Double): Boolean    = biased(x, y, bothNaN = false)(_ > 0)
      override def equiv(x: Double, y: Double): Boolean = biased(x, y, bothNaN = true)(_ == 0)
    }
    

    ... analyzed:

    Currently I'm trying to find out:

    • how this order compares to the default ordering,
    • where it violates the total order properties,
    • and what the potential issues are.

    I'm comparing this to Scala's default order Ordering.Double and the following ordering which is directly derived from java.lang.Double.compare:

    object OrderingDerivedFromCompare extends Ordering[Double] {
      def compare(x: Double, y: Double) = {
        java.lang.Double.compare(x, y)
      }
    }
    

    One interesting property of Scala's default order Ordering.Double is that it overrides all comparison member functions with the language's native numerical comparison operators (<, <=, ==, >=, >), so the results are identical to comparing directly with these operators. The following shows all possible relations between NaN and a valid number (and between NaN and itself) for the three orderings:

    Ordering.Double             0.0 >  NaN = false
    Ordering.Double             0.0 >= NaN = false
    Ordering.Double             0.0 == NaN = false
    Ordering.Double             0.0 <= NaN = false
    Ordering.Double             0.0 <  NaN = false
    OrderingDerivedFromCompare  0.0 >  NaN = false
    OrderingDerivedFromCompare  0.0 >= NaN = false
    OrderingDerivedFromCompare  0.0 == NaN = false
    OrderingDerivedFromCompare  0.0 <= NaN = true
    OrderingDerivedFromCompare  0.0 <  NaN = true
    BiasedOrdering              0.0 >  NaN = true
    BiasedOrdering              0.0 >= NaN = true
    BiasedOrdering              0.0 == NaN = true
    BiasedOrdering              0.0 <= NaN = true
    BiasedOrdering              0.0 <  NaN = true
    
    Ordering.Double             NaN >  0.0 = false
    Ordering.Double             NaN >= 0.0 = false
    Ordering.Double             NaN == 0.0 = false
    Ordering.Double             NaN <= 0.0 = false
    Ordering.Double             NaN <  0.0 = false
    OrderingDerivedFromCompare  NaN >  0.0 = true
    OrderingDerivedFromCompare  NaN >= 0.0 = true
    OrderingDerivedFromCompare  NaN == 0.0 = false
    OrderingDerivedFromCompare  NaN <= 0.0 = false
    OrderingDerivedFromCompare  NaN <  0.0 = false
    BiasedOrdering              NaN >  0.0 = false
    BiasedOrdering              NaN >= 0.0 = false
    BiasedOrdering              NaN == 0.0 = false
    BiasedOrdering              NaN <= 0.0 = false
    BiasedOrdering              NaN <  0.0 = false
    
    Ordering.Double             NaN >  NaN = false
    Ordering.Double             NaN >= NaN = false
    Ordering.Double             NaN == NaN = false
    Ordering.Double             NaN <= NaN = false
    Ordering.Double             NaN <  NaN = false
    OrderingDerivedFromCompare  NaN >  NaN = false
    OrderingDerivedFromCompare  NaN >= NaN = true
    OrderingDerivedFromCompare  NaN == NaN = true
    OrderingDerivedFromCompare  NaN <= NaN = true
    OrderingDerivedFromCompare  NaN <  NaN = false
    BiasedOrdering              NaN >  NaN = false
    BiasedOrdering              NaN >= NaN = true
    BiasedOrdering              NaN == NaN = true
    BiasedOrdering              NaN <= NaN = true
    BiasedOrdering              NaN <  NaN = false
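    The NaN rows for the first two orderings can be regenerated with a few lines. In this sketch, `native` stands for the language's IEEE 754 operators (which, per the text, is what Ordering.Double's comparison members use) and `derived` goes through `java.lang.Double.compare`, like OrderingDerivedFromCompare; both names are hypothetical:

```scala
val ops = Seq(">", ">=", "==", "<=", "<")

// Native IEEE 754 operators: any comparison involving NaN is false.
val native: (String, Double, Double) => Boolean = (op, x, y) => op match {
  case ">"  => x > y
  case ">=" => x >= y
  case "==" => x == y
  case "<=" => x <= y
  case "<"  => x < y
  case _    => throw new IllegalArgumentException(op)
}

// Operators derived from java.lang.Double.compare: NaN is the largest value
// and equal to itself.
val derived: (String, Double, Double) => Boolean = (op, x, y) => {
  val c = java.lang.Double.compare(x, y)
  op match {
    case ">"  => c > 0
    case ">=" => c >= 0
    case "==" => c == 0
    case "<=" => c <= 0
    case "<"  => c < 0
    case _    => throw new IllegalArgumentException(op)
  }
}

val pairs = Seq((0.0, Double.NaN), (Double.NaN, 0.0), (Double.NaN, Double.NaN))
for {
  (x, y)      <- pairs
  (name, rel) <- Seq(("native", native), ("derived", derived))
  op          <- ops
} println(s"${name.padTo(8, ' ')} $x ${op.padTo(2, ' ')} $y = ${rel(op, x, y)}")
```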
    

    We can see that:

    • only OrderingDerivedFromCompare fulfills the total order properties. Based on this result, the reasoning behind java.lang.Double.compare becomes much clearer: placing NaN at the upper end of the total order simply avoids any contradiction!
    • Scala's default order and the biased order violate the total order properties in many places. Scala's default order returns false for every comparison involving NaN, while for the biased order the result depends on the argument position. Since both lead to contradictions, it is difficult to say which one may lead to more severe issues.

    Now to our actual problem at hand, the min/max functions. For OrderingDerivedFromCompare the expected result is now clear: NaN is simply the largest value, so we obtain it as max, irrespective of how the elements in the list are arranged:

    OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
    OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
    OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).max = NaN
    OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).max = NaN
    

    Now to Scala's default ordering. I was deeply shocked to see that the situation is actually even more intricate than mentioned in my question:

    Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).min = NaN
    Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
    Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).max = NaN
    Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0
    

    In fact, the order of the elements becomes relevant (because every comparison involving NaN returns false in the reduceLeft). "Left biasing" solves this issue, leading to consistent results:

    BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
    BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
    BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).max = 3.0
    BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0
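    To see where the order dependence of the default ordering comes from, the sketch below traces the same `reduceLeft` pattern with the native `<=`. The helper `tracedMin` is hypothetical, not the library's min. Whenever NaN is on either side the comparison is false, so the right-hand element wins: a trailing NaN replaces the running minimum, while a leading NaN is immediately replaced:

```scala
// Same shape as the reduceLeft-based min, but with native IEEE 754 `<=`
// and a printout of every step.
def tracedMin(xs: Seq[Double]): Double =
  xs.reduceLeft { (acc, y) =>
    val keep = acc <= y // false whenever acc or y is NaN
    val next = if (keep) acc else y
    println(s"  $acc <= $y is $keep, continuing with $next")
    next
  }

println("NaN last:")
val minNanLast = tracedMin(Seq(1.0, 2.0, 3.0, Double.NaN)) // NaN
println("NaN first:")
val minNanFirst = tracedMin(Seq(Double.NaN, 1.0, 2.0, 3.0)) // 1.0
```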
    

    Unfortunately, I'm still not able to fully answer all questions here. Some remaining points are:

    • Why is Scala's default ordering defined the way it is? Its handling of NaN currently seems pretty flawed. A very dangerous detail of Ordering.Double is that its compare function delegates to java.lang.Double.compare, while the comparison members are implemented with the language's native comparisons. This obviously leads to inconsistent results, for instance:

      Ordering.Double.compare(0.0, Double.NaN) == -1     // indicating 0.0 < NaN
      Ordering.Double.lt     (0.0, Double.NaN) == false  // contradiction
      
    • What are the potential drawbacks of BiasedOrdering, apart from the contradictory results of the comparisons themselves? A quick check with sorted gave the following results, which did not reveal any trouble:

      Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
      OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
      BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
      
      Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
      OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
      BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
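    The contradiction in the first point can be reproduced without relying on a particular Scala version's Ordering.Double, using a hypothetical stand-in `IeeeLikeOrdering` built the way the text describes (compare from `java.lang.Double.compare`, comparison members from native operators). It may also hint at why sorted looked fine: assuming sorted is driven only by `compare` (an Ordering is a `java.util.Comparator`), it never consults the native comparisons, so all three orderings, which share the same `compare`, would sort identically:

```scala
// Hypothetical stand-in: compare comes from java.lang.Double.compare,
// the comparison members from native IEEE 754 operators.
object IeeeLikeOrdering extends Ordering[Double] {
  def compare(x: Double, y: Double): Int            = java.lang.Double.compare(x, y)
  override def lt(x: Double, y: Double): Boolean    = x < y
  override def lteq(x: Double, y: Double): Boolean  = x <= y
  override def gt(x: Double, y: Double): Boolean    = x > y
  override def gteq(x: Double, y: Double): Boolean  = x >= y
  override def equiv(x: Double, y: Double): Boolean = x == y
}

val viaCompare = IeeeLikeOrdering.compare(0.0, Double.NaN) // -1, i.e. "0.0 < NaN"
val viaLt      = IeeeLikeOrdering.lt(0.0, Double.NaN)      // false: contradicts compare
println(s"compare = $viaCompare, lt = $viaLt")

// Assumption: sorted only consults compare, so NaN ends up last.
val sortedXs = List(Double.NaN, 1.0, 2.0, 3.0).sorted(IeeeLikeOrdering)
println(sortedXs) // List(1.0, 2.0, 3.0, NaN)
```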
      

    For the time being I'll go with this left-biased ordering. But since the nature of the problem does not allow a flawless general solution: use with care!

    Update

    As for solutions based on an implicit class, as monkjack suggested, I like the following a lot, since it does not mess with (flawed?) total orders at all but internally converts to a clean, totally ordered domain:

    implicit class MinMaxNanAware(t: TraversableOnce[Double]) {
      // map NaN to the least favourable key, so any finite value is preferred
      def nanAwareMin = t.minBy(x => if (x.isNaN) Double.PositiveInfinity else x)
      def nanAwareMax = t.maxBy(x => if (x.isNaN) Double.NegativeInfinity else x)
    }
    
    // and now we can simply use
    val goodMin = list.nanAwareMin
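    If an Option result is preferable, a variant of the same idea (the names `NanAwareSyntax`, `nanAwareMinOption` and `nanAwareMaxOption` are hypothetical) drops the NaNs before reducing. This also sidesteps an edge case of the key mapping: when a NaN precedes a real `Double.PositiveInfinity`, both get the same key, and `minBy` should then keep the earlier NaN:

```scala
// Hypothetical variant: drop the NaNs first and return None if nothing valid is left.
object NanAwareSyntax {
  implicit class NanAwareOps(xs: Iterable[Double]) {
    def nanAwareMinOption: Option[Double] = xs.filterNot(_.isNaN).reduceOption(_ min _)
    def nanAwareMaxOption: Option[Double] = xs.filterNot(_.isNaN).reduceOption(_ max _)
  }
}

import NanAwareSyntax._

val withNaN = List(Double.NaN, 1.0, 2.0, 3.0)
println(withNaN.nanAwareMinOption) // Some(1.0)
println(withNaN.nanAwareMaxOption) // Some(3.0)
println(List(Double.NaN, Double.PositiveInfinity).nanAwareMinOption) // Some(Infinity)
println(List(Double.NaN).nanAwareMinOption) // None
```

    Returning None for an all-NaN input makes the "no valid value" case explicit at the call site instead of propagating NaN.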
    
