Sorting an array of Doubles with NaN in it

深忆病人 2021-01-11 10:26

This is more of a 'Can you explain this' type of question than anything else.

I came across a problem at work where we were using NaN values in a table, but

5 answers
  • 2021-01-11 10:42

    Edit (conclusion. final. end.): This is a bug.

    See the bug report "Bug in List<double/single>.Sort() [.NET35] in list which contains double.NaN", and go give Hans Passant an up-vote at "Why does .NET 4.0 sort this array differently than .NET 3.5?", from which I took the link.

    Historical musings

    [See the post: Why does .NET 4.0 sort this array differently than .NET 3.5?, where, hopefully, more useful discussion on this particular issue can be figured out for real. I have cross-posted this response there as well.]

    The behavior pointed out in .NET 4 by Phil is the one defined in CompareTo. See double.CompareTo for .NET 4. However, the same behavior is documented for .NET 3.5, so the result should be consistent in both versions, per the method documentation...

    Array.Sort(double[]) doesn't seem to be using CompareTo(double) as expected, and this may very well be a bug -- note the difference between Array.Sort(object[]) and Array.Sort(double[]) below. I would love clarification/corrections on the following.

    In any case, the answers using > and < and == explain why those operators don't work but fail to explain why Array.Sort leads to unexpected output. Here are some of my findings, as meager as they may be.

    First, the double.CompareTo(T) method documentation -- this ordering is well-defined according to the documentation:

    Less than zero: This instance is less than value. -or- This instance is not a number (NaN) and value is a number.

    Zero: This instance is equal to value. -or- Both this instance and value are not a number (NaN), PositiveInfinity, or NegativeInfinity.

    Greater than zero: This instance is greater than value. -or- This instance is a number and value is not a number (NaN).

    In LINQPad (3.5 and 4, both have same results):

    0d.CompareTo(0d).Dump();                  // 0
    double.NaN.CompareTo(0d).Dump();          // -1
    double.NaN.CompareTo(double.NaN).Dump();  // 0
    0d.CompareTo(double.NaN).Dump();          // 1
    

    Using CompareTo(object) has the same results:

    0d.CompareTo((object)0d).Dump();                  // 0
    double.NaN.CompareTo((object)0d).Dump();          // -1
    double.NaN.CompareTo((object)double.NaN).Dump();  // 0
    0d.CompareTo((object)double.NaN).Dump();          // 1
    

    So that's not the problem.

    Now, from the Array.Sort(object[]) documentation -- there is no use of >, < or == (according to the documentation) -- just CompareTo(object).

    Sorts the elements in an entire one-dimensional Array using the IComparable implementation of each element of the Array.

    Likewise, Array.Sort(T[]) uses CompareTo(T).

    Sorts the elements in an entire Array using the IComparable(Of T) generic interface implementation of each element of the Array.

    Let's see:

    LINQPad (4):

    var ar = new double[] {double.NaN, 0, 1, double.NaN};
    Array.Sort(ar);
    ar.Dump();
    // NaN, NaN, 0, 1
    

    LINQPad (3.5):

    var ar = new double[] {double.NaN, 0, 1, double.NaN};
    Array.Sort(ar);
    ar.Dump();
    // NaN, 0, NaN, 1
    

    LINQPad (3.5) -- NOTE THE ARRAY IS OF OBJECT and the behavior is "expected" per the CompareTo contract.

    var ar = new object[] {double.NaN, 0d, 1d, double.NaN};
    Array.Sort(ar);
    ar.Dump();
    // NaN, NaN, 0, 1
    

    Hmm. Really. In conclusion:

    I HAVE NO IDEA.

    Happy coding.
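
    For anyone who wants to check a particular runtime against the CompareTo contract quoted above, here is a minimal sketch. The helper name IsOrderedByCompareTo is my own (hypothetical, not a framework method); it only verifies that each adjacent pair in the sorted array satisfies a.CompareTo(b) <= 0.

    ```csharp
    using System;

    static class SortCheck
    {
        // Hypothetical helper: true when every adjacent pair satisfies a.CompareTo(b) <= 0.
        public static bool IsOrderedByCompareTo(double[] values)
        {
            for (int i = 1; i < values.Length; i++)
            {
                if (values[i - 1].CompareTo(values[i]) > 0)
                    return false;
            }
            return true;
        }

        static void Main()
        {
            var ar = new double[] { double.NaN, 0, 1, double.NaN };
            Array.Sort(ar);
            // True when this runtime's Array.Sort(double[]) honors the contract.
            Console.WriteLine(IsOrderedByCompareTo(ar));

            // The .NET 3.5 result reported above (NaN, 0, NaN, 1) fails the check.
            Console.WriteLine(IsOrderedByCompareTo(new[] { double.NaN, 0, double.NaN, 1 })); // False
        }
    }
    ```

    The check deliberately relies on CompareTo rather than on < or >, since the operators return false for every comparison involving NaN.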

  • 2021-01-11 10:46

    Since you are using the default sort, which uses the QuickSort algorithm, the implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved.

  • 2021-01-11 10:47

    I believe that's because

    a < NaN == false
    a > NaN == false
    a == NaN == false
    

    so the comparison on them breaks down, and that throws off the entire sort.
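
    A small sketch of those three comparisons, next to what CompareTo and Equals report for the same values. The class and method names are made up for illustration.

    ```csharp
    using System;

    static class NaNOperatorsDemo
    {
        // Returns the results of a < NaN, a > NaN and a == NaN, in that order.
        public static bool[] CompareWithNaN(double a)
        {
            double nan = double.NaN;
            return new[] { a < nan, a > nan, a == nan };
        }

        static void Main()
        {
            foreach (bool result in CompareWithNaN(1.0))
                Console.WriteLine(result);                      // False, three times

            // CompareTo and Equals, by contrast, give NaN a definite place.
            Console.WriteLine(double.NaN.CompareTo(1.0) < 0);   // True
            Console.WriteLine(double.NaN.Equals(double.NaN));   // True
        }
    }
    ```

    So a sort built on the raw operators has no consistent answer for NaN, while one built on CompareTo does.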

  • 2021-01-11 10:50

    Actually, the strange sorting behavior is the result of a bug in .NET 3.5. The bug was addressed with .NET 4.0.

    The only way to resolve it is to use your own custom comparer, or upgrade to .NET 4.0. See Why does .NET 4.0 sort this array differently than .NET 3.5?
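
    As a rough sketch of the custom-comparer route: NaNFirstComparer below is a hypothetical name of my own, and it simply orders NaN before every number, mirroring the documented CompareTo contract. I am assuming that passing an explicit IComparer<double> makes Array.Sort use it for every comparison; I have not re-tested this on .NET 3.5.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical comparer: NaN sorts before every number, NaN equals NaN.
    sealed class NaNFirstComparer : IComparer<double>
    {
        public int Compare(double x, double y)
        {
            bool xIsNaN = double.IsNaN(x);
            bool yIsNaN = double.IsNaN(y);

            if (xIsNaN && yIsNaN) return 0;
            if (xIsNaN) return -1;
            if (yIsNaN) return 1;

            // Neither value is NaN, so the ordinary operators are safe here.
            if (x < y) return -1;
            if (x > y) return 1;
            return 0;
        }
    }

    static class ComparerDemo
    {
        static void Main()
        {
            var values = new double[] { double.NaN, 0, 1, double.NaN };
            Array.Sort(values, new NaNFirstComparer());

            foreach (double v in values)
                Console.WriteLine(v);   // both NaN values first, then 0, then 1
        }
    }
    ```

    Swapping the -1 and 1 in the two NaN branches would push the NaN values to the end instead.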

  • 2021-01-11 10:58

    Conceptually NaN is Not a Number, so comparing to a number makes no sense, hence:

    a < NaN = false for all a,
    a > NaN = false for all a and
    NaN != NaN (!)
    

    To solve this you need to write your own comparer that uses IsNaN to make NaNs smaller than (or larger than) all numbers, so that they will all appear at one end of the sort.

    Edit: Here's a sample of the Comparison version:

    using System;

    class Program
    {
        private static int NaNComparison(double first, double second)
        {
            if (double.IsNaN(first))
            {
                if (double.IsNaN(second)) // Throws an argument exception if we don't handle both being NaN
                    return 0;
                else
                    return -1;
            }
            if (double.IsNaN(second))
                return 1;
    
            if (first == second)
                return 0;
            return first < second ? -1 : 1;
        }
    
        static void Main(string[] args)
        {
            var doubles = new[] { double.NaN, 2.0, 3.0, 1.0, double.NaN };
    
            Array.Sort(doubles, NaNComparison);
        }
    }
    