Correlation of two arrays in C#

挽巷 2020-11-29 05:09

Given two arrays of double values, I want to compute the correlation coefficient (a single double value, just like the CORREL function in MS Excel). Is there some simple one-line

6 Answers
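  • You can keep the values in two separate lists, paired by index, and combine them with a simple Zip.

    // FitResult is not defined in this thread; it is presumably the asker's own
    // type exposing a CorrelationCoefficient(v1, v2) method.
    var fitResult = new FitResult();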
    var values1 = new List<int>();
    var values2 = new List<int>();
    
    var correls = values1.Zip(values2, (v1, v2) =>
                                           fitResult.CorrelationCoefficient(v1, v2));
    

    A second way is to write your own custom implementation (mine isn't optimized for speed):

    public double ComputeCoeff(double[] values1, double[] values2)
    {
        if(values1.Length != values2.Length)
            throw new ArgumentException("values must be the same length");
    
        var avg1 = values1.Average();
        var avg2 = values2.Average();
    
        var sum1 = values1.Zip(values2, (x1, y1) => (x1 - avg1) * (y1 - avg2)).Sum();
    
        var sumSqr1 = values1.Sum(x => Math.Pow((x - avg1), 2.0));
        var sumSqr2 = values2.Sum(y => Math.Pow((y - avg2), 2.0));
    
        var result = sum1 / Math.Sqrt(sumSqr1 * sumSqr2);
    
        return result;
    }
    

    Usage:

    var values1 = new List<double> { 3, 2, 4, 5 ,6 };
    var values2 = new List<double> { 9, 7, 12 ,15, 17 };
    
    var result = ComputeCoeff(values1.ToArray(), values2.ToArray());
    // 0.997054485501581
    
    Debug.Assert(result.ToString("F6") == "0.997054");
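
    One edge case worth knowing about: if either input is constant, the numerator and the denominator are both zero, so the method returns NaN instead of throwing (Excel's CORREL reports #DIV/0! in that situation). An empty array fails earlier, because Average() throws InvalidOperationException. A minimal sketch, repeating the formula so it compiles on its own; the class name CorrelationEdgeCase is made up for illustration:

    ```csharp
    using System;
    using System.Linq;

    public static class CorrelationEdgeCase
    {
        // Same formula as ComputeCoeff above, repeated so the sketch is self-contained.
        public static double ComputeCoeff(double[] values1, double[] values2)
        {
            if (values1.Length != values2.Length)
                throw new ArgumentException("values must be the same length");

            var avg1 = values1.Average();
            var avg2 = values2.Average();

            var sum1 = values1.Zip(values2, (x, y) => (x - avg1) * (y - avg2)).Sum();
            var sumSqr1 = values1.Sum(x => (x - avg1) * (x - avg1));
            var sumSqr2 = values2.Sum(y => (y - avg2) * (y - avg2));

            return sum1 / Math.Sqrt(sumSqr1 * sumSqr2);
        }

        public static void Main()
        {
            var constant = new double[] { 5, 5, 5, 5 }; // zero variance
            var varying = new double[] { 1, 2, 3, 4 };

            double r = ComputeCoeff(constant, varying);

            // 0.0 / 0.0 is NaN in IEEE 754 arithmetic, so callers should check for it.
            Console.WriteLine(double.IsNaN(r)); // True
        }
    }
    ```

    Whether to return NaN or throw for constant input is a design choice; the check with double.IsNaN keeps the original method unchanged.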
    

    Another way is to use the Excel function directly:

    var values1 = new List<double> { 3, 2, 4, 5 ,6 };
    var values2 = new List<double> { 9, 7, 12 ,15, 17 };
    
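    // Add a reference to Microsoft.Office.Interop.Excel.dll and import the namespace:
    // using Microsoft.Office.Interop.Excel;
    // Excel must be installed on the machine that runs this code.
    
    var application = new Application();
    
    var worksheetFunction = application.WorksheetFunction;
    
    var result = worksheetFunction.Correl(values1.ToArray(), values2.ToArray());
    
    Console.Write(result); // 0.997054485501581
    
    application.Quit(); // close the Excel instance started above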
    
  • 2020-11-29 05:51

    Math.NET Numerics is a well-documented math library that contains a Correlation class. It calculates Pearson and Spearman ranked correlations: http://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/Correlation.htm

    The library is available under the very liberal MIT/X11 license. Using it to calculate a correlation coefficient is as easy as follows:

    using MathNet.Numerics.Statistics;
    
    ...
    
    double correlation = Correlation.Pearson(arrayOfValues1, arrayOfValues2);
    

    Good luck!

  • 2020-11-29 05:59
    Public Function Correlation(ByRef array1() As Double, ByRef array2() As Double) As Double
        'see https://stackoverflow.com/questions/17447817/correlation-of-two-arrays-in-c-sharp
    
        'The "Pearson correlation coefficient" computed here still has to be squared to obtain R-squared, see
        'https://en.wikipedia.org/wiki/Coefficient_of_determination
    
    
        Dim array_xy(array1.Length - 1) As Double
        Dim array_xp2(array1.Length - 1) As Double
        Dim array_yp2(array1.Length - 1) As Double
    
        Dim i As Integer
        For i = 0 To array1.Length - 1
            array_xy(i) = array1(i) * array2(i)
        Next i
        For i = 0 To array1.Length - 1
            array_xp2(i) = Math.Pow(array1(i), 2.0)
        Next i
        For i = 0 To array1.Length - 1
            array_yp2(i) = Math.Pow(array2(i), 2.0)
        Next i
    
    
        Dim sum_x As Double = 0
        Dim sum_y As Double = 0
        Dim EinDouble As Double
    
        For Each EinDouble In array1
            sum_x += EinDouble
        Next
        For Each EinDouble In array2
            sum_y += EinDouble
        Next
    
        Dim sum_xy As Double = 0
        For Each EinDouble In array_xy
            sum_xy += EinDouble
        Next
    
        Dim sum_xpow2 As Double = 0
        For Each EinDouble In array_xp2
            sum_xpow2 += EinDouble
        Next
    
        Dim sum_ypow2 As Double = 0
        For Each EinDouble In array_yp2
            sum_ypow2 += EinDouble
        Next
    
        Dim Ex2 As Double = Math.Pow(sum_x, 2.0)
        Dim Ey2 As Double = Math.Pow(sum_y, 2.0)
    
        Dim ReturnWert As Double
        ReturnWert = (array1.Length * sum_xy - sum_x * sum_y) / Math.Sqrt((array1.Length * sum_xpow2 - Ex2) * (array1.Length * sum_ypow2 - Ey2))
        Correlation = ReturnWert
    End Function
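
    The comment in the function above says the Pearson coefficient still has to be squared to get R-squared. That holds for a simple linear fit of one variable on the other (with an intercept). A minimal C# sketch of that step, using the same raw-sum formula; the class and method names are made up for illustration:

    ```csharp
    using System;
    using System.Globalization;

    public static class RSquaredSketch
    {
        // Pearson r using the same raw-sum formula as the function above.
        public static double Pearson(double[] x, double[] y)
        {
            if (x.Length != y.Length)
                throw new ArgumentException("arrays must be the same length");

            int n = x.Length;
            double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
            for (int i = 0; i < n; i++)
            {
                sumX += x[i];
                sumY += y[i];
                sumXY += x[i] * y[i];
                sumX2 += x[i] * x[i];
                sumY2 += y[i] * y[i];
            }

            return (n * sumXY - sumX * sumY) /
                   Math.Sqrt((n * sumX2 - sumX * sumX) * (n * sumY2 - sumY * sumY));
        }

        // For a simple linear fit of y on x, R-squared is the square of Pearson r.
        public static double RSquared(double[] x, double[] y)
        {
            double r = Pearson(x, y);
            return r * r;
        }

        public static void Main()
        {
            var x = new double[] { 3, 2, 4, 5, 6 };
            var y = new double[] { 9, 7, 12, 15, 17 };

            Console.WriteLine(Pearson(x, y).ToString("F6", CultureInfo.InvariantCulture));  // 0.997054
            Console.WriteLine(RSquared(x, y).ToString("F6", CultureInfo.InvariantCulture)); // 0.994118
        }
    }
    ```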
    
  • 2020-11-29 06:08

    To calculate the Pearson product-moment correlation coefficient

    http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

    you can use this simple code:

      public static Double Correlation(Double[] Xs, Double[] Ys) {
        Double sumX = 0;
        Double sumX2 = 0;
        Double sumY = 0;
        Double sumY2 = 0;
        Double sumXY = 0;
    
        int n = Xs.Length < Ys.Length ? Xs.Length : Ys.Length;
    
        for (int i = 0; i < n; ++i) {
          Double x = Xs[i];
          Double y = Ys[i];
    
          sumX += x;
          sumX2 += x * x;
          sumY += y;
          sumY2 += y * y;
          sumXY += x * y;
        }
    
        Double stdX = Math.Sqrt(sumX2 / n - sumX * sumX / n / n);
        Double stdY = Math.Sqrt(sumY2 / n - sumY * sumY / n / n);
        Double covariance = (sumXY / n - sumX * sumY / n / n);
    
        return covariance / stdX / stdY; 
      }
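
    One caveat with the raw-sum form above: expressions like sumX2 / n - sumX * sumX / n / n subtract two large, nearly equal numbers when the values sit far from zero, which can cost precision. If that matters for your data, a one-pass variant that updates running means and co-moments (a Welford-style update) avoids the subtraction. This is a sketch of a swapped-in technique, not the code from this answer; the class name is made up:

    ```csharp
    using System;

    public static class StableCorrelationSketch
    {
        public static double Correlation(double[] xs, double[] ys)
        {
            // Like the code above, use the shorter length if the arrays differ.
            int n = Math.Min(xs.Length, ys.Length);

            double meanX = 0, meanY = 0; // running means
            double m2X = 0, m2Y = 0;     // running sums of squared deviations
            double cXY = 0;              // running sum of cross deviations

            for (int i = 0; i < n; ++i)
            {
                double dx = xs[i] - meanX; // deviation from the previous mean
                double dy = ys[i] - meanY;

                meanX += dx / (i + 1);
                meanY += dy / (i + 1);

                // Previous-mean deviation times updated-mean deviation.
                m2X += dx * (xs[i] - meanX);
                m2Y += dy * (ys[i] - meanY);
                cXY += dx * (ys[i] - meanY);
            }

            return cXY / Math.Sqrt(m2X * m2Y);
        }

        public static void Main()
        {
            var x = new double[] { 3, 2, 4, 5, 6 };
            var y = new double[] { 9, 7, 12, 15, 17 };

            // Correlation does not change when a constant is added to every value.
            var shiftedX = Array.ConvertAll(x, v => v + 1e6);
            var shiftedY = Array.ConvertAll(y, v => v + 1e6);

            double plain = Correlation(x, y);
            double shifted = Correlation(shiftedX, shiftedY);

            Console.WriteLine(Math.Abs(plain - shifted) < 1e-6); // True
        }
    }
    ```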
    
  • 2020-11-29 06:12

    If you don't want to use a third-party library, you can use the method from this post (code copied here as a backup).

    public double Correlation(double[] array1, double[] array2)
    {
        double[] array_xy = new double[array1.Length];
        double[] array_xp2 = new double[array1.Length];
        double[] array_yp2 = new double[array1.Length];
        for (int i = 0; i < array1.Length; i++)
            array_xy[i] = array1[i] * array2[i];
        for (int i = 0; i < array1.Length; i++)
            array_xp2[i] = Math.Pow(array1[i], 2.0);
        for (int i = 0; i < array1.Length; i++)
            array_yp2[i] = Math.Pow(array2[i], 2.0);
        double sum_x = 0;
        double sum_y = 0;
        foreach (double n in array1)
            sum_x += n;
        foreach (double n in array2)
            sum_y += n;
        double sum_xy = 0;
        foreach (double n in array_xy)
            sum_xy += n;
        double sum_xpow2 = 0;
        foreach (double n in array_xp2)
            sum_xpow2 += n;
        double sum_ypow2 = 0;
        foreach (double n in array_yp2)
            sum_ypow2 += n;
        double Ex2 = Math.Pow(sum_x, 2.00);
        double Ey2 = Math.Pow(sum_y, 2.00);
    
        return (array1.Length * sum_xy - sum_x * sum_y) /
               Math.Sqrt((array1.Length * sum_xpow2 - Ex2) * (array1.Length * sum_ypow2 - Ey2));
    }
    
  • 2020-11-29 06:12

    In a handful of manual tests, the code posted above by @Dmitry Bychenko and by @keyboardP both produced essentially the same correlations as Microsoft Excel, without needing any external libraries.

    For example, one run (data listed at the bottom) gave:

    @Dmitry Bychenko: -0.00418479432051121

    @keyboardP:       -0.00418479432051131

    MS Excel:         -0.004184794

    Here is a test harness:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    namespace TestCorrel {
        class Program {
    
            static void Main(string[] args) {
    
                Random rand = new Random(DateTime.Now.Millisecond);
    
                List<double> x = new List<double>();
                List<double> y = new List<double>();
    
                for (int i = 0; i < 100; i++) {
    
                    x.Add(rand.Next(1000) * rand.NextDouble());
                    y.Add(rand.Next(1000) * rand.NextDouble());
    
                    Console.WriteLine(x[i] + "," + y[i]);
                }
    
                Console.WriteLine("Correl1: " + Correl1(x, y));
                Console.WriteLine("Correl2: " + Correl2(x, y));
            }
    
            public static double Correl1(List<double> x, List<double> y) {
    
                //https://stackoverflow.com/questions/17447817/correlation-of-two-arrays-in-c-sharp
                if (x.Count != y.Count)
                    return (double.NaN); //throw new ArgumentException("values must be the same length");
    
                double sumX = 0;
                double sumX2 = 0;
                double sumY = 0;
                double sumY2 = 0;
                double sumXY = 0;
    
                int n = x.Count < y.Count ? x.Count : y.Count;
    
                for (int i = 0; i < n; ++i) {
    
                    Double xval = x[i];
                    Double yval = y[i];
    
                    sumX += xval;
                    sumX2 += xval * xval;
                    sumY += yval;
                    sumY2 += yval * yval;
                    sumXY += xval * yval;
                }
    
                Double stdX = Math.Sqrt(sumX2 / n - sumX * sumX / n / n);
                Double stdY = Math.Sqrt(sumY2 / n - sumY * sumY / n / n);
                Double covariance = (sumXY / n - sumX * sumY / n / n);
    
                return covariance / stdX / stdY;
            }
    
            public static double Correl2(List<double> x, List<double> y) {
    
                double[] array_xy = new double[x.Count];
                double[] array_xp2 = new double[x.Count];
                double[] array_yp2 = new double[x.Count];
    
                for (int i = 0; i < x.Count; i++)
                    array_xy[i] = x[i] * y[i];
                for (int i = 0; i < x.Count; i++)
                    array_xp2[i] = Math.Pow(x[i], 2.0);
                for (int i = 0; i < x.Count; i++)
                    array_yp2[i] = Math.Pow(y[i], 2.0);
                double sum_x = 0;
                double sum_y = 0;
                foreach (double n in x)
                    sum_x += n;
                foreach (double n in y)
                    sum_y += n;
                double sum_xy = 0;
                foreach (double n in array_xy)
                    sum_xy += n;
                double sum_xpow2 = 0;
                foreach (double n in array_xp2)
                    sum_xpow2 += n;
                double sum_ypow2 = 0;
                foreach (double n in array_yp2)
                    sum_ypow2 += n;
                double Ex2 = Math.Pow(sum_x, 2.00);
                double Ey2 = Math.Pow(sum_y, 2.00);
    
                double Correl = 
                (x.Count * sum_xy - sum_x * sum_y) /
                Math.Sqrt((x.Count * sum_xpow2 - Ex2) * (x.Count * sum_ypow2 - Ey2));
    
                return (Correl);
            }
        }
    }
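
    The harness above prints both results for comparison by eye and reseeds on every run. If you would rather have a repeatable check, one option is a fixed seed plus a tolerance comparison. A compact sketch under those assumptions; the class name, method names, seed value, and the 1e-9 tolerance are arbitrary choices for illustration:

    ```csharp
    using System;

    public static class CorrelCompareSketch
    {
        // Covariance divided by both standard deviations (same shape as Correl1).
        public static double ViaStdDev(double[] x, double[] y)
        {
            int n = x.Length;
            double sumX = 0, sumX2 = 0, sumY = 0, sumY2 = 0, sumXY = 0;
            for (int i = 0; i < n; ++i)
            {
                sumX += x[i];
                sumX2 += x[i] * x[i];
                sumY += y[i];
                sumY2 += y[i] * y[i];
                sumXY += x[i] * y[i];
            }

            double stdX = Math.Sqrt(sumX2 / n - sumX * sumX / n / n);
            double stdY = Math.Sqrt(sumY2 / n - sumY * sumY / n / n);
            double covariance = sumXY / n - sumX * sumY / n / n;
            return covariance / stdX / stdY;
        }

        // n * sum(xy) - sum(x) * sum(y) form (same shape as Correl2).
        public static double ViaRawSums(double[] x, double[] y)
        {
            int n = x.Length;
            double sumX = 0, sumX2 = 0, sumY = 0, sumY2 = 0, sumXY = 0;
            for (int i = 0; i < n; ++i)
            {
                sumX += x[i];
                sumX2 += x[i] * x[i];
                sumY += y[i];
                sumY2 += y[i] * y[i];
                sumXY += x[i] * y[i];
            }

            return (n * sumXY - sumX * sumY) /
                   Math.Sqrt((n * sumX2 - sumX * sumX) * (n * sumY2 - sumY * sumY));
        }

        public static void Main()
        {
            var rand = new Random(12345); // fixed seed so the same data is generated each run
            var x = new double[100];
            var y = new double[100];
            for (int i = 0; i < x.Length; i++)
            {
                x[i] = rand.Next(1000) * rand.NextDouble();
                y[i] = rand.Next(1000) * rand.NextDouble();
            }

            double r1 = ViaStdDev(x, y);
            double r2 = ViaRawSums(x, y);

            // Report whether the two formulas agree within the chosen tolerance.
            Console.WriteLine("Agree: " + (Math.Abs(r1 - r2) < 1e-9));
        }
    }
    ```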
    

    Data for the example numbers above:

    287.688269702572,225.610842817282
    618.9313498167,177.955550192835
    25.7778882802361,27.6549569366756
    140.847984766051,714.618547504125
    438.618761728806,533.48764902702
    481.347431274758,214.381256273194
    21.6406916848573,393.559209519792
    135.30397563209,158.419851317732
    334.314685154853,814.275162949821
    764.614904770914,50.1435267264692
    42.8179292282173,47.8631582287434
    237.216836650491,370.488416981179
    388.849658539449,134.961087643151
    305.903013161804,441.926902444068
    10.6625048679591,369.567569480076
    36.9316453891488,24.8947204607049
    2.10067253471383,491.941975629861
    7.94887068492774,573.037801189831
    341.738006353722,653.497146697015
    98.8424873439793,475.215988045193
    272.248712629196,36.1088809138671
    122.336823399801,169.158256422336
    9.32281673202422,631.076001565473
    201.118425176068,803.724831627554
    415.514343714115,64.248651454341
    227.791637123,230.512133914284
    25.3438658925443,396.854282886188
    596.238994411304,72.543763144195
    230.239735877253,933.983901697669
    796.060099040186,689.952468971234
    9.30882684202344,269.22063744125
    16.5005430148451,8.96549091859045
    536.324005148524,358.829873788557
    519.694526420764,17.3212184707267
    552.628357889423,12.5541588051962
    210.516099897454,388.57537739937
    141.341571405689,268.082028986924
    503.880356335491,753.447006912645
    515.494990213539,444.451280259737
    973.8670776076,168.922799013985
    85.7111146094795,36.3784999169309
    37.2147129193017,108.040356312432
    504.590177939548,50.3934166889607
    482.821039277511,888.984586256083
    5.52549206350255,156.717087003271
    405.833169031345,394.099059180868
    459.249365587835,11.68776424494
    429.421127440604,314.216759666901
    126.908422469584,331.907062556551
    62.1416232716952,3.19765723645578
    4.16058817699579,604.04046284223
    484.262182311277,220.177370167886
    58.6774453314382,339.09660232677
    463.482149892246,199.181594849183
    344.128297473829,268.531428258182
    0.883430369609702,209.346384477963
    77.9462970131758,255.221325168955
    583.629439312792,235.557751925922
    358.409186083083,376.046612200349
    81.2148325150902,10.7696774717279
    53.7315618049966,274.171515094196
    111.284646992239,130.174321939319
    317.280491961763,338.077288461885
    177.454564264722,7.53587801919127
    69.2239431670047,233.693477620228
    823.419546454875,0.111916855029723
    23.7174749401014,200.989081544331
    44.9598299125022,102.633862571155
    74.1602278468945,292.485449988155
    130.11182449251,23.4682153367755
    243.088760058903,335.807090202722
    13.3974915991526,436.983231269281
    73.3900805168739,252.352352472186
    592.144630201228,92.3395205570103
    57.7306153447044,47.1416798900541
    522.649018382024,584.427794722108
    15.3662010204821,60.1693953262499
    16.8335716728277,851.401980430541
    33.9869734449251,0.930781653584345
    116.66608504982,146.126050951949
    92.8896130355492,711.765618208687
    317.91980889529,322.186540377413
    44.8574470732629,209.275617858058
    751.201537871362,37.935519233316
    161.817758424588,2.83156183493862
    531.64078452142,79.1750782491523
    114.803219681048,283.106988439852
    123.472725123853,154.125248027558
    89.9276725453919,63.4626924192825
    105.623296753328,111.234188702067
    435.72981759707,23.7058234576629
    259.324810619152,69.3535200857341
    719.885234421531,381.086239833891
    24.2674900099018,198.408173349876
    57.7761600361095,146.52277489124
    77.4594609157459,710.746080866431
    636.671781979814,538.894185951396
    56.6035279932448,58.2563265684323
    485.16099039333,427.849954283261
    91.9552873247095,576.92944263617
    