I'm trying to implement the calculation of the Pearson correlation coefficient between two sets of data in PHP. I'm just trying to port a Python script that can be f
Your algorithm looks mathematically correct but is numerically unstable. Computing the sum of squares explicitly is a recipe for disaster: you end up subtracting two huge, nearly equal totals, which wipes out most of the significant digits. What if you have numbers like array(10000000001, 10000000002, 10000000003)? A numerically stable one-pass algorithm for calculating the variance can be found on Wikipedia, and the same principle can be applied to computing the covariance.
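As a rough illustration of that one-pass idea, here is a minimal sketch of a Welford-style update extended to the covariance. The function name `online_correlation` and the choice to return `null` for empty, mismatched, or constant input are my own assumptions, not something from the question or from a library.

```php
<?php
// Sketch only: one-pass (Welford-style) Pearson correlation.
// online_correlation() is a hypothetical name used for illustration.
function online_correlation(array $x, array $y): ?float
{
    $x = array_values($x);
    $y = array_values($y);
    $n = count($x);
    if ($n === 0 || $n !== count($y)) {
        return null; // assumption: treat empty or mismatched input as undefined
    }

    $meanX = 0.0;
    $meanY = 0.0;
    $m2x = 0.0; // running sum of squared deviations of x
    $m2y = 0.0; // running sum of squared deviations of y
    $cxy = 0.0; // running sum of co-deviations of x and y

    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $meanX; // deviation from the mean before this element
        $dy = $y[$i] - $meanY;
        $meanX += $dx / ($i + 1);
        $meanY += $dy / ($i + 1);
        // Each update multiplies the old deviation by the new deviation,
        // so no large squared totals are ever accumulated.
        $m2x += $dx * ($x[$i] - $meanX);
        $m2y += $dy * ($y[$i] - $meanY);
        $cxy += $dx * ($y[$i] - $meanY);
    }

    $denom = sqrt($m2x * $m2y);
    return $denom == 0.0 ? null : $cxy / $denom;
}
```

Calling `online_correlation(array(10000000001, 10000000002, 10000000003), array(1, 2, 3))` should give a value very close to 1.0, because the running means keep every intermediate quantity small.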
Easier yet, if you don't care much about speed, you could just use two passes. Find the means in the first pass, then compute the variances and covariances using the textbook formula in the second pass.
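To make the difference concrete, here is a small sketch that computes the population variance of the example array both ways. The helper names `naive_variance` and `two_pass_variance` are hypothetical and only serve this comparison; the same reasoning carries over to the covariance.

```php
<?php
// Sketch only: both helper names are made up for this comparison.

// Naive approach: mean of squares minus square of the mean.
function naive_variance(array $v): float
{
    $n = count($v);
    $sum = 0.0;
    $sumSq = 0.0;
    foreach ($v as $value) {
        $sum += $value;
        $sumSq += $value * $value; // grows to roughly 1e20 for the example data
    }
    $mean = $sum / $n;
    return $sumSq / $n - $mean * $mean; // difference of two huge, nearly equal numbers
}

// Two-pass approach: find the mean first, then sum squared deviations.
function two_pass_variance(array $v): float
{
    $n = count($v);
    $mean = array_sum($v) / $n;
    $ss = 0.0;
    foreach ($v as $value) {
        $d = $value - $mean; // small number, so squaring it loses nothing
        $ss += $d * $d;
    }
    return $ss / $n;
}

$data = array(10000000001, 10000000002, 10000000003);
var_dump(naive_variance($data));    // not 2/3: the low-order digits are lost
var_dump(two_pass_variance($data)); // approximately 0.6667
```

The true population variance of those three values is 2/3. The two-pass version recovers it, while the naive version cannot, because doubles near 1e20 are spaced thousands of units apart.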
Try my package here:
http://www.phpclasses.org/browse/package/5854.html
This is my solution:
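function php_correlation($x, $y) {
    // Both inputs must have the same number of elements; -1 signals a mismatch.
    // Note that -1 is also a valid coefficient, so callers cannot tell the two apart.
    if (count($x) !== count($y)) { return -1; }
    $n = count($x);
    // Guard against empty input, which would otherwise divide by zero below.
    if ($n === 0) { return 0; }
    // Re-index so both arrays can be walked with the same integer index.
    $x = array_values($x);
    $y = array_values($y);
    // First pass: the means.
    $xs = array_sum($x) / $n;
    $ys = array_sum($y) / $n;
    // Second pass: sums of co-deviations and squared deviations.
    $a = 0; $bx = 0; $by = 0;
    for ($i = 0; $i < $n; $i++) {
        $xr = $x[$i] - $xs;
        $yr = $y[$i] - $ys;
        $a += $xr * $yr;
        $bx += pow($xr, 2);
        $by += pow($yr, 2);
    }
    $b = sqrt($bx * $by);
    // If either series is constant, the coefficient is undefined; return 0.
    if ($b == 0) return 0;
    return $a / $b;
}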
http://profprog.ru/korrelyaciya-na-php-php-simple-pearson-correlation/