I have several records with a given attribute, and I want to find the standard deviation.
How do I do that?
If you are using Postgres, it provides the aggregate functions stddev_pop and stddev_samp (see the PostgreSQL aggregate functions documentation).
stddev (equivalent to stddev_samp) has been available since at least Postgres 7.1; since 8.2 both stddev_samp and stddev_pop are provided.
The answer given above is elegant but has a slight error in it. Not being a stats head myself, I sat up and read a number of websites in detail, and found that this one gave the most comprehensible explanation of how to derive a standard deviation: http://sonia.hubpages.com/hub/stddev
The error in the answer above is in the sample_variance method.
Here is my corrected version, along with a simple unit test that shows it works.
in ./lib/enumerable/standard_deviation.rb
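#!/usr/bin/env ruby
module Enumerable
  # Sum of all elements (Ruby 2.4+ already ships its own sum).
  def sum
    return self.inject(0){|accum, i| accum + i }
  end

  # Arithmetic mean; count is used because not every Enumerable has length.
  def mean
    return self.sum / self.count.to_f
  end

  # Sample variance: squared deviations from the mean, divided by n - 1.
  def sample_variance
    m = self.mean
    sum = self.inject(0){|accum, i| accum + (i - m) ** 2 }
    return sum / (self.count - 1).to_f
  end

  # Sample standard deviation: square root of the sample variance.
  def standard_deviation
    return Math.sqrt(self.sample_variance)
  end
end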
In ./test, using numbers derived from a simple spreadsheet:
#!/usr/bin/env ruby
require 'test/unit'
require 'enumerable/standard_deviation'

class StandardDeviationTest < Test::Unit::TestCase
  THE_NUMBERS = [1, 2, 2.2, 2.3, 4, 5]
  DELTA = 1e-9 # tolerance for floating-point comparisons

  def test_sum
    expected = 16.5
    result = THE_NUMBERS.sum
    assert_in_delta expected, result, DELTA, "expected #{expected} but got #{result}"
  end

  def test_mean
    expected = 2.75
    result = THE_NUMBERS.mean
    assert_in_delta expected, result, DELTA, "expected #{expected} but got #{result}"
  end

  def test_sample_variance
    expected = 2.151
    result = THE_NUMBERS.sample_variance
    assert_in_delta expected, result, DELTA, "expected #{expected} but got #{result}"
  end

  def test_standard_deviation
    expected = 1.4666287874
    result = THE_NUMBERS.standard_deviation
    assert_in_delta expected, result, DELTA, "expected #{expected} but got #{result}"
  end
end
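For reference, the sample and population variance differ only in the divisor: n - 1 versus n. This is the same distinction as between stddev_samp and stddev_pop in Postgres. Here is a minimal sketch as standalone functions, using the same numbers as the test. The function names are made up for illustration and are not part of the module above:

```ruby
# Sum of squared deviations from the mean, shared by both variants.
def squared_deviations_sum(values)
  mean = values.inject(0.0) { |acc, x| acc + x } / values.length
  values.inject(0.0) { |acc, x| acc + (x - mean)**2 }
end

# Population variance: divide by n (the data is the whole population).
def population_variance_of(values)
  squared_deviations_sum(values) / values.length
end

# Sample variance: divide by n - 1 (the data is a sample of a larger population).
def sample_variance_of(values)
  squared_deviations_sum(values) / (values.length - 1)
end

numbers = [1, 2, 2.2, 2.3, 4, 5]
puts population_variance_of(numbers) # about 1.7925
puts sample_variance_of(numbers)     # about 2.151
```

Taking Math.sqrt of either value gives the corresponding standard deviation.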
The computations presented above are not very efficient, because they require several passes through the array: at least two, and often three, since you usually want to report the average in addition to the standard deviation.
I know Ruby is not the place to look for efficiency, but here is my implementation that computes average and standard deviation with a single pass over the list values:
module Enumerable
  # Returns [average, sample standard deviation] in a single pass over the values.
  def avg_stddev
    return nil unless count > 0
    return [ first, 0 ] if count == 1
    sx = sx2 = 0 # running sum and running sum of squares
    each do |x|
      sx2 += x**2
      sx += x
    end
    [
      sx.to_f / count,
      Math.sqrt( # http://wijmo.com/docs/spreadjs/STDEV.html
        (sx2 - sx**2.0 / count) /
        (count - 1)
      )
    ]
  end
end
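One caveat with the sum-of-squares formula: it subtracts two potentially large, nearly equal quantities, so it can lose precision when the values are large compared to their spread. In bad cases the result under the square root may even come out slightly negative. A commonly used single-pass alternative is Welford's online algorithm. The following is a minimal sketch of it, not a drop-in replacement for the code above. The method name welford_avg_stddev is made up for illustration:

```ruby
# Welford's online algorithm: single pass, updates the mean and the sum of
# squared deviations incrementally instead of subtracting two large sums.
# The name welford_avg_stddev is hypothetical.
def welford_avg_stddev(values)
  n = 0
  mean = 0.0
  m2 = 0.0 # running sum of squared deviations from the current mean

  values.each do |x|
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean) # uses the already-updated mean
  end

  return nil if n.zero?
  return [mean, 0.0] if n == 1

  [mean, Math.sqrt(m2 / (n - 1))] # n - 1 gives the sample standard deviation
end

avg, stddev = welford_avg_stddev([1, 2, 2.2, 2.3, 4, 5])
puts "avg=#{avg} stddev=#{stddev}"
```

Because it only relies on each, the sketch also works for enumerables that can be traversed just once, and it never needs to call count separately.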