Standard Deviation in R Seems to be Returning the Wrong Answer - Am I Doing Something Wrong?

迷失自我 2020-11-29 06:18

A simple example of calculating the standard deviation:

d <- c(2,4,4,4,5,5,7,9)
sd(d)

yields

[1] 2.13809

but

4 Answers
  • 2020-11-29 06:26

    Try this

    R> sd(c(2,4,4,4,5,5,7,9)) * sqrt(7/8)
    [1] 2
    R> 
    

    and see the rest of the Wikipedia article for the discussion of how standard deviations are estimated. The formula used 'by hand' divides by N, which gives a biased estimate; sd() divides by N − 1 instead, so multiplying its result by sqrt((N-1)/N) recovers the by-hand value. Here is a key quote:

    The term standard deviation of the sample is used for the uncorrected estimator (using N) while the term sample standard deviation is used for the corrected estimator (using N − 1). The denominator N − 1 is the number of degrees of freedom in the vector of residuals, (x_1 − x̄, …, x_N − x̄).
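
    The two estimators in the quote can be compared directly on the vector from the question. This is only a minimal sketch: the names ss, sd_uncorrected and sd_corrected are made up for illustration and do not come from the original answer.

    ```r
    # Data from the question
    d <- c(2, 4, 4, 4, 5, 5, 7, 9)
    n <- length(d)

    # Sum of squared deviations from the mean (the residuals)
    ss <- sum((d - mean(d))^2)

    sd_uncorrected <- sqrt(ss / n)        # divides by N: the 'by hand' value
    sd_corrected   <- sqrt(ss / (n - 1))  # divides by N - 1: what sd() computes

    sd_uncorrected                  # [1] 2
    sd_corrected                    # [1] 2.13809
    all.equal(sd_corrected, sd(d))  # TRUE
    all.equal(sd_uncorrected, sd(d) * sqrt((n - 1) / n))  # TRUE
    ```

    Both values come from the same sum of squares; only the denominator differs, which is why the single factor sqrt((N-1)/N) converts one into the other.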

  • 2020-11-29 06:26

    Looks like R is assuming (n-1) in the denominator, not n.

  • 2020-11-29 06:30

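    When I want the population variance or standard deviation (n as denominator), I define these two helper functions, each of which takes a numeric vector.

      # Population variance: rescale var(), which divides by n - 1, to divide by n
      pop.var <- function(x) var(x) * (length(x)-1) / length(x)
    
      # Population standard deviation: square root of the population variance
      pop.sd <- function(x) sqrt(pop.var(x))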
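
    As a quick check, these helpers can be applied to the vector from the question. The definitions are repeated here only so that the snippet runs on its own; treat it as a sketch rather than part of the original answer.

    ```r
    # Helper definitions repeated from the answer so the snippet is self-contained
    pop.var <- function(x) var(x) * (length(x) - 1) / length(x)
    pop.sd  <- function(x) sqrt(pop.var(x))

    d <- c(2, 4, 4, 4, 5, 5, 7, 9)  # vector from the question

    sd(d)       # [1] 2.13809  (denominator n - 1)
    pop.sd(d)   # [1] 2        (denominator n)
    ```

    Because both helpers are built on var(), they return NA for a vector of length one, where var() itself is NA.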
    

    BTW, Khan Academy has a good discussion of population and sample standard deviation.

  • 2020-11-29 06:46

    Note that running the command

    ?sd 
    

    in R (for example, in RStudio) displays the help page for the function. Its Details section states

    Like var this uses denominator n - 1.
