I've just started with R and I've executed these statements:
library(datasets)
head(airquality)
s <- split(airquality,airquality$Month)
sapply(s, function(x
They are not supposed to give the same result. Consider this example:
exdf<-data.frame(a=c(1,NA,5),b=c(3,2,2))
# a b
#1 1 3
#2 NA 2
#3 5 2
colMeans(exdf,na.rm=TRUE)
# a b
#3.000000 2.333333
colMeans(na.omit(exdf))
# a b
#3.0 2.5
Why is this? In the first case, the mean of column b is calculated as (3+2+2)/3, because na.rm=TRUE drops NA values column by column and b has none. In the second case, na.omit removes the second row in its entirety, including the value of b, which is not NA and was therefore counted in the first case, so the mean of b is just (3+2)/2.
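To see the difference directly, you can count how many values each approach actually averages over. This is only a small sketch built on the exdf example above; the variable names (n_per_column, n_complete, b_na_rm, b_na_omit) are made up for illustration.

```r
exdf <- data.frame(a = c(1, NA, 5), b = c(3, 2, 2))

# Non-NA values per column: this is what colMeans(exdf, na.rm = TRUE) averages over
n_per_column <- colSums(!is.na(exdf))

# Rows that survive na.omit(): every column is averaged over exactly these rows
n_complete <- nrow(na.omit(exdf))

# Mean of column b computed by hand under each approach
b_na_rm <- sum(exdf$b, na.rm = TRUE) / n_per_column[["b"]]
b_na_omit <- sum(exdf$b[complete.cases(exdf)]) / n_complete
```

Column b contributes three values under na.rm=TRUE but only two after na.omit, which is why the two means differ. The results would only match for a data frame where no row mixes NA and non-NA values.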