Stats on every n rows for each column

轻奢々 2021-01-23 04:03

I would like to calculate the mean and standard deviation for every n rows (in my case, every 6 rows, or samples). The following function gives me the means for every 6 rows (96 ro

2 Answers
  • 2021-01-23 04:50

    You can apply the function to each column with sapply:

    sapply(iris[1:4], function(x) colMeans(matrix(x, nrow=6)))
          Sepal.Length Sepal.Width Petal.Length Petal.Width
     [1,]     4.950000    3.383333     1.450000   0.2333333
     [2,]     4.850000    3.316667     1.483333   0.2000000
     [3,]     5.183333    3.633333     1.316667   0.2500000
    

    ...

    [23,]     6.533333    2.950000     5.583333   1.9333333
    [24,]     6.516667    3.033333     5.316667   2.1333333
    [25,]     6.383333    3.033333     5.266667   2.1333333
    

    Compare with creating the means of the first six rows manually:

    colMeans(iris[1:6, 1:4])
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       4.9500000    3.3833333    1.4500000    0.2333333 
    

    You can also do this with aggregate given the proper by argument:

    aggregate(iris[1:4], by=list((seq(nrow(iris))-1) %/% 6), FUN=mean)
       Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
    1        0     4.950000    3.383333     1.450000   0.2333333
    2        1     4.850000    3.316667     1.483333   0.2000000
    3        2     5.183333    3.633333     1.316667   0.2500000
    

    ...

    This works by creating a vector which identifies the groups to be averaged:

    (seq(nrow(iris))-1) %/% 6
      [1]  0  0  0  0  0  0  1  1  1  1  1  1  2  2  2  2  2  2  3  3  3  3  3  3  4  4  4  4  4  4  5  5  5  5  5  5  6  6  6  6  6  6  7  7  7  7  7  7  8  8  8  8
     [53]  8  8  9  9  9  9  9  9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17
    [105] 17 17 17 17 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24
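
One advantage of the grouping vector is that it does not require the row count to be a multiple of the group size, whereas `matrix(x, nrow=6)` does. A minimal sketch, assuming a hypothetical 20-row subset of iris (the names `n`, `df`, `grp` and `res` are illustrative only):

```r
n <- 6                                  # group size from the question

# Hypothetical data: 20 rows, which is not a multiple of 6
df <- head(iris[1:4], 20)

# Integer division maps rows 1-6 to 0, 7-12 to 1, 13-18 to 2, 19-20 to 3
grp <- (seq_len(nrow(df)) - 1) %/% n
table(grp)                              # the last group holds only 2 rows

# aggregate() averages whatever rows fall into each group
res <- aggregate(df, by = list(group = grp), FUN = mean)
res
```

The last row of `res` is the mean of the two leftover rows, so decide whether such a partial group should be kept or dropped.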
    

    Note that the sapply solution returns a matrix, whereas the aggregate solution returns a data frame; choose whichever is more convenient.
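
The question also asks for the standard deviation, which the examples above do not show. A minimal sketch, assuming the same grouping of 6 consecutive rows: base R has no `colSds` counterpart to `colMeans`, so this uses `apply(..., 2, sd)` on the reshaped matrix, and `sd` as the `FUN` for `aggregate`. The names `n`, `grp`, `sd_mat` and `sd_df` are illustrative only.

```r
n <- 6                                  # group size from the question

# Matrix result: reshape each column to n rows, then take sd of each matrix column
sd_mat <- sapply(iris[1:4], function(x) apply(matrix(x, nrow = n), 2, sd))

# Data frame result: same grouping vector as in the aggregate() example
grp <- (seq_len(nrow(iris)) - 1) %/% n
sd_df <- aggregate(iris[1:4], by = list(group = grp), FUN = sd)

# Compare the first group with a manual calculation on rows 1 to 6
sd_mat[1, ]
sapply(iris[1:6, 1:4], sd)
```

As with the means, the matrix approach assumes the number of rows is a multiple of `n`; the `aggregate` version does not.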

  • 2021-01-23 04:57

    A possible reason you got the error and the warning message is that you applied the function directly to the data.frame. For example:

    set.seed(48)
    d1 <- as.data.frame(matrix(sample(1:40, 80*96, replace=T), ncol=80))
    rowMeans(matrix(d1, ncol=6, byrow=T))
    #Error in rowMeans(matrix(d1, ncol = 6, byrow = T)) : 'x' must be numeric
    #In addition: Warning message:
    #In matrix(d1, ncol = 6, byrow = T) :
    #  data length [80] is not a sub-multiple or multiple of the number of rows [14]
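
The "data length [80]" in the warning hints at the cause: a data.frame is a list of columns, so `matrix()` sees 80 list elements rather than 80*96 numbers. A small sketch that inspects this, reusing `d1` from above (`m` is an illustrative name; the warning is suppressed only to keep the output short):

```r
set.seed(48)
d1 <- as.data.frame(matrix(sample(1:40, 80 * 96, replace = TRUE), ncol = 80))

is.list(d1)          # a data.frame is a list of columns
length(d1)           # number of columns, not number of cells
length(unlist(d1))   # number of cells after flattening to a plain vector

# matrix() on the data.frame builds a matrix whose cells are whole columns
m <- suppressWarnings(matrix(d1, ncol = 6, byrow = TRUE))
typeof(m)            # "list", which rowMeans() rejects as non-numeric
dim(m)
```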
    

    You could unlist the data.frame first, so that matrix() receives a plain numeric vector:

    res <- rowMeans(matrix(unlist(d1), ncol=6, byrow=TRUE))
    dim(res) <- c(96/6, 80)
    length(res)
    #[1] 1280
    

    Cross-checking the results against @Matthew Lundberg's method:

    res1 <- sapply(d1, function(x) colMeans(matrix(x, nrow=6)))
    
    all.equal(res,res1, check.attributes=F)
    #[1] TRUE
    