Sum every n points

耶瑟儿~ 2020-11-29 04:42

I have a vector and I need to sum every n numbers and return the results. This is the way I plan on doing it currently. Any better way to do this?



        
9 Answers
  • 2020-11-29 05:10

    One way is to convert your vector to a matrix, then take the column sums:

    colSums(matrix(v, nrow=n))
    [1]  55 155 255 355 455 555 655 755 855 955
    

    Just be careful: this implicitly assumes that the length of your input vector is a multiple of n, so that it can be reshaped into a matrix. If it isn't, R will recycle elements of your vector (usually with a warning) to complete the matrix.
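
    To make the recycling concrete, here is a small sketch. The 95-element vector is only an illustrative assumption, not something from the question. The last column gets topped up with 1:5, so its sum is wrong; a length check before reshaping catches this.

```r
v <- 1:95   # length is not a multiple of n
n <- 10

# matrix() recycles 1:5 to fill the last column (warning suppressed here)
recycled <- suppressWarnings(colSums(matrix(v, nrow = n)))
last_recycled <- recycled[length(recycled)]   # sum(91:95) + sum(1:5)
last_true <- sum(v[91:95])                    # what we actually wanted

# A simple guard to run before reshaping
can_reshape <- length(v) %% n == 0
```

    Here last_recycled is 480 while last_true is 465, and can_reshape is FALSE.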

  • 2020-11-29 05:12

    Here is one more way of doing it, without any function from the apply family:

    v <- 1:100
    n <- 10
    
    diff(c(0, cumsum(v)[slice.index(v, 1)%%n == 0]))
    ##  [1]  55 155 255 355 455 555 655 755 855 955
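
    The idea behind the one-liner: the cumulative sum at positions n, 2n, ... holds the total up to the end of each block, so differencing those totals gives the per-block sums. A step-by-step sketch of the same idea follows. It uses seq() to pick the block ends (my substitution for the slice.index test) and assumes length(v) is a multiple of n.

```r
v <- 1:100
n <- 10

cs <- cumsum(v)                        # running total of v
ends <- seq(n, length(v), by = n)      # last index of each block: 10, 20, ..., 100
block_sums <- diff(c(0, cs[ends]))     # total at each block end minus the previous one
```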
    
  • 2020-11-29 05:13

    UPDATE:

    If you want to sum every n consecutive numbers, use colSums.
    If you want to sum every nth number, use rowSums.

    As per Josh's comment, this will only work if n divides length(v) evenly.

    rowSums(matrix(v, nrow=n))
     [1] 460 470 480 490 500 510 520 530 540 550
    
    colSums(matrix(v, nrow=n))
     [1]  55 155 255 355 455 555 655 755 855 955
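
    A smaller case may make the difference easier to see. This is only an illustrative sketch with a 6-element vector of my choosing: matrix() fills column by column, so each column holds n consecutive values and each row holds every nth value.

```r
v <- 1:6
n <- 2

m <- matrix(v, nrow = n)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

consecutive <- colSums(m)   # 1+2, 3+4, 5+6
every_nth   <- rowSums(m)   # 1+3+5, 2+4+6
```

    consecutive is 3 7 11 and every_nth is 9 12.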
    

  • 2020-11-29 05:17

    Update

    The old version doesn't work. Here is a new answer that uses rep to create the grouping factor, with no need for cut:

    n <- 5 
    vv <- sample(1:1000,100)
    seqs <- seq_along(vv)
    tapply(vv,rep(seqs,each=n)[seqs],FUN=sum)
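
    To see what the grouping vector looks like, here is a sketch on a fixed 7-element vector (my choice, instead of the random sample above, so the result is reproducible). Because the rep() output is truncated to the length of the data, a trailing group with fewer than n elements is handled without recycling.

```r
vv <- 1:7
n <- 3

seqs <- seq_along(vv)
grp <- rep(seqs, each = n)[seqs]          # 1 1 1 2 2 2 3
sums <- as.vector(tapply(vv, grp, sum))   # drop names and dim for a plain vector
```

    sums is 6 15 7, with the last value coming from the single leftover element.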
    

    Old version, using tapply with cut:

    tapply(1:100,cut(1:100,10),FUN=sum)
    

    or to get a list

    by(1:100,cut(1:100,10),FUN=sum)
    

    EDIT

    In case you have 1:92, you can replace your cut with this, which puts the trailing 91:92 into a final, shorter group:

    cut(1:92, c(seq(0, 90, 10), 92))
    
  • 2020-11-29 05:19

    Here are some of the main variants offered so far

    f0 <- function(v, n) {
        sidx = seq.int(from=1, to=length(v), by=n)
        eidx = c((sidx-1)[2:length(sidx)], length(v))
        sapply(1:length(sidx), function(i) sum(v[sidx[i]:eidx[i]]))
    }
    
    f1 <- function(v, n, na.rm=TRUE) {    # 'tapply'
        unname(tapply(v, (seq_along(v)-1) %/% n, sum, na.rm=na.rm))
    }
    
    f2 <- function(v, n, na.rm=TRUE) {    # 'matrix'
        nv <- length(v)
        if (nv %% n)
            v[ceiling(nv / n) * n] <- NA
        colSums(matrix(v, n), na.rm=na.rm)
    }
    
    f3 <- function(v, n) {                # 'cumsum'
        nv = length(v)
        i <- c(seq_len(nv %/% n) * n, if (nv %% n) nv else NULL)
        diff(c(0L, cumsum(v)[i]))
    }
    

    Basic test cases might be

    v = list(1:4, 1:5, c(NA, 2:4), integer())
    n = 2
    

    f0 fails on the final test case (the empty vector), though this could probably be fixed:

    > f0(integer(), n)
    Error in sidx[i]:eidx[i] : NA/NaN argument
    

    The cumsum approach f3 is subject to rounding error, and an NA early in v 'poisons' all later results:

    > f3(c(NA, 2:4), n)
    [1] NA NA
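
    The poisoning comes from cumsum itself: once it meets an NA, every later running total is NA too. A minimal sketch of the contrast with a per-chunk sum, using the same test vector:

```r
v <- c(NA, 2:4)
n <- 2

running <- cumsum(v)   # NA NA NA NA: the NA propagates to the end

# Summing each chunk separately confines the NA to its own chunk,
# and na.rm = TRUE can then drop it
per_chunk <- colSums(matrix(v, nrow = n), na.rm = TRUE)   # 2 7
```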
    

    In terms of performance, the original solution is not bad

    > library(rbenchmark)
    > cols <- c("test", "elapsed", "relative")
    > v <- 1:100; n <- 10
    > benchmark(f0(v, n), f1(v, n), f2(v, n), f3(v, n),
    +           columns=cols)
          test elapsed relative
    1 f0(v, n)   0.012     3.00
    2 f1(v, n)   0.065    16.25
    3 f2(v, n)   0.004     1.00
    4 f3(v, n)   0.004     1.00
    

    but the matrix solution f2 seems to be both fast and flexible (e.g., adjusting the handling of that trailing chunk of fewer than n elements)

    > v <- runif(1e6); n <- 10
    > benchmark(f0(v, n), f2(v, n), f3(v, n), columns=cols, replications=10)
          test elapsed relative
    1 f0(v, n)   5.804   34.141
    2 f2(v, n)   0.170    1.000
    3 f3(v, n)   0.251    1.476
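
    As an illustration of that flexibility, here is a sketch of a variant that drops an incomplete trailing chunk instead of padding it with NA. The name f2_drop is hypothetical and not one of the functions benchmarked above.

```r
f2_drop <- function(v, n) {
    k <- length(v) %/% n        # number of complete chunks
    v <- v[seq_len(k * n)]      # discard the leftover elements, if any
    colSums(matrix(v, nrow = n))
}

res <- f2_drop(1:95, 10)   # 91:95 are ignored
```

    res holds the nine complete-chunk sums, 55 155 ... 855.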
    
  • 2020-11-29 05:23
    unname(tapply(v, (seq_along(v)-1) %/% n, sum))
    # [1] 55 155 255 355 455 555 655 755 855 955 
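
    The grouping index here comes from integer division of the zero-based position by n. A short sketch on a made-up 5-element vector shows why the "- 1" matters: without it, the first group comes up one element short and every later group is shifted.

```r
v <- c(5, 1, 4, 2, 8)
n <- 2

grp       <- (seq_along(v) - 1) %/% n   # 0 0 1 1 2: aligned chunks
grp_wrong <- seq_along(v) %/% n         # 0 1 1 2 2: first group has one element

sums <- as.vector(tapply(v, grp, sum))  # 5+1, 4+2, 8
```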
    