I have a vector and I need to sum every n numbers and return the results. This is the way I plan on doing it currently. Is there a better way to do this?
One way is to convert your vector to a matrix and then take the column sums:
colSums(matrix(v, nrow=n))
[1] 55 155 255 355 455 555 655 755 855 955
Just be careful: this implicitly assumes that your input vector can in fact be reshaped to a matrix. If it can't, R will recycle elements of your vector to complete the matrix.
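To see the recycling concretely, here is a small sketch with a length-7 vector and n = 3 (values chosen only for illustration). Padding with NA before reshaping is one possible workaround, not the only one.

```r
v <- 1:7
n <- 3

# matrix() builds a 3 x 3 matrix and recycles 1 and 2 into the last column
# (R also emits a warning, which is suppressed here).
recycled <- suppressWarnings(colSums(matrix(v, nrow = n)))
recycled
## [1]  6 15 10

# Pad with NA up to the next multiple of n, then ignore the NAs when summing.
padded <- c(v, rep(NA, (-length(v)) %% n))
padded_sums <- colSums(matrix(padded, nrow = n), na.rm = TRUE)
padded_sums
## [1]  6 15  7
```

The last value differs: 10 includes the recycled 1 and 2, while 7 is the true sum of the final, shorter chunk.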
I will add one more way of doing it, without any function from the apply family:
v <- 1:100
n <- 10
diff(c(0, cumsum(v)[slice.index(v, 1)%%n == 0]))
## [1] 55 155 255 355 455 555 655 755 855 955
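The cumsum trick may be easier to follow on a tiny input: the running total is read off at the end of each chunk, and differencing consecutive totals recovers the per-chunk sums. The sketch below picks the chunk ends with seq_along(v) %% n == 0, which is a substitution for the slice.index call above, and the values are illustrative.

```r
v <- c(3, 1, 4, 1, 5, 9)
n <- 2

running <- cumsum(v)                     # 3 4 8 9 14 23
ends <- seq_along(v) %% n == 0           # TRUE at positions 2, 4, 6
chunk_sums <- diff(c(0, running[ends]))  # differences of 0, 4, 9, 23
chunk_sums
## [1]  4  5 14
```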
If you want to sum every n consecutive numbers, use colSums.
If you want to sum every nth number, use rowSums.
As per Josh's comment, this will only work if n divides length(v) evenly.
rowSums(matrix(v, nrow=n))
[1] 460 470 480 490 500 510 520 530 540 550
colSums(matrix(v, nrow=n))
[1] 55 155 255 355 455 555 655 755 855 955
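The difference comes from matrix() filling column by column, so each column holds n consecutive elements and each row holds every nth element. A small illustrative sketch:

```r
v <- 1:6
n <- 2
m <- matrix(v, nrow = n)
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

col_totals <- colSums(m)  # consecutive pairs: 1+2, 3+4, 5+6
col_totals
## [1]  3  7 11

row_totals <- rowSums(m)  # every 2nd element: 1+3+5, 2+4+6
row_totals
## [1]  9 12
```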
The old version doesn't work. Here is a new answer that uses rep to create the grouping factor. There is no need to use cut:
n <- 5
vv <- sample(1:1000,100)
seqs <- seq_along(vv)
tapply(vv,rep(seqs,each=n)[seqs],FUN=sum)
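To see what the grouping vector looks like, here is a sketch on a fixed input instead of a random sample (the values are illustrative). Because the index is truncated to the length of the data, a short final group is summed on its own rather than recycled.

```r
vv <- c(1, 2, 3, 4, 5, 6, 7)
n <- 3
seqs <- seq_along(vv)

grp <- rep(seqs, each = n)[seqs]  # group label for each element
grp
## [1] 1 1 1 2 2 2 3

group_sums <- unname(tapply(vv, grp, FUN = sum))
group_sums
## [1]  6 15  7
```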
You can use tapply
tapply(1:100,cut(1:100,10),FUN=sum)
or to get a list
by(1:100,cut(1:100,10),FUN=sum)
EDIT
In case you have 1:92, you can replace your cut with this:
cut(1:92,seq(1,92,10),include.lowest=T)
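Note that with breaks running from 1 to 91, the first interval covers 1 through 11 and the value 92 falls beyond the last break, so it is assigned NA. One possible alternative, offered as a sketch rather than as the original author's fix, is to start the breaks at 0 and extend them to the next multiple of n, so that every element lands in a group of at most n values.

```r
x <- 1:92
n <- 10

# Breaks at 0, 10, ..., 100 give intervals (0,10], (10,20], ..., (90,100].
breaks <- seq(0, ceiling(length(x) / n) * n, by = n)
grp <- cut(x, breaks)

sums <- unname(tapply(x, grp, FUN = sum))
sums
##  [1]  55 155 255 355 455 555 655 755 855 183
```

The last group contains only 91 and 92, hence 183.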
Here are some of the main variants offered so far
f0 <- function(v, n) {
    sidx = seq.int(from=1, to=length(v), by=n)
    eidx = c((sidx-1)[2:length(sidx)], length(v))
    sapply(1:length(sidx), function(i) sum(v[sidx[i]:eidx[i]]))
}
f1 <- function(v, n, na.rm=TRUE) {    # 'tapply'
    unname(tapply(v, (seq_along(v)-1) %/% n, sum, na.rm=na.rm))
}
f2 <- function(v, n, na.rm=TRUE) {    # 'matrix'
    nv <- length(v)
    if (nv %% n)
        v[ceiling(nv / n) * n] <- NA
    colSums(matrix(v, n), na.rm=na.rm)
}
f3 <- function(v, n) {                # 'cumsum'
    nv = length(v)
    i <- c(seq_len(nv %/% n) * n, if (nv %% n) nv else NULL)
    diff(c(0L, cumsum(v)[i]))
}
Basic test cases might be
v = list(1:4, 1:5, c(NA, 2:4), integer())
n = 2
f0 fails with the final test, but this could probably be fixed
> f0(integer(), n)
Error in sidx[i]:eidx[i] : NA/NaN argument
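One way f0 might be patched, as a sketch: return early on empty input and compute the chunk ends with pmin(). The name f0_safe is hypothetical, and returning numeric(0) for an empty input is an assumed convention.

```r
f0_safe <- function(v, n) {
    # Assumed convention: an empty input yields an empty result.
    if (length(v) == 0L) return(numeric(0))
    sidx <- seq.int(from = 1, to = length(v), by = n)  # chunk starts
    eidx <- pmin(sidx + n - 1, length(v))              # chunk ends, clipped to length(v)
    vapply(seq_along(sidx),
           function(i) sum(v[sidx[i]:eidx[i]]),
           numeric(1))
}

f0_safe(integer(), 2)
## numeric(0)
f0_safe(1:5, 2)
## [1] 3 7 5
```

Computing the ends with pmin() also avoids the 2:length(sidx) indexing, which counts backwards when there is only a single chunk.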
The cumsum approach f3 is subject to rounding error, and the presence of an NA early in v 'poisons' later results
> f3(c(NA, 2:4), n)
[1] NA NA
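The poisoning can be traced step by step. In this sketch (same input, with the chunk ends computed directly for brevity), the running total is NA from the first element onward, so every difference is NA as well, whereas a grouped sum only sees the elements of its own chunk.

```r
v <- c(NA, 2, 3, 4)
n <- 2

running <- cumsum(v)  # once an NA is hit, every later running total is NA
running
## [1] NA NA NA NA

ends <- seq(n, length(v), by = n)
via_cumsum <- diff(c(0, running[ends]))
via_cumsum
## [1] NA NA

# Grouping by chunk index keeps the NA confined to the first chunk.
via_tapply <- unname(tapply(v, (seq_along(v) - 1) %/% n, sum))
via_tapply
## [1] NA  7
```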
In terms of performance, the original solution is not bad
> library(rbenchmark)
> cols <- c("test", "elapsed", "relative")
> v <- 1:100; n <- 10
> benchmark(f0(v, n), f1(v, n), f2(v, n), f3(v, n),
+ columns=cols)
      test elapsed relative
1 f0(v, n)   0.012     3.00
2 f1(v, n)   0.065    16.25
3 f2(v, n)   0.004     1.00
4 f3(v, n)   0.004     1.00
but the matrix solution f2 seems to be both fast and flexible (e.g., adjusting the handling of that trailing chunk of fewer than n elements)
> v <- runif(1e6); n <- 10
> benchmark(f0(v, n), f2(v, n), f3(v, n), columns=cols, replications=10)
      test elapsed relative
1 f0(v, n)   5.804   34.141
2 f2(v, n)   0.170    1.000
3 f3(v, n)   0.251    1.476
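As an illustration of the flexibility of the matrix approach, one variation discards an incomplete trailing chunk instead of summing it. This is only a sketch, and the name f2_drop is hypothetical.

```r
f2_drop <- function(v, n) {
    k <- length(v) %/% n  # number of complete chunks
    # Keep only the first k * n elements so the reshape is exact.
    colSums(matrix(v[seq_len(k * n)], nrow = n))
}

f2_drop(1:7, 3)
## [1]  6 15
```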
unname(tapply(v, (seq_along(v)-1) %/% n, sum))
# [1] 55 155 255 355 455 555 655 755 855 955