I am trying to work out, for each row of a matrix, how many columns have values greater than a specified value. I am sorry that I am asking this simple question but I wasn't
The third argument of apply needs to be a function. Also, you can count logical trues with sum.
apply(data, 1, function(x)sum(x > 30))
This will give you the vector you are looking for:
rowSums(data > 30)
It will work whether data is a matrix or a data.frame. It also uses vectorized functions, so it is preferable to apply, which is little more than a (slow) for loop.
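As a quick check that the two approaches agree, here is a minimal sketch. The sample values are assumed from the output table shown in the dplyr answer further down; the names via_apply and via_rowsums are only illustrative.

```r
# Sample data: values assumed from the output table in the dplyr answer
data <- matrix(
  c(25, 23, 20,
    22, 28, 20,
    35, 33, 30,
    42, 40, 41,
    44, 45, 43),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("1990", "1991", "1992"))
)

# Row-by-row count with apply (MARGIN = 1 means "over rows")
via_apply <- apply(data, 1, function(x) sum(x > 30))

# Vectorized count: data > 30 is a logical matrix, rowSums counts the TRUEs
via_rowsums <- rowSums(data > 30)

via_rowsums
# 0 0 2 3 3

# Same counts either way (apply returns integers, rowSums returns doubles)
identical(as.numeric(via_apply), as.numeric(via_rowsums))
# TRUE
```

Note that the comparison is strict (> 30), so the value 30 in the third row is not counted.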
If data is a data.frame, you can add the result as a column by doing:
data$yr.above <- rowSums(data > 30)
or, if data is a matrix:
data <- cbind(data, yr.above = rowSums(data > 30))
You can also create a whole new data.frame:
data.frame(yr.above = rowSums(data > 30))
or a whole new matrix:
cbind(yr.above = rowSums(data > 30))
We can also do this with Reduce and + (assuming there are no NA elements):
Reduce(`+`, lapply(as.data.frame(data), `>`, 30))
This should be efficient, as we are not converting to a matrix.
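To illustrate the NA caveat, here is a small sketch with a made-up matrix m (not from the original question). Both Reduce with + and plain rowSums propagate the NA; rowSums has an na.rm argument that skips missing values instead.

```r
# Hypothetical two-row matrix with one missing value
m <- matrix(c(35, NA, 20,
              42, 40, 41),
            ncol = 3, byrow = TRUE)

# Column-wise comparison, then element-wise addition of the logical vectors
via_reduce <- Reduce(`+`, lapply(as.data.frame(m), `>`, 30))
via_reduce
# NA 3

# rowSums behaves the same way by default ...
rowSums(m > 30)
# NA 3

# ... but can drop the NA comparisons when asked
counts <- rowSums(m > 30, na.rm = TRUE)
counts
# 1 3
```

Whether a missing value should count as "not above the threshold" depends on the data, so na.rm = TRUE is a choice rather than a default fix.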
With the dplyr package, you can try the following two solutions.
library(dplyr)
df <- as.data.frame(data)
Option 1
df %>%
mutate(yr.above = rowSums(select(df, `1990`:`1992`) > 30))
Option 2
As of dplyr 1.0.0, you can use c_across() with rowwise() to make row-wise aggregations easy.
df %>%
rowwise() %>%
mutate(yr.above = sum(c_across(`1990`:`1992`) > 30)) %>%
ungroup()
Note: One of the benefits of using dplyr is its support for tidy selection, which provides a concise dialect of R for selecting variables based on their names or properties.
Output
# # A tibble: 5 x 4
#   `1990` `1991` `1992` yr.above
#    <dbl>  <dbl>  <dbl>    <int>
# 1     25     23     20        0
# 2     22     28     20        0
# 3     35     33     30        2
# 4     42     40     41        3
# 5     44     45     43        3
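The note on tidy selection can be made concrete with a selection by property rather than by name. This is a sketch, not part of the original answer: it assumes dplyr 1.1.0 or later (for pick()), rebuilds df from the output above, and uses res as an illustrative name.

```r
library(dplyr)

# Data frame rebuilt from the output above; check.names = FALSE keeps
# the year column names as "1990", "1991", "1992"
df <- data.frame(
  `1990` = c(25, 22, 35, 42, 44),
  `1991` = c(23, 28, 33, 40, 45),
  `1992` = c(20, 20, 30, 41, 43),
  check.names = FALSE
)

# pick() returns the selected columns of the current data;
# where(is.numeric) selects by property instead of by name
res <- df %>%
  mutate(yr.above = rowSums(pick(where(is.numeric)) > 30))

res$yr.above
# 0 0 2 3 3
```

Unlike Option 1, this does not refer back to df inside mutate(), so it keeps working if earlier steps in the pipe change the data. Here yr.above is a double rather than an integer, because rowSums returns doubles.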