Question
I would like to hard threshold my matrix such that all values below a certain number are set to zero. However, I would like that threshold to vary by the column (i.e. each column has its own threshold). How can I do this in R?
Here is a simple setup:
set.seed(1)
A <- matrix(runif(n = 12),nrow = 4)
#          [,1]      [,2]       [,3]
#[1,] 0.2655087 0.2016819 0.62911404
#[2,] 0.3721239 0.8983897 0.06178627
#[3,] 0.5728534 0.9446753 0.20597457
#[4,] 0.9082078 0.6607978 0.17655675
threshholds <- c(0.3,1,0.5)
#wanted result:
# [,1] [,2] [,3]
#[1,] 0 0 0.62911404
#[2,] 0.3721239 0 0
#[3,] 0.5728534 0 0
#[4,] 0.9082078 0 0
I need to apply it to large matrices, so efficiency is relevant.
Edit: Having received several excellent suggestions, I compared their speed on a 2,000 x 10,000 matrix for future reference:
set.seed(1)
A <- matrix(runif(n = 1E4*2E3),nrow = 2E3)
threshholds <- runif(n=1E4)
> system.time(A * (A > threshholds[col(A)]))# akrun
user system elapsed
0.394 0.124 0.519
> system.time(replace(A, A <= threshholds[col(A)], 0)) # akrun
user system elapsed
0.465 0.138 0.604
> system.time(pmin(A, A > threshholds[col(A)])) #akrun
user system elapsed
0.678 0.290 1.024
> system.time(A[t(apply(A, 1, `<`, threshholds))] <- 0) #Andrew Gustar
user system elapsed
0.875 0.306 1.200
> system.time(At <- apply(A, 1, applythresh)) + system.time(t(At)) #Chris Litter
user system elapsed
0.891 0.372 1.286
> system.time(sweep(A, 2, threshholds, function(a,b) {ifelse(a<b,0,a)})) #MrFlick
user system elapsed
1.752 0.598 2.354
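One caveat with the timings above: the Andrew Gustar line assigns into A, so if the lines were run in the order shown, the two benchmarks after it ran on an already-thresholded matrix. A minimal sketch of a fairer comparison, assuming you want to check that the approaches agree before timing them, wraps each one in a function that never touches the caller's A (thr_mult, thr_replace, thr_sweep and thr_assign are hypothetical wrapper names, not from the answers):

```r
set.seed(1)
A <- matrix(runif(n = 12), nrow = 4)
threshholds <- c(0.3, 1, 0.5)

# Hypothetical wrappers around the approaches posted below
thr_mult    <- function(M, th) M * (M > th[col(M)])
thr_replace <- function(M, th) replace(M, M <= th[col(M)], 0)
thr_sweep   <- function(M, th) sweep(M, 2, th, function(a, b) ifelse(a < b, 0, a))
thr_assign  <- function(M, th) {
  # Assigning into M modifies a local copy, so the caller's matrix is untouched
  M[t(apply(M, 1, `<`, th))] <- 0
  M
}

# Check that every approach gives the same result (no ties in this data)
ref <- thr_mult(A, threshholds)
stopifnot(isTRUE(all.equal(ref, thr_replace(A, threshholds))),
          isTRUE(all.equal(ref, thr_sweep(A, threshholds))),
          isTRUE(all.equal(ref, thr_assign(A, threshholds))))

# Time each approach on the same, unmodified input;
# swap in the large 2E3 x 1E4 matrix for meaningful numbers
for (f in list(thr_mult, thr_replace, thr_sweep, thr_assign)) {
  print(system.time(f(A, threshholds)))
}
```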
Answer 1:
Here is a vectorized option:
replace(A, A <= threshholds[col(A)], 0)
Or with some arithmetic, since multiplying by a logical matrix treats TRUE as 1 and FALSE as 0:
A * (A > threshholds[col(A)])
#          [,1] [,2]     [,3]
#[1,] 0.0000000    0 0.629114
#[2,] 0.3721239    0 0.000000
#[3,] 0.5728534    0 0.000000
#[4,] 0.9082078    0 0.000000
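Both lines rely on threshholds[col(A)]. col(A) returns an integer matrix of column indices shaped like A, so indexing the threshold vector with it yields one threshold per cell, in the same column-major order R uses to store A. A small sketch on the 4 x 3 example (idx, cell_thr and keep are illustrative names):

```r
set.seed(1)
A <- matrix(runif(n = 12), nrow = 4)
threshholds <- c(0.3, 1, 0.5)

# col(A) gives the column index of every cell, laid out like A
idx <- col(A)
print(idx)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    2    3
# [3,]    1    2    3
# [4,]    1    2    3

# Indexing the threshold vector with it yields one threshold per cell;
# the result is a plain vector in column-major order, which lines up with A
cell_thr <- threshholds[idx]
print(cell_thr)
# [1] 0.3 0.3 0.3 0.3 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.5

# The comparison is then elementwise and keeps A's dimensions
keep <- A > cell_thr
print(dim(keep))
# [1] 4 3
```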
Or with pmin (valid here because all values lie in [0, 1]):
pmin(A, A > threshholds[col(A)])
#          [,1] [,2]     [,3]
#[1,] 0.0000000    0 0.629114
#[2,] 0.3721239    0 0.000000
#[3,] 0.5728534    0 0.000000
#[4,] 0.9082078    0 0.000000
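To see why the pmin version needs values in [0, 1]: the logical matrix A > threshold is coerced to 1 or 0, so pmin(a, 1) returns a and pmin(a, 0) returns 0 only when 0 <= a <= 1. A short check with values outside that range (the matrix B and thresholds thr are made up for illustration):

```r
B <- matrix(c(-2, 0.4, 3, 0.8), nrow = 2)
thr <- c(0, 0.5)

# Expected thresholding: zero every cell at or below its column's threshold
expected <- replace(B, B <= thr[col(B)], 0)

# The pmin trick clips values above 1 down to 1, and leaves negative values
# that fall below the threshold unchanged because pmin(-2, 0) is -2
via_pmin <- pmin(B, B > thr[col(B)])

print(expected)
print(via_pmin)
```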
Answer 2:
You can use the sweep command for this. For example:
threshholds <- c(0.3,1,0.5)
sweep(A, 2, threshholds, function(a,b) {ifelse(a<b,0,a)})
#           [,1] [,2]     [,3]
# [1,] 0.0000000    0 0.629114
# [2,] 0.3721239    0 0.000000
# [3,] 0.5728534    0 0.000000
# [4,] 0.9082078    0 0.000000
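Here sweep expands threshholds into a matrix the same shape as A, one threshold per column (MARGIN = 2), and calls the function once on the whole matrix, so each column is compared with its own threshold.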
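A possible variant (not benchmarked here) drops ifelse() and lets sweep do the comparison itself, then multiplies the logical result back into A. This sketch assumes the same tie handling as the answer above: values equal to the threshold are kept.

```r
set.seed(1)
A <- matrix(runif(n = 12), nrow = 4)
threshholds <- c(0.3, 1, 0.5)

# sweep() compares every column of A with its own threshold (MARGIN = 2);
# ">=" keeps values equal to the threshold, matching ifelse(a < b, 0, a)
keep <- sweep(A, 2, threshholds, FUN = ">=")

# Multiplying by the logical matrix zeroes the cells below threshold
res <- A * keep
print(res)
```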
Answer 3:
Let me know how this fares on your full matrix. Having seen that somebody has posted a built-in function solution, though, I suspect mine may be too slow.
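# threshholds is taken from the global environment
applythresh <- function(x) {
  # x is one row of A; keep entries at or above their column's threshold
  x * (x >= threshholds)
}
At <- apply(A, 1, applythresh)  # apply() returns each processed row as a column
t(At)                           # transpose back to the shape of A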
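Because apply() over rows returns its results as columns, this answer needs the extra t() call. A different approach that none of the answers uses is a plain loop over columns: it avoids the transpose and never builds the full-size index and comparison matrices of the vectorized versions, at the cost of an R-level loop. Whether it is faster on your matrices is untested here; threshold_cols is a hypothetical helper name.

```r
set.seed(1)
A <- matrix(runif(n = 12), nrow = 4)
threshholds <- c(0.3, 1, 0.5)

# Hypothetical helper: threshold one column at a time
threshold_cols <- function(M, th) {
  stopifnot(length(th) == ncol(M))
  for (j in seq_len(ncol(M))) {
    # Zero the entries of column j that fall below that column's threshold
    M[M[, j] < th[j], j] <- 0
  }
  M  # modified copy; the caller's matrix is unchanged
}

res <- threshold_cols(A, threshholds)
print(res)
```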
Answer 4:
Here is another approach:
A[t(apply(A, 1, `<`, threshholds))] <- 0
A
#          [,1] [,2]     [,3]
#[1,] 0.0000000    0 0.629114
#[2,] 0.3721239    0 0.000000
#[3,] 0.5728534    0 0.000000
#[4,] 0.9082078    0 0.000000
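The answers differ on a value exactly equal to its column's threshold. akrun's three versions zero it (A <= ... in replace(), A > ... in the other two), while the < comparison here, MrFlick's ifelse(a < b, 0, a) and the x >= threshholds test in Answer 3 keep it. With runif() data ties are practically impossible, but with rounded or integer data the choice matters. A small sketch on a made-up matrix M with thresholds th:

```r
# Made-up matrix in which two cells equal their column's threshold
M <- matrix(c(1, 2, 3, 4), nrow = 2)
th <- c(2, 3)

# akrun-style comparison: ties are zeroed
strict <- M * (M > th[col(M)])

# Answer 4 style comparison: only values strictly below are zeroed
lenient <- M
lenient[t(apply(M, 1, `<`, th))] <- 0

print(strict)
print(lenient)
```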
Source: https://stackoverflow.com/questions/50881398/different-hard-threshold-for-each-column