Question
I am trying to optimize some code that I have written, as it is very slow for large datasets. I am not sure if the following can be done with matrix operations, and I would appreciate it if someone had any suggestions to make it faster.
I have a matrix with zeros and integers, and I would like to shift the entries of each column down by the absolute value of the integer in that entry.
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0   -4    0
[3,]    4    0    0
[4,]   -3   -2    0
[5,]    0    2   -1
[6,]    2   -2    0
[7,]    0    0    0
[8,]   -3   -3    0
The code I am using is the following:
#data
A<-matrix(data=c(0,0,4,-3,0,2,0,-3,0,-4,0,-2,2,-2,0,-3,0,0,0,0,-1,0,0,0),nrow=8,ncol=3)
#shift function
shift<-function(x)
{
  #create the output matrix
  out<-matrix(data=0,nrow=8,ncol=1)
  #for loop to create the shift matrix
  for(i in seq(1,8,by=1))
  {
    if(i+abs(x[i])<=8)
    {
      #find the non zero
      if(x[i]!=0)
      {
        #if there is already a number put zero
        if(out[i+abs(x[i]),1]!=0)
        {
          out[i+abs(x[i]),1]=0
        } else {
          #shift
          out[i+abs(x[i]),1]=x[i]
        }
      }
    }
  }
  #return object
  return(out)
}
#run the logic
shift_mat<-sapply(1:ncol(A),FUN=function(k) shift(A[,k]))
and the result is:
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    0    0    0
[5,]    0    0    0
[6,]    0    0   -1
[7,]    0    2    0
[8,]    2   -2    0
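The rules are the following for every column:
- starting from the top, find the first entry that is different from zero
- shift it down by the absolute value of that entry
- if another entry has already been shifted to the target position, put zero there instead
- repeat for the remaining non-zero entries (entries that would be shifted past the last row are dropped), then move on to the next column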
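To make the rules concrete, here is a small sketch (not part of the original post) that lists, for the first column of A, where each non-zero entry would land. The names nz, target and trace are only illustrative.

```r
# First column of A from the question
v <- c(0, 0, 4, -3, 0, 2, 0, -3)
n <- length(v)

nz <- which(v != 0)            # rows that hold a non-zero entry
target <- nz + abs(v[nz])      # row each entry would be shifted to

# One line per non-zero entry: where it starts, its value,
# where it would land, and whether that row exists
trace <- data.frame(row = nz, value = v[nz], target = target,
                    in_range = target <= n)
print(trace)
```

Rows 3 and 4 both target row 7, so row 7 ends up as 0; the 2 in row 6 lands in row 8; the -3 in row 8 would move to row 11 and is dropped. That leaves a single 2 in row 8, which matches the first column of the result above.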
Thanks,
Nikos
Answer 1:
This is a bit cleaner and about 40% faster using your example on my machine. Maybe the speed improvement will be greater using your larger data?
You should use a matrix of integers. It uses less memory (4 bytes per element instead of 8 for doubles) and some operations are faster:
A <- matrix(as.integer(c(0,0,4,-3,0,2,0,-3,0,-4,0,-2,2,
                         -2,0,-3,0,0,0,0,-1,0,0,0)), nrow = 8, ncol = 3)
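A quick way to check the memory claim is to compare object.size() for the same values stored as double and as integer. The one-million-element size is an arbitrary choice for illustration, and assigning to storage.mode() is shown as an alternative to rebuilding a matrix with as.integer().

```r
# The same one million zeros stored as double (R's default for 0) and as integer (0L)
dbl_m <- matrix(0,  nrow = 1e6, ncol = 1)
int_m <- matrix(0L, nrow = 1e6, ncol = 1)

# Doubles take 8 bytes per element and integers 4, so the integer
# matrix should come out at roughly half the size
print(object.size(dbl_m), units = "Mb")
print(object.size(int_m), units = "Mb")

# An existing double matrix can also be converted without rebuilding it;
# assigning to storage.mode() keeps the dimensions
B <- matrix(c(0, 0, 4, -3, 0, 2, 0, -3), nrow = 8, ncol = 1)
storage.mode(B) <- "integer"
```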
Each column is a vector, and so should your output be, so I replaced the one-column matrices with vectors. I also made the code more robust by removing the hard-coded number of rows:
shift <- function(x) {
  n <- length(x)
  y <- rep(0L, n)
  for(i in seq_len(n)) {
    if (x[i] == 0L) next
    j <- i + abs(x[i])
    if (j > n) next
    y[j] <- if (y[j] != 0L) 0L else x[i]
  }
  return(y)
}
You can run it using apply:
shift_mat <- apply(A, 2, shift)
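As a sanity check, assuming the integer A and the shift() function from this answer, the result should match the matrix posted in the question. The expected matrix below is typed in by hand from that output.

```r
# shift() from this answer, repeated so the block runs on its own
shift <- function(x) {
  n <- length(x)
  y <- rep(0L, n)
  for (i in seq_len(n)) {
    if (x[i] == 0L) next
    j <- i + abs(x[i])
    if (j > n) next
    y[j] <- if (y[j] != 0L) 0L else x[i]
  }
  y
}

A <- matrix(as.integer(c(0,0,4,-3,0,2,0,-3,0,-4,0,-2,2,
                         -2,0,-3,0,0,0,0,-1,0,0,0)), nrow = 8, ncol = 3)
shift_mat <- apply(A, 2, shift)

# Expected result, copied from the output in the question
expected <- matrix(0L, nrow = 8, ncol = 3)
expected[8, 1] <- 2L
expected[7:8, 2] <- c(2L, -2L)
expected[6, 3] <- -1L

print(all(shift_mat == expected))
```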
Answer 2:
The shift operation can be vectorized. Let's just take the first column of your data to see how:
v = c(0,0,4,-3,0,2,0,-3)
# index of the elements that could be non-zero in the final result
index = ifelse (v != 0 & abs(v) + seq_along(v) <= length(v),
                abs(v) + seq_along(v), 0)
# [1] 0 0 7 7 0 8 0 0
# now just need to filter out the duplicated entries
index = ave(index, index, FUN = function(x) {if (length(x) > 1) 0 else x})
# [1] 0 0 0 0 0 8 0 0
# home at last
res = integer(length(v))
res[index] = v[which(index != 0)]
res
# [1] 0 0 0 0 0 0 0 2
You can then put the above into a function and lapply it over your data.frame, or apply it on the columns of your matrix.
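Wrapped into a function, the vectorized steps might look like the sketch below. The name shift_vec is made up for illustration, and the loop from Answer 1 is repeated (as shift_loop) only so the two can be compared. They agree on the question's matrix, but they treat three or more entries aimed at the same row differently: the loop toggles the target between a value and zero, whereas the ave() step zeroes every target hit more than once. Which behaviour is wanted depends on how the rule "if there is another entry at the targeted point put zero" is meant.

```r
# Vectorized shift of one column, following the steps above
shift_vec <- function(v) {
  n <- length(v)
  pos <- abs(v) + seq_along(v)
  # target row for each non-zero entry that stays inside the column, 0 otherwise
  index <- ifelse(v != 0 & pos <= n, pos, 0)
  # any target hit by more than one entry becomes 0
  index <- ave(index, index, FUN = function(x) if (length(x) > 1) 0 else x)
  res <- integer(n)
  res[index] <- v[which(index != 0)]   # zero indices are ignored on assignment
  res
}

# Loop version from Answer 1, for comparison
shift_loop <- function(x) {
  n <- length(x)
  y <- rep(0L, n)
  for (i in seq_len(n)) {
    if (x[i] == 0L) next
    j <- i + abs(x[i])
    if (j > n) next
    y[j] <- if (y[j] != 0L) 0L else x[i]
  }
  y
}

A <- matrix(as.integer(c(0,0,4,-3,0,2,0,-3,0,-4,0,-2,2,
                         -2,0,-3,0,0,0,0,-1,0,0,0)), nrow = 8, ncol = 3)

# Both versions give the same matrix for the question's data
res_vec  <- apply(A, 2, shift_vec)
res_loop <- apply(A, 2, shift_loop)
print(all(res_vec == res_loop))

# Three entries all aimed at row 4: the two rules disagree
x3 <- c(3L, 2L, 1L, 0L, 0L, 0L)
print(shift_loop(x3))  # 0 0 0 1 0 0
print(shift_vec(x3))   # 0 0 0 0 0 0
```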
Unsurprisingly, the biggest bottleneck above is the ave function, and you can replace that line with the following data.table construct (don't forget to require(data.table) somewhere) to speed it up considerably:
index = data.table(index)[, index := if(.N > 1) 0 else index, by = index][, index]
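If you would rather not add the data.table dependency, one base-R alternative (a suggestion, not part of the original answer) is to flag every value that occurs more than once with duplicated(), scanning once from the front and once from the back. It replaces only the ave() line; how much faster it is will depend on your data, so it is worth timing against ave() on your own columns.

```r
v <- c(0, 0, 4, -3, 0, 2, 0, -3)
index <- ifelse(v != 0 & abs(v) + seq_along(v) <= length(v),
                abs(v) + seq_along(v), 0)

# TRUE for every element whose value appears more than once
# (duplicated() alone would miss the first copy, hence fromLast = TRUE)
dup <- duplicated(index) | duplicated(index, fromLast = TRUE)
index[dup] <- 0
print(index)  # 0 0 0 0 0 8 0 0

res <- integer(length(v))
res[index] <- v[which(index != 0)]
print(res)    # 0 0 0 0 0 0 0 2
```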
Source: https://stackoverflow.com/questions/18281936/r-matrix-individual-shift-operations-of-elements