How to avoid multiple loops with multiple variables in R

Submitted by 怎甘沉沦 on 2019-12-12 01:53:08

Question


I have two datasets stored in tables: one is a set of [a, b] and the other is [x, Sx, y, Sy, rho]. I have a probability function f that takes (a, b, x, Sx, y, Sy, rho). In the end I want the sum of the probability results over all [x, Sx, y, Sy, rho] rows for the first [a, b], then the sum over all [x, Sx, y, Sy, rho] rows for the second [a, b], and so on.

I would like to have a few hundred rows in the [x, Sx, y, Sy, rho] file and a few hundred thousand rows in the [a, b] file.

Is there a way to do this without using two loops? I've tried the following; it doesn't quite work the way I want it to, and I know it would be far too slow anyway.

I don't know if it will help but I've added the function in the code. Sorry that the function itself is a mess and not formatted properly.

# data  file with (a, b)
data            <- matrix( c(1, 0, 1, 1, 0.5, 0), nrow=3, ncol=2) 
colnames(data)  <- c("a", "b") 
Ndat            <- dim(data)
Ndata           <- Ndat[1]

# data2 file with (x, Sx, y, Sy, rho)
# byrow=TRUE so that each row holds one (x, Sx, y, Sy, rho) record
data2           <- matrix( c(1, 0.1, 1, 0.1, 0.002, 2, 0.1, 2, 0.1, 0.000001, 
                             2, 0.1, 1, 0.1, 0.002), nrow=3, ncol=5, byrow=TRUE) 
colnames(data2) <- c("x", "Sx", "y", "Sy", "rho") 
Ndat2           <- dim(data2)
Ndata2          <- Ndat2[1]

# function requires variables (a, b, x, Sx, y, Sy, rho) 
Prob  <- function(a, b, Xi, sX, Yi, sY, rho) {sqrt(1 + a ^ 2) * (
  exp(-((b + a * Xi - Yi) ^ 2 / (
    2 * ((a ^ 2 * sX ^ 2) - 
         (2 * a * rho * sX * sY) + sY ^ 2)))) * sqrt((
           1 - rho ^ 2) / (
             a ^ 2 * sX ^ 2 - 2 * a * rho *sX *sY + sY ^ 2))/(
               sqrt(2 * pi) * sqrt(1 - rho ^ 2)))
    }

# Here is my weak attempt
Table <- NULL
Table <- for (j in 1:Ndata) { 
   sum (for (i in 1:Ndata2) {
   Datatable[i] = Prob(data[j, a], data[j, b], data2[i, x], 
                 data2[i, Sx], data2[i, y], data2[i, Sy], 
                 data2[i, rho])
   })
}

I am having a very hard time wrapping my head around the apply functions and when they can/should be used. I know that I've probably not added enough information, so any suggestions that can help me out would be great. I'm pretty new to programming as well as R, so please forgive any inappropriate vocabulary or formatting.

There is probably a better way to get the number of rows in data and store it in Ndata as a global, but this is the first one I stumbled across.

The function should not be recursive, but I see now that it is as I've written it. I have spent many hours on intro tutorials to R and still am having a very hard time understanding how the apply suite of functions are best implemented.

I would like one iteration to apply this function to each row in data2 using a, b from the first row of data, then sum the probabilities for all of those. The next iteration should do the same with a, b from row 2 of data, again summing over every row of data2, and so on.


Answer 1:


I have a feeling there's an easier way to do this, but something like this will probably work.

f <- function(a,b,x,y,z) a+b+x+y+z
f.new <- function(p1,p2) {
  p1=as.list(p1); p2=as.list(p2)
  f(p1$a,p1$b,p2$x,p2$y,p2$z)
}

data1 <- data.frame(a=1:10,b=11:20)
data2 <- data.frame(x=1:5,y=21:25,z=31:35)
indx  <- expand.grid(indx2=seq(nrow(data2)),indx1=seq(nrow(data1)))
result <- with(indx,f.new(data1[indx1,],data2[indx2,]))
sums   <- aggregate(result,by=list(rep(seq(nrow(data1)),each=nrow(data2))),sum)

You seem to want to evaluate a function for every combination of two sets of variables, the set of (a,b) and the set of (x, Sx, y, Sy, rho), then sum over the second set, for every instance of the first set.

So first this redefines the function f(...) to take two arguments, representing the two sets. This is f.new(...). You should probably define your original function that way - it will run faster.

Then we create a data frame, indx, with two columns that together enumerate every combination of row numbers in data1 and data2, and call f.new(...) on data1 and data2 indexed by those columns. This produces result, which holds the function evaluated at every combination of (a,b) and (x,y,z). Finally, we aggregate result by data1 row to get the sums you specified.
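
As a minimal sketch, the same pattern can be applied to the question's Prob function. This assumes that data2 is meant to hold one (x, Sx, y, Sy, rho) record per row (hence byrow = TRUE) and uses data frames instead of matrices. Prob is restated here with the shared variance term pulled out, and the names grid, p, and sums_by_ab are illustrative only, not part of the original answer.

```r
# Restatement of the question's Prob(); every operation is vectorized.
Prob <- function(a, b, Xi, sX, Yi, sY, rho) {
  v <- a^2 * sX^2 - 2 * a * rho * sX * sY + sY^2
  sqrt(1 + a^2) * exp(-(b + a * Xi - Yi)^2 / (2 * v)) *
    sqrt((1 - rho^2) / v) / (sqrt(2 * pi) * sqrt(1 - rho^2))
}

# The question's toy data (row layout of data2 is an assumption).
data1 <- data.frame(a = c(1, 0, 1), b = c(1, 0.5, 0))
data2 <- as.data.frame(matrix(
  c(1, 0.1, 1, 0.1, 0.002,
    2, 0.1, 2, 0.1, 0.000001,
    2, 0.1, 1, 0.1, 0.002),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("x", "Sx", "y", "Sy", "rho"))))

# Every (data2 row, data1 row) pair; the first column varies fastest,
# so each run of nrow(data2) consecutive rows shares one (a, b).
grid <- expand.grid(i2 = seq_len(nrow(data2)), i1 = seq_len(nrow(data1)))

# One vectorized call evaluates Prob at every combination.
p <- Prob(data1$a[grid$i1], data1$b[grid$i1],
          data2$x[grid$i2], data2$Sx[grid$i2],
          data2$y[grid$i2], data2$Sy[grid$i2], data2$rho[grid$i2])

# One sum per (a, b) row, in the row order of data1.
sums_by_ab <- as.vector(tapply(p, grid$i1, sum))
```

Indexing the columns directly, rather than passing whole data frame rows, keeps the call to Prob a plain vector operation; tapply is used here in place of aggregate only because it returns a bare vector.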

This approach is memory intensive: with the table sizes you describe, result will have on the order of 10 million elements or more. It will still run faster than nested loops.
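
If the full grid is too large to hold in memory, one possible alternative (not part of the original answer) is to loop only over the smaller table and let R's vectorized arithmetic handle all (a, b) rows at once. The sketch below assumes Prob is built from vectorized operations, as in the question, and that data2 holds one record per row; sum_prob is a hypothetical helper name.

```r
# Restatement of the question's Prob(); every operation is vectorized.
Prob <- function(a, b, Xi, sX, Yi, sY, rho) {
  v <- a^2 * sX^2 - 2 * a * rho * sX * sY + sY^2
  sqrt(1 + a^2) * exp(-(b + a * Xi - Yi)^2 / (2 * v)) *
    sqrt((1 - rho^2) / v) / (sqrt(2 * pi) * sqrt(1 - rho^2))
}

# Hypothetical helper: for each row of ab, sum Prob over all rows of obs.
# Only the small table is looped over; the running total is a single
# vector with one entry per (a, b) row.
sum_prob <- function(ab, obs) {
  total <- numeric(nrow(ab))
  for (i in seq_len(nrow(obs))) {
    total <- total + Prob(ab$a, ab$b, obs$x[i], obs$Sx[i],
                          obs$y[i], obs$Sy[i], obs$rho[i])
  }
  total
}

# The question's toy data (row layout of data2 is an assumption).
data1 <- data.frame(a = c(1, 0, 1), b = c(1, 0.5, 0))
data2 <- data.frame(x   = c(1, 2, 2),
                    Sx  = c(0.1, 0.1, 0.1),
                    y   = c(1, 2, 1),
                    Sy  = c(0.1, 0.1, 0.1),
                    rho = c(0.002, 0.000001, 0.002))

sums_by_ab <- sum_prob(data1, data2)
```

With a few hundred rows in data2, this amounts to a few hundred vectorized passes over the (a, b) table, and no intermediate object is longer than the (a, b) table itself.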



Source: https://stackoverflow.com/questions/27572050/how-to-avoid-multiple-loops-with-multiple-variables-in-r
