Question
I am trying to run the function varipart() from the ade4 package. I have three lists of data frames, and I want to pass the data frame at the same position from each list to a single call: the first data frame of every list together, then the second of every list, and so on for each set of data frames.
########### DATA BELOW
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(2, 1, 2), y2 = c(5, 6, 4))
spec.list <- list(d1, d2, d3)
d1 <- data.frame(y1 = c(20, 87, 39), y2 = c(46, 51, 8))
d2 <- data.frame(y1 = c(30, 21, 12), y2 = c(61, 51, 33))
d3 <- data.frame(y1 = c(2, 11, 14), y2 = c(52, 16, 1))
env.list <- list(d1, d2, d3)
d1 <- data.frame(y1 = c(0.15, 0.1, 0.9), y2 = c(0.46, 0.51, 0.82))
d2 <- data.frame(y1 = c(0.13, 0.31, 0.9), y2 = c(0.11, 0.51, 0.38))
d3 <- data.frame(y1 = c(0.52, 0.11, 0.14), y2 = c(0.52, 0.36, 0.11))
spat.list <- list(d1, d2, d3)
###############
# I have tried two ways
library(parallel)
library(ade4)
output_varpart <- mclapply(spec.list, function(x){
varipart(x, env.list, spat.list, type = "parametric")
})
output_varpart <- mclapply(x, function(x){
varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
})
for(i in 1:length(x)){
results <- varipart(spec.list, env.list, spat.list, type = "parametric")
}
None of these methods work! Please be gentle, I'm new to list syntax and looping. Errors are "Warning message: In mclapply(output.spectrans.dudi, function(x) { : all scheduled cores encountered errors in user code" and "Error in x * w : non-numeric argument to binary operator", respectively.
Answer 1:
You were close, but I'll explain a bit how lapply (and mclapply) work, because it feels like you're mixing up what the role of x is. First, this should work:
output_varpart <- mclapply(1:3, function(x){
varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
})
But why?
The function lapply means: apply a function (2nd argument) to all values in a list (first argument). So lapply(list('Hello', 'World', '!'), print) will do
print('Hello')
print('World')
print('!')
and it will return a list of length 3 with the results (the return value of print is the value that was printed).
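To make the return value concrete, here is a small sketch. It uses toupper in place of print (my substitution, not part of the question) so that the returned list is easy to inspect:

```r
# lapply calls the function once per element and collects the results in a list
words <- list("Hello", "World", "!")
shouted <- lapply(words, toupper)

length(shouted)   # one result per input element
shouted[[1]]      # the result of toupper("Hello")
```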
But quite often, there is not one function that does exactly what you want. You can always define a function, like this:
my_vari_fun <- function(index) {
varipart(spec.list[[index]], env.list[[index]], spat.list[[index]], type = "parametric")
}
You can then call it like my_vari_fun(1), and it doesn't matter at all if the argument is called x or index, or something else. I'm sure you get it. So a next step would be
output_varpart <- lapply(list(1,2,3), my_vari_fun)
The disadvantage of this is that it takes multiple lines of code, and we probably won't use my_vari_fun again. That's the reason we can provide an anonymous function: we just give a function to lapply without assigning it to a name. We just replace my_vari_fun with its "value" (which happens to be a function).
However, outside this function, x doesn't mean anything. We could as well have called it any other name.
We just need to tell lapply what values to input: list(1,2,3). Or simpler as a vector, which lapply will convert: 1:3
By the way, I've just inserted 3 here, but for the general case you can use 1:length(spec.list); you just have to make sure all lists are the same length.
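A sketch of the general case, under some assumptions: it uses seq_along(spec.list), which gives the same indices as 1:length(spec.list) for a non-empty list, and it checks the list lengths up front. The function combine_triplet is a hypothetical stand-in for varipart so that the example runs without ade4, and the data are shortened versions of the question's lists.

```r
# Toy lists of equal length (shortened versions of the question's data)
spec.list <- list(data.frame(y1 = c(1, 2, 3)), data.frame(y1 = c(3, 2, 1)))
env.list  <- list(data.frame(y1 = c(20, 87, 39)), data.frame(y1 = c(30, 21, 12)))
spat.list <- list(data.frame(y1 = c(0.15, 0.1, 0.9)), data.frame(y1 = c(0.13, 0.31, 0.9)))

# Fail early if the lists cannot be matched up by position
stopifnot(length(spec.list) == length(env.list),
          length(spec.list) == length(spat.list))

# Hypothetical stand-in for varipart(): receives the three matching data frames
combine_triplet <- function(spec, env, spat) {
  sum(spec$y1) + sum(env$y1) + sum(spat$y1)
}

# seq_along(spec.list) yields 1, 2, ..., length(spec.list)
results <- lapply(seq_along(spec.list), function(i) {
  combine_triplet(spec.list[[i]], env.list[[i]], spat.list[[i]])
})
```

In real code the body of the anonymous function would be the varipart call shown at the top of this answer.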
Finally, I've talked about lapply so far, but it all works the same for mclapply. The difference is only under the hood: mclapply will spread its work over multiple cores.
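As a rough illustration that the two are interchangeable, the sketch below runs the same toy function (square, a made-up example) through both. The core count of 2 is an assumption; on Windows mclapply cannot fork and only accepts mc.cores = 1.

```r
library(parallel)

# Forking is not available on Windows, where mclapply only accepts mc.cores = 1
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Toy function standing in for the real work
square <- function(i) i^2

serial_result   <- lapply(1:4, square)
parallel_result <- mclapply(1:4, square, mc.cores = n_cores)

# Compare the two result lists
identical(serial_result, parallel_result)
```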
Edit: debugging
In debugging, there is more difference between lapply and mclapply. I will first talk about lapply.
If there is some error in your code that gets executed inside the lapply, the entire lapply will fail, and nothing gets assigned. That sometimes makes it hard to spot exactly where an error takes place, but it can be done. A simple workaround may be feeding lapply just parts of your input, to see where it breaks.
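One way to automate that narrowing down, as a sketch rather than the approach described above: run every element on its own inside tryCatch and record which positions raise an error. The inputs and the use of sqrt are made up for illustration.

```r
# Hypothetical inputs: the third element is not numeric and will make sqrt() fail
inputs <- list(4, 9, "sixteen", 25)

# Run each element separately and record whether it raised an error
failed <- vapply(seq_along(inputs), function(i) {
  tryCatch({
    sqrt(inputs[[i]])
    FALSE
  }, error = function(e) TRUE)
}, logical(1))

which(failed)   # positions of the inputs that raised an error
```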
But R also comes with some debugging tools, where execution freezes as soon as an error is encountered. I find recover the most useful tool.
You can set it with options(error=recover), and every time an error is encountered, it shows you the call stack: the function that threw the error, the function that called it, the function that called that one, and so on.
Then you can choose a number to explore the environment in which that function was running. When I try to emulate your error, I get this:
Error in x * w : non-numeric argument to binary operator
Enter a frame number, or 0 to exit
1: source("~/.active-rstudio-document")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(ei, envir)
5: .active-rstudio-document#20: lapply(1:3, function(x) {
varipart(spec.list[[x]], env.list[[x]], spat.list[
6: FUN(X[[i]], ...)
7: .active-rstudio-document#21: varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
8: as.matrix(scalewt(Y, scale = scale))
9: scalewt(Y, scale = scale)
10: apply(df, 2, weighted.mean, w = wt)
11: FUN(newX[, i], ...)
12: weighted.mean.default(newX[, i], ...)
A lot of these are internal R functions, and you can see what varipart does: it passes stuff on to lower-level functions, which pass it on, etc.
For our purposes, we want number 6: here the lapply calls your function, with the i-th input value.
As soon as we enter 6, we get a new prompt that reads Browse[1]> (in some cases it may be another number), and we are in the environment as if we had just entered our
function(x){
varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
}
Which means typing x will give you the value for which this function fails, and spec.list[[x]] etc. will tell you for which inputs varipart failed. Then the final step is deciding what this means: either varipart is broken, or one of your inputs is.
In this case, I noticed I can get the same error by having one of the columns in the data.frame be something other than numeric. You'll have to check whether that is your problem as well, but debugging becomes a whole lot easier once you've figured out where the problem is.
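A quick way to check for that cause across a whole list is sketched below. The data are made up: the second data frame has a column stored as text instead of numbers.

```r
# Hypothetical example: one column was stored as text instead of numeric
good <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
bad  <- data.frame(y1 = c("1", "2", "3"), y2 = c(4, 5, 6))
spec.list <- list(good, bad, good)

# TRUE for every data frame whose columns are all numeric
all_numeric <- vapply(spec.list, function(df) {
  all(vapply(df, is.numeric, logical(1)))
}, logical(1))

which(!all_numeric)   # positions of data frames with a non-numeric column
```

The same check can be repeated for env.list and spat.list.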
With mclapply
mclapply runs on multiple cores, which means that if there is an error in one core, the other cores still finish their jobs.
For calculations where a forked process encountered an error, that error will be the return value, in the form of a try-error object.
But note that the same error is returned for the other iterations handled by the same core as well. So if fun(1) throws an error in mclapply(1:10, fun) with 2 cores and the default mc.preschedule = TRUE, all odd inputs will show that error.
So we can look at the return value, to narrow our search down:
sapply(output_varpart, class)
The error(s) is/are in the iterations where the output-class is try-error, but we can't know exactly which one.
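A sketch of how to pull out those iterations. To keep it runnable on any platform, the failure is simulated with lapply and try(..., silent = TRUE), which produces the same kind of try-error object; the inputs and sqrt are made up.

```r
# Simulate the kind of list mclapply returns after a failure:
# try(..., silent = TRUE) returns a "try-error" object instead of stopping
inputs <- list(4, "nine", 16)
output <- lapply(inputs, function(x) try(sqrt(x), silent = TRUE))

# Flag the elements that hold an error rather than a result
is_error <- vapply(output, inherits, logical(1), what = "try-error")

failed_idx <- which(is_error)   # positions to re-run or inspect
succeeded  <- output[!is_error] # results worth keeping

# The original condition is stored as an attribute of the try-error object
conditionMessage(attr(output[[failed_idx[1]]], "condition"))
```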
How to practically solve it depends on the size of the calculations.
If they were really extensive, it may be worth it to keep the values that did succeed, and narrow it down again by re-running only the failed parts.
Or if I see just one try-error, we don't need to look any further.
But usually, I find it most useful to change the mclapply to a regular lapply, and use the approach above.
Source: https://stackoverflow.com/questions/53754263/looping-multiple-listed-data-frames-into-a-single-function