mapply with multiple arguments where one argument is constant (data)

Submitted by 妖精的绣舞 on 2021-01-28 08:54:15

Question


I'm struggling to use mapply with functions I write that take one or more arguments that stay constant. I need those arguments because I am programming in a bigger environment. For example, I write a function where one of the arguments is the data:

fun_test <- function(data,col,val1,val2){return(data[col][1,] * val1-val2)}

So data and col can for example be constant, but I want to vary the output of my function depending on val1 and val2:

> mapply(FUN=fun_test,mtcars,"cyl",mtcars$cyl,mtcars$cyl*2)
Error in data[col][1, ] : incorrect number of dimensions

I'm trying to understand how mapply works; I surely cannot pass mtcars, and "cyl" as a vector, can I?

EDIT: I have an environment in which the data may vary, e.g. sometimes I use mtcars, sometimes another dataset. So I cannot hardcode the data into the function.

EDIT2: 1) I have some dataset, 2) I have different Excel files that I read into R, 3) I write a lookup function that extracts information from these Excel files in R, 4) for one or two variables (from the dataset) at a time, I call the lookup functions I created and extract information.

So these lookup functions depend on both the data (the variables I need to lookup) and the Excel-files that I use to do the looking up.


Answer 1:


mapply is a multivariate version of sapply/lapply. Instead of iterating over just one object (e.g. the columns of a data.frame or the elements of a vector), it iterates over several objects in parallel: the first call gets the first element of each argument, the second call gets the second elements, and so on. Shorter arguments are recycled to the length of the longest one, so a length-one value such as "cyl" is simply reused in every call. A data.frame, however, is a list of its columns, so mapply hands your function one column at a time rather than the whole table. That is why you cannot pass mtcars in as-is.

Try an easy example (sums the same indexes of the vectors):

mapply(sum, 1:10, 11:20)
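
To make the pairing explicit, here is a small sketch of what that call does. The names res and by_hand are only illustrative:

```r
# mapply calls sum() once per position: sum(1, 11), sum(2, 12), ...
res <- mapply(sum, 1:10, 11:20)
res
# [1] 12 14 16 18 20 22 24 26 28 30

# The first three calls written out by hand give the same values
by_hand <- c(sum(1, 11), sum(2, 12), sum(3, 13))
all(res[1:3] == by_hand)
# [1] TRUE
```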

So, in your case, the simplest option is to put the constants directly inside the function:

fun_test <- function(val1, val2){return(mtcars[1, 'cyl'] * val1 - val2)}

mapply(FUN=fun_test, mtcars$cyl, mtcars$cyl*2)

Update:

Then I think what you need is to call mapply inside your function. That way you can add any arguments you like (both constants and variables). It would look like this:

myfunc <- function(data, col, val1, val2) {

  fun_test <- function(val1, val2) {
    data[1, col] * val1 - val2
  }

  mapply(FUN=fun_test, val1, val2)

}

myfunc(mtcars, 'cyl', mtcars$cyl, mtcars$cyl*2)



Answer 2:


If you want to pass a data frame as a constant value, wrap it in a list so that the whole data frame is recycled for every call. Otherwise mapply will pass each column to the function separately.

fun_test <- function(data,col,val1,val2){return(data[1, col] * val1-val2)}

mapply(FUN=fun_test, list(mtcars),"cyl",mtcars$cyl,mtcars$cyl*2)
#[1] 24 24 16 24 32 24 32 16 16 24 24 ......

So the first value 24 in the output can be reproduced by

mtcars[1, "cyl"] * mtcars$cyl[1] - mtcars$cyl[1]*2
#[1] 24
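
As an aside, mapply also has a MoreArgs argument intended for values that should be handed unchanged to every call. A minimal sketch of that route, using the same fun_test as above (the name res is only illustrative):

```r
fun_test <- function(data, col, val1, val2) {
  data[1, col] * val1 - val2
}

# val1 and val2 are iterated over in parallel;
# everything in MoreArgs is passed as-is to each call
res <- mapply(FUN = fun_test,
              val1 = mtcars$cyl,
              val2 = mtcars$cyl * 2,
              MoreArgs = list(data = mtcars, col = "cyl"))
head(res)
# [1] 24 24 16 24 32 24
```

Because the data is just an entry in MoreArgs, swapping in another dataset should only require changing that list, not the function.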

I know this is an example and actual implementation is different but you can get the same output directly by doing

mtcars[1, "cyl"] * mtcars$cyl - mtcars$cyl*2
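
A quick way to convince yourself that the two agree is to compare them. This is only a sketch, and via_mapply and direct are illustrative names:

```r
fun_test <- function(data, col, val1, val2) data[1, col] * val1 - val2

# Element-by-element version through mapply
via_mapply <- mapply(FUN = fun_test, list(mtcars), "cyl",
                     mtcars$cyl, mtcars$cyl * 2)

# Plain vectorised arithmetic, no mapply needed
direct <- mtcars[1, "cyl"] * mtcars$cyl - mtcars$cyl * 2

isTRUE(all.equal(via_mapply, direct))
```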

To understand the difference between the two calls, we can debug the function by adding browser() to it:

fun_test <- function(data,col,val1,val2){
   browser()
   return(data[1, col] * val1-val2)
}

Now call the function and inspect the data argument inside it:

mapply(FUN=fun_test, mtcars,"cyl",mtcars$cyl,mtcars$cyl*2)
Browse[1]> data
# [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 
#     10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 
#     15.8 19.7 15.0 21.4

This is the first column of mtcars, which is mpg (check mtcars$mpg).

It is a numeric vector, and the function then tries to select "cyl" and row 1 from it, which gives you the same error:

mtcars$mpg["cyl"][1, ]

Error in mtcars$mpg["cyl"][1, ] : incorrect number of dimensions

In the second case, when we pass the data frame wrapped in a list, check data again:

 mapply(FUN=fun_test, list(mtcars),"cyl",mtcars$cyl,mtcars$cyl*2)

Browse[1]> data
#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#....

It is the complete data frame, so you can subset it as intended:

>data[1, "cyl"]
#[1] 6
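
The same difference can be seen without the debugger. The sketch below only asks each call what kind of object it received. The names cols_seen and whole_seen are illustrative:

```r
# A data frame is a list of columns, so its length is the number of columns
length(mtcars)        # 11
length(list(mtcars))  # 1

# Passing mtcars directly: the function is called once per column
cols_seen <- mapply(function(data) class(data), mtcars)

# Passing list(mtcars): the function is called once with the whole data frame
whole_seen <- mapply(function(data) class(data), list(mtcars))

cols_seen   # "numeric" for each of the 11 columns
whole_seen  # "data.frame"
```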

PS: I don't know the context for why this is being done, and I believe there are better ways to handle it.



Source: https://stackoverflow.com/questions/56443892/mapply-with-multiple-arguments-where-one-argument-is-constant-data
