R: specifying variable name in function parameter for a function of general (universal) use

慢半拍i 2021-01-01 05:23

Here is my small function and data. Please note that I want to design a function for general use, not just for personal use.

dataf <- data.frame (A= 1:10, B= 21:30, C= 51:60, D = 71:80)
3 answers
  • 2021-01-01 05:42

    Let's investigate your original function and call (see the comments I added), assuming you mean to pass the names of your columns of interest to the function:

    myfun <- function (dataframe, varA, varB) {
      #on this next line, you use A and B. But this should be what is
      #passed in as varA and varB, no?
      daf2 <- data.frame (A = dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
      #so, as a correction, we need:
      colnames(daf2)<-c(varA, varB)
      #the first argument to lm is a formula. If you use it like this,
      #it refers to columns _named_ varA and varB, not to the columns whose
      #names are the _contents_ of varA and varB!! (so this original line fails)
      #anv1 <- lm(varA ~ varB, daf2)
      #so, what we really want is to build a formula with the contents
      #of varA and varB: we have to do this by building up a character string:
      frm<-paste(varA, varB, sep="~")
      anv1 <- lm(formula(frm), daf2)
      print(anova(anv1))
    }
    #here, you pass A and B, because you are used to being able to do that in a formula
    #(like in lm). But in a formula, there is a great deal of work done to make that
    #happen that doesn't work for most of the rest of R, so you need to pass the names
    #again as character strings. This original call fails (typically "object 'A' not found"):
    #myfun (dataframe = dataf, varA = A, varB = B)
    #becomes:
    myfun (dataframe = dataf, varA = "A", varB = "B")
    

    Note: in the above, I left the original code in place next to the corrections, so you may have to remove some of it to avoid the errors you were originally getting. The essence of your problem is that you should always pass column names as character strings, and use them as such. This is one of the places where the syntactic sugar of formulas in R leads people into bad habits and misunderstandings...
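    A minimal sketch of that advice (the helper name get_product is hypothetical): pass the column names as strings and index with [[, which looks the column up by the value of the argument, whereas $ always takes the name literally.

```r
# Example data from the question
dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)

# Hypothetical helper: multiply two columns whose names arrive as strings
get_product <- function(dataframe, varA, varB) {
  # dataframe[[varA]] uses the *contents* of varA (e.g. "A");
  # dataframe$varA would look for a column literally named "varA"
  dataframe[[varA]] * dataframe[[varB]]
}

get_product(dataf, "A", "B")   # same as dataf$A * dataf$B
is.null(dataf$varA)            # TRUE: there is no column called "varA"
```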

    Now, as for an alternative: the only place the variable names are actually used is in the formula. So, if you don't mind some slight cosmetic differences in the results that you can clean up later, you can simplify matters further: there is no need to pass along the column names at all!!

    myfun <- function (dataframe) {
      daf2 <- data.frame (A = dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
      #now we know that columns A and B simply exist in data.frame daf2!!
      anv1 <- lm(A ~ B, daf2)
      print(anova(anv1))
    }
    

    As a final piece of advice: I would refrain from calling print on your last statement. If you leave it out and call the function directly from the R command line, R will print the result for you anyway. As an added advantage, you can do further work with the object the function returns.
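    To illustrate, here is a sketch of the formula-building function that returns the anova table instead of printing it. It uses the built-in mtcars data set (an assumption for the demo, not part of the question) because every column of dataf is an exact linear shift of 1:10, so any fit between those columns is perfect and anova() warns that its F-tests are unreliable.

```r
# Build the formula from strings and *return* the anova table
myfun <- function(dataframe, varA, varB) {
  frm <- paste(varA, varB, sep = "~")
  anv1 <- lm(formula(frm), dataframe)
  anova(anv1)
}

res <- myfun(dataframe = mtcars, varA = "mpg", varB = "wt")
res                   # auto-printed when typed at the command line
class(res)            # "anova" "data.frame"
res[["Pr(>F)"]][1]    # p-value of the wt term, usable in further code
```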

    Cleaned-up function with a trial run:

    dataf <- data.frame (A= 1:10, B= 21:30, C= 51:60, D = 71:80)
    myfun <- function (dataframe, varA, varB) {
      frm<-paste(varA, varB, sep="~")
      anv1 <- lm(formula(frm), dataframe)
      anova(anv1)
    }
    myfun (dataframe = dataf, varA = "A", varB = "B")
    myfun (dataframe = dataf, varA = "A", varB = "D")
    myfun (dataframe = dataf, varA = "B", varB = "C")
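    As a possible variant (a sketch; the name myfun2 is made up here), base R's stats::reformulate() builds the same formula from character strings without paste(), and it also accepts several predictors, e.g. reformulate(c("wt", "hp"), response = "mpg") gives mpg ~ wt + hp. The demo call uses the built-in mtcars data, again an assumption for illustration.

```r
# Same idea as the cleaned function, but the formula comes from reformulate()
myfun2 <- function(dataframe, varA, varB) {
  frm <- reformulate(termlabels = varB, response = varA)
  anova(lm(frm, data = dataframe))
}

# Both ways of building the formula produce A ~ B
deparse(reformulate("B", response = "A"))
deparse(formula(paste("A", "B", sep = "~")))

myfun2(dataframe = mtcars, varA = "mpg", varB = "wt")
```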
    
  • 2021-01-01 05:42

    You could always go the (horrors) parse() route:

    foo <- data.frame(one=1:5,two=6:10)
    bar <- function(y) eval(parse(text=paste('foo$',y,sep='')))
    

    Which is to say, inside your function, grab the arguments to the function and build up the internal data frame or pairs of vectors of data you want using the eval(parse(...)) setup.
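    A sketch of that idea applied to the question (the function name myfun_parse is hypothetical; the demo uses the built-in mtcars data). The code strings are evaluated inside the function, where dataframe exists. For comparison, dataframe[[varA]] retrieves the same column without any parsing.

```r
# Hypothetical function: fetch two columns by name via eval(parse()),
# then fit a simple linear model on them
myfun_parse <- function(dataframe, varA, varB) {
  # e.g. the text "dataframe$mpg"; eval() defaults to the calling frame,
  # i.e. this function's environment, where 'dataframe' is defined
  y <- eval(parse(text = paste0("dataframe$", varA)))
  x <- eval(parse(text = paste0("dataframe$", varB)))
  anova(lm(y ~ x))
}

myfun_parse(mtcars, "mpg", "wt")
```

    One cosmetic difference: the anova rows are labelled x and Residuals rather than with the original column names.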

  • 2021-01-01 05:59

    I'm not sure I fully understand your problem, so here is what I understood: you want your function to call lm() on data extracted from a data.frame given as an argument, with the columns of that data.frame specified by other arguments?

    To me, the simplest solution is to mimic the behavior of lm() and ask the user for a formula:

    dataf <- data.frame(A=1:10, B=21:30, C=51:60, D=71:80)
    
    myfun <- function(formula, dataframe) {
      daf2 <- data.frame(A=dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
      anv1 <- lm(formula=formula, data=daf2)
      print(anova(anv1))
    }
    
    myfun(formula=A~B, dataframe=dataf)
    

    Another solution is to build the formula yourself:

    dataf <- data.frame(A=1:10, B=21:30, C=51:60, D=71:80)
    
    myfun <- function(dataframe, varA, varB) {
      daf2 <- data.frame(A=dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
      frm = as.formula(sprintf("%s~%s", varA, varB))
      anv1 <- lm(frm, daf2)
      print(anova(anv1))
    }
    
    myfun(dataframe=dataf, varA="A", varB="B") 
    

    I am not very familiar with attach, but I try to avoid it when possible because of the masking problems you mentioned. If you detach the data at the end of the function, I think it would not cause side effects, but you may still get a warning.
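    If attach() cannot be avoided, one way to make sure the data is detached at the end, even when lm() throws an error, is to register the detach() call with on.exit(). This is a sketch only: the function name myfun_attach and the search-path name "myfun_daf2" are made up here, and the masking caveat still applies (an object called A or B in the global environment would be found before the attached columns).

```r
dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)

myfun_attach <- function(dataframe) {
  daf2 <- data.frame(A = dataframe$A * dataframe$B,
                     B = dataframe$C * dataframe$D)
  # Put daf2's columns on the search path under a known name ...
  attach(daf2, name = "myfun_daf2")
  # ... and remove them again however the function exits
  on.exit(detach("myfun_daf2"))
  anova(lm(A ~ B))
}

res <- myfun_attach(dataf)
"myfun_daf2" %in% search()   # FALSE: detached on exit
```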
