Learning to write functions in R

抹茶落季 2020-12-20 00:18

I am at the point with R where I would like to start writing my own functions because I tend to need to do the same things over and over. However, I am struggling to see how

1 Answer
  • 2020-12-20 01:01

    At a glance, the biggest thing that you can do is to not use non-standard-evaluation shortcuts inside your functions: things like $, subset() and with(). These are functions intended for convenient interactive use, not extensible programmatic use. (See, e.g., the Warning in ?subset which should probably be added to ?with, fortunes::fortune(312), fortunes::fortune(343).)

    fortunes::fortune(312)
    

    The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad. -- Greg Snow (in response to a user that wanted to access a column whose name is stored in y via x$y rather than x[[y]]) R-help (February 2012)

    fortunes::fortune(343)
    

    Sooner or later most R beginners are bitten by this all too convenient shortcut. As an R newbie, think of R as your bank account: overuse of $-extraction can lead to undesirable consequences. It's best to acquire the [[ and [ habit early. -- Peter Ehlers (about the use of $-extraction) R-help (March 2013)

    When you start writing functions that work on data frames, if you need to reference column names you should pass them in as strings, and then use [ or [[ to get the column based on the string stored in a variable. This is the simplest way to make functions flexible with user-specified column names. For example, here are two versions of a simple, stupid function that tests whether a data frame has a column of the given name:

    does_col_exist_1 = function(df, col) {
        return(!is.null(df$col))
    }
    
    does_col_exist_2 = function(df, col) {
        return(!is.null(df[[col]]))
        # df[, col] also looks the column up by name, but throws an error if it is missing
    }
    

    These yield:

    does_col_exist_1(mtcars, col = "jhfa")
    # [1] FALSE
    does_col_exist_1(mtcars, col = "mpg")
    # [1] FALSE
    
    does_col_exist_2(mtcars, col = "jhfa")
    # [1] FALSE
    does_col_exist_2(mtcars, col = "mpg")
    # [1] TRUE
    

    The first function is wrong because $ doesn't evaluate what comes after it. No matter what value I set col to when I call the function, df$col will look for a column literally named "col". The brackets, however, will evaluate col and see "oh hey, col is set to 'mpg', let's look for a column of that name."
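
    The same contrast can be seen outside of a function. This is only a small sketch using the built-in mtcars data; the variable name col is just an illustration, standing in for a function argument:

    ```r
    col <- "mpg"  # a column name held in a variable, as it would be inside a function

    # `$` takes the name after it literally, so this looks for a column called "col"
    is.null(mtcars$col)
    # [1] TRUE

    # `[[` evaluates `col` first, so this fetches the "mpg" column
    identical(mtcars[[col]], mtcars$mpg)
    # [1] TRUE

    # single `[` also evaluates `col`, but returns a one-column data frame
    names(mtcars[col])
    # [1] "mpg"
    ```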

    If you want lots more understanding of this issue, I'd recommend the Non-Standard Evaluation Section of Hadley Wickham's Advanced R book.

    I'm not going to re-write and debug your functions, but if I wanted to, my first step would be to remove all $, with(), and subset(), replacing them with [ or [[. There's a pretty good chance that's all you need to do.
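
    To give a rough idea of what that replacement looks like, here is a minimal sketch. It is not a rewrite of the functions in the question; the helper names keep_rows_above and col_mean are hypothetical and exist only to illustrate the pattern of passing the column name as a string:

    ```r
    # Interactive style: subset(mtcars, mpg > 30)
    # Programmatic style: the column name arrives as a string in `col`
    keep_rows_above = function(df, col, threshold) {
        # build the row filter with [[ ; rows where the value is NA are dropped,
        # which mirrors what subset() does with missing values
        keep = !is.na(df[[col]]) & df[[col]] > threshold
        return(df[keep, , drop = FALSE])
    }

    # Interactive style: with(mtcars, mean(mpg))
    # Programmatic style: look the column up by its name
    col_mean = function(df, col) {
        return(mean(df[[col]]))
    }

    high_mpg = keep_rows_above(mtcars, col = "mpg", threshold = 30)
    avg_mpg = col_mean(mtcars, col = "mpg")
    ```

    Both helpers work with any data frame and any numeric column, because nothing about the column name is hard-coded.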
