Learning to write functions in R

前端未结

关注

 1  1226

I am at the point with R where I would like to start writing my own functions because I tend to need to do the same things over and over. However, I am struggling to see how

相关标签:

1条回答

日久生厌

2020-12-20 01:01
At a glance, the biggest thing that you can do is to not use non-standard-evaluation shortcuts inside your functions: things like $, subset() and with(). These are functions intended for convenient interactive use, not extensible programmatic use. (See, e.g., the Warning in ?subset which should probably be added to ?with, fortunes::fortune(312), fortunes::fortune(343).)
```
fortunes::fortune(312)
```
The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad. -- Greg Snow (in response to a user that wanted to access a column whose name is stored in y via x$y rather than x[[y]]) R-help (February 2012)
```
fortunes::fortune(343)
```
Sooner or later most R beginners are bitten by this all too convenient shortcut. As an R newbie, think of R as your bank account: overuse of $-extraction can lead to undesirable consequences. It's best to acquire the [[ and [ habit early. -- Peter Ehlers (about the use of $-extraction) R-help (March 2013)

When you start writing functions that work on data frames, if you need to reference column names you should pass them in as strings, and then use [ or [[ to get the column based on the string stored in a variable name. This is the simplest way to make functions flexible with user-specified column names. For example, here's a simple stupid function that tests if a data frame has a column of the given name:
```
does_col_exist_1 = function(df, col) {
    return(!is.null(df$col))
}

does_col_exist_2 = function(df, col) {
    return(!is.null(df[[col]])
    # equivalent to df[, col]
}
```
These yield:
```
does_col_exist_1(mtcars, col = "jhfa")
# [1] FALSE
does_col_exist_1(mtcars, col = "mpg")
# [1] FALSE

does_col_exist_2(mtcars, col = "jhfa")
# [1] FALSE
does_col_exist_2(mtcars, col = "mpg")
# [1] TRUE
```
The first function is wrong because $ doesn't evaluate what comes after it, no matter what value I set col to when I call the function, df$col will look for a column literally named "col". The brackets, however, will evaluate col and see "oh hey, col is set to "mpg", let's look for a column of that name."

If you want lots more understanding of this issue, I'd recommend the Non-Standard Evaluation Section of Hadley Wickham's Advanced R book.

I'm not going to re-write and debug your functions, but if I wanted to my first step would be to remove all $, with(), and subset(), replacing with [. There's a pretty good chance that's all you need to do.
0 讨论(0)
发布评论:

提交评论
- 加载中...