I am at the point with R where I would like to start writing my own functions because I tend to need to do the same things over and over. However, I am struggling to see how
At a glance, the biggest thing that you can do is to not use non-standard-evaluation shortcuts inside your functions: things like $, subset(), and with(). These are functions intended for convenient interactive use, not extensible programmatic use. (See, e.g., the Warning in ?subset, which should probably be added to ?with, as well as fortunes::fortune(312) and fortunes::fortune(343).)
fortunes::fortune(312)
The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad. -- Greg Snow (in response to a user that wanted to access a column whose name is stored in y via x$y rather than x[[y]]) R-help (February 2012)
fortunes::fortune(343)
Sooner or later most R beginners are bitten by this all too convenient shortcut. As an R newbie, think of R as your bank account: overuse of $-extraction can lead to undesirable consequences. It's best to acquire the [[ and [ habit early. -- Peter Ehlers (about the use of $-extraction) R-help (March 2013)
When you start writing functions that work on data frames, if you need to reference column names you should pass them in as strings, and then use [ or [[ to get the column based on the string stored in a variable. This is the simplest way to make functions flexible with user-specified column names. For example, here's a simple stupid function that tests if a data frame has a column of the given name:
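does_col_exist_1 = function(df, col) {
  # Wrong on purpose: `$` looks for a column literally named "col"
  return(!is.null(df$col))
}
does_col_exist_2 = function(df, col) {
  # `[[` evaluates `col`, so the string passed in is used as the column name;
  # df[[col]] is NULL when no such column exists
  return(!is.null(df[[col]]))
}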
These yield:
does_col_exist_1(mtcars, col = "jhfa")
# [1] FALSE
does_col_exist_1(mtcars, col = "mpg")
# [1] FALSE
does_col_exist_2(mtcars, col = "jhfa")
# [1] FALSE
does_col_exist_2(mtcars, col = "mpg")
# [1] TRUE
The first function is wrong because $ doesn't evaluate what comes after it. No matter what value I set col to when I call the function, df$col will look for a column literally named "col". The brackets, however, will evaluate col and see "oh hey, col is set to "mpg", let's look for a column of that name."
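The same difference can be seen outside of a function. This is only a minimal sketch using the built-in mtcars data; the names via_dollar and via_brackets are made up for illustration.

```r
col <- "mpg"

# `$` does not evaluate `col`: it looks for a column literally named "col"
via_dollar <- mtcars$col

# `[[` evaluates `col` first, so it looks up the column named "mpg"
via_brackets <- mtcars[[col]]

is.null(via_dollar)                    # TRUE: mtcars has no column named "col"
identical(via_brackets, mtcars$mpg)    # TRUE: same column as typing mpg by hand
```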
If you want lots more understanding of this issue, I'd recommend the Non-Standard Evaluation Section of Hadley Wickham's Advanced R book.
I'm not going to re-write and debug your functions, but if I wanted to, my first step would be to remove all $, with(), and subset(), replacing them with [ or [[. There's a pretty good chance that's all you need to do.
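As a rough idea of what that replacement can look like, here is a sketch of a function that would interactively be written as with(subset(mtcars, hp > 100), mean(mpg)). The function name mean_where_above and its arguments are hypothetical, not taken from your code.

```r
# Hypothetical helper: mean of `value_col` over the rows where
# `filter_col` is greater than `threshold`. Column names are passed as strings.
mean_where_above <- function(df, value_col, filter_col, threshold) {
  keep <- df[[filter_col]] > threshold
  # which() drops rows where the comparison is NA, the same way
  # subset() treats NA conditions as FALSE
  rows <- df[which(keep), , drop = FALSE]
  mean(rows[[value_col]])
}

mean_where_above(mtcars, value_col = "mpg", filter_col = "hp", threshold = 100)
```

Because the columns are ordinary string arguments, the same function works for any data frame and any pair of numeric columns without editing the body.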