Question
I'm experimenting with the economics dataset in R (it ships with ggplot2), and I'm trying to pass a data frame column as an argument to a function that uses piping (dplyr, %>%). I'm running into some seemingly strange problems: I can't successfully pass a column name as an argument to top_n() within my custom function. Here's how I would subset the 5 rows with the largest population without a custom function:
Code 1:
library(dplyr)
library(ggplot2)  # provides the economics dataset
df_econ <- economics
df_top_5 <- df_econ %>% top_n(5, pop)
df_top_5
Output 1:
2014-12-01 12122.0 320201 5.0 12.6 8688
2015-01-01 12080.8 320367 5.5 13.4 8979
2015-02-01 12095.9 320534 5.7 13.1 8705
2015-03-01 12161.5 320707 5.2 12.2 8575
2015-04-01 12158.9 320887 5.6 11.7 8549
Wrapped into a custom function, it could look like this:
Code 2:
library(dplyr)
library(ggplot2)  # provides the economics dataset
# data
data(economics)
df_econ <- economics
# custom function
fxtop <- function(df, number, column){
tops <- df %>% top_n(number, column)
return(tops)
}
# build a df using custom function
df_top_5 <- fxtop(df=df_econ, number=5, column='pop')
df_top_5
Output 2:
1967-07-01 507.4 198712 12.5 4.5 2944
1967-08-01 510.5 198911 12.5 4.7 2945
1967-09-01 516.3 199113 11.7 4.6 2958
1967-10-01 512.9 199311 12.5 4.9 3143
1967-11-01 518.1 199498 12.5 4.7 3066
1967-12-01 525.8 199657 12.1 4.8 3018
1968-01-01 531.5 199808 11.7 5.1 2878
1968-02-01 534.2 199920 12.2 4.5 3001
1968-03-01 544.9 200056 11.6 4.1 2877
1968-04-01 544.6 200208 12.2 4.6 2709
This output has 10 rows and not 5 as expected. I suspect that the argument number=5 is simply ignored and that the number that is actually used is defaulted to 10. The data does not seem to be sorted by 'pop' either.
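A quick way to test that suspicion is to count the returned rows instead of relying on the printed output, since a tibble prints only its first 10 rows by default. The sketch below uses a small hypothetical data frame (toy, not part of the original post) in place of economics. It assumes that a quoted name such as 'pop' reaches top_n() as a constant string rather than as the column, so every row ties and nothing is filtered out.

```r
library(dplyr)

# Hypothetical toy data standing in for economics (12 rows, so more than
# the 10 rows a tibble prints by default)
toy <- tibble(id = 1:12, pop = c(5, 3, 9, 1, 7, 2, 8, 6, 4, 12, 11, 10))

# Same shape as the wrapper in Code 2: `column` is handed to top_n() as-is
fxtop <- function(df, number, column) {
  df %>% top_n(number, column)
}

res <- fxtop(df = toy, number = 5, column = "pop")

# Count the rows rather than reading them off the printout
nrow(res)
nrow(res) == nrow(toy)
```

If the row count matches the input, number=5 is not being replaced by 10. The weight is simply not the pop column, so all rows are kept and the printout is cut off at 10.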
What I've tried so far:
Attempt 1: hard-code pop and number within the custom function:
library(dplyr)
library(ggplot2)  # provides the economics dataset
# data
data(economics)
df_econ <- economics
# custom function
fxtop <- function(df, number, column){
tops <- df %>% top_n(5, pop)
return(tops)
}
# build a df using custom function
df_top_5 <- fxtop(df=df_econ, number=5, column='pop')
df_top_5
Attempt 1: Output:
2014-12-01 12122.0 320201 5.0 12.6 8688
2015-01-01 12080.8 320367 5.5 13.4 8979
2015-02-01 12095.9 320534 5.7 13.1 8705
2015-03-01 12161.5 320707 5.2 12.2 8575
2015-04-01 12158.9 320887 5.6 11.7 8549
Attempt 1: Comment
This is the desired output!
Let's see what happens when I pass the variables through the function.
Attempt 2: pass variables as object instead of string:
library(dplyr)
library(ggplot2)  # provides the economics dataset
# data
data(economics)
df_econ <- economics
# custom function
fxtop <- function(df, number, column){
tops <- df %>% top_n(5, column)
return(tops)
}
# build a df using custom function
df_top_5 <- fxtop(df=df_econ, number=5, column='pop')
df_top_5
Attempt 2: Output:
Now the output is the same as in the first example. Both variables are seemingly ignored.
So, any suggestions?
Answer 1:
We can use non-standard evaluation with curly-curly ({{}}):
library(dplyr)
library(rlang)
fxtop <- function(df, number, column){
tops <- df %>% top_n(number, {{column}})
return(tops)
}
and pass unquoted variable names
fxtop(df=df_econ, number=5, pop)
# date pce pop psavert uempmed unemploy
# <date> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2014-12-01 12062 319746. 7.6 12.9 8717
#2 2015-01-01 12046 319929. 7.7 13.2 8903
#3 2015-02-01 12082. 320075. 7.9 12.9 8610
#4 2015-03-01 12158. 320231. 7.4 12 8504
#5 2015-04-01 12194. 320402. 7.6 11.5 8526
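To check that the curly-curly wrapper behaves like a direct top_n() call, the two results can be compared. This is a minimal sketch on a hypothetical toy data frame (toy and its values are assumptions for illustration, not from the original post).

```r
library(dplyr)

# Hypothetical toy data; pop values are distinct so there are no ties
toy <- tibble(id = 1:8, pop = c(5, 3, 9, 1, 7, 2, 8, 6))

# Wrapper using curly-curly to forward an unquoted column name
fxtop <- function(df, number, column) {
  df %>% top_n(number, {{ column }})
}

wrapped <- fxtop(df = toy, number = 3, pop)   # via the wrapper
direct  <- toy %>% top_n(3, pop)              # direct call for comparison

# Both should keep the same rows, in their original order
identical(wrapped, direct)
wrapped$id
```

Note that top_n() only filters. It does not sort, so the kept rows stay in their original order.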
If you want to pass the column name as a (quoted) string, we can use sym with !!:
fxtop <- function(df, number, column){
tops <- df %>% top_n(number, !!sym(column))
return(tops)
}
fxtop(df=df_econ, number=5, 'pop')
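As a possible alternative for string input, the .data pronoun exported by dplyr can look up a column by name. The sketch below is an assumption-labelled variant, not the answer's own method. fxtop_chr and toy are hypothetical names introduced only for illustration.

```r
library(dplyr)

# Hypothetical toy data; pop values are distinct so there are no ties
toy <- tibble(id = 1:8, pop = c(5, 3, 9, 1, 7, 2, 8, 6))

# Hypothetical variant of the wrapper: `column` is a character string,
# resolved to a column through the .data pronoun
fxtop_chr <- function(df, number, column) {
  df %>% top_n(number, .data[[column]])
}

res_chr <- fxtop_chr(df = toy, number = 3, column = "pop")
res_chr$id
```

Either form works for quoted names. .data[[column]] avoids building a symbol by hand, while !!sym(column) keeps the style used above.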
Source: https://stackoverflow.com/questions/59767759/how-to-pass-a-dataframe-column-as-an-argument-in-a-function-using-piping