Question
I have a helper function (say foo()) that will be run on various data frames that may or may not contain specified variables. Suppose I have
library(dplyr)
d1 <- tibble(taxon=1,model=2,z=3)
d2 <- tibble(taxon=2,pss=4,z=3)
The variables I want to select are
vars <- intersect(names(data),c("taxon","model","z"))
that is, I'd like foo(d1) to return the taxon, model, and z columns, while foo(d2) returns just taxon and z.
If foo contains select(data,c(taxon,model,z)) then foo(d2) fails (because d2 doesn't contain model). If I use select(data,-pss) then foo(d1) fails similarly.
I know how to do this if I retreat from the tidyverse (just return data[vars]), but I'm wondering if there's a handy way to do this either (1) with a select() helper of some sort (tidyselect::select_helpers) or (2) with tidyeval (which I still haven't found time to get my head around!)
Answer 1:
Another option is select_if:
d2 %>% select_if(names(.) %in% c('taxon', 'model', 'z'))
# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3
Answer 2:
You can use one_of(), which gives a warning when a column is absent but otherwise selects the correct columns:
d1 %>%
  select(one_of(c("taxon", "model", "z")))
d2 %>%
  select(one_of(c("taxon", "model", "z")))
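In more recent releases (tidyselect 1.0.0 and dplyr 1.0.0 onward), any_of() does the same job without the warning: it selects whichever of the named columns exist and silently ignores the rest. A minimal sketch, assuming a sufficiently recent dplyr is installed; the body of foo() is an assumption about what the helper might look like, not something given in the question:

```r
library(dplyr)

# Hypothetical helper: select the wanted columns, skipping any that are absent.
foo <- function(data, wanted = c("taxon", "model", "z")) {
  # any_of() takes a character vector and tolerates missing names
  data %>% select(any_of(wanted))
}

d1 <- tibble(taxon = 1, model = 2, z = 3)
d2 <- tibble(taxon = 2, pss = 4, z = 3)

names(foo(d1))
names(foo(d2))
```

If a missing column should be an error instead, all_of() is the strict counterpart.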
Answer 3:
Using the built-in anscombe data frame for the example, and noting that z is not a column in anscombe:
anscombe %>% select(intersect(names(.), c("x1", "y1", "z")))
giving:
   x1    y1
1  10  8.04
2   8  6.95
3  13  7.58
4   9  8.81
5  11  8.33
6  14  9.96
7   6  7.24
8   4  4.26
9  12 10.84
10  7  4.82
11  5  5.68
Source: https://stackoverflow.com/questions/51529294/dplyrselect-with-some-variables-that-may-not-exist-in-the-data-frame