Question
I have the following code and would like to select columns into a new data.frame.
library(dplyr)
df = data.frame(
Manhattan=c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0),
Brooklyn=c(0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0),
The_Bronx=c(1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0),
Staten_Island=c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
"2012"=c("P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q"),
"2013"=c("P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q"),
"2014"=c("P", "P", "P", "Q", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "P", "Q", "P", "P", "P", "Q", "Q"),
"2015"=c("P", "P", "P", "P", "P", "Q", "Q", "Q", "P", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q"), check.names=FALSE)
df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx"))
This throws the error:
Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected
This happens because the column "Queens" is missing from df. How can I avoid the error, so that R proceeds to create df2 with only the columns "Manhattan" and "The_Bronx"?
Very important: my real data have hundreds of columns, so it is not feasible to manually remove columns like "Queens" from the command df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx")) (unless there is a trick for that?). Is there a way to solve this? Thank you.
Answer 1:
In base R, you can use intersect() to select only the names that are actually present in df.
cols <- c("Manhattan", "Queens", "The_Bronx")
subset(df, select = intersect(names(df), cols))
# Manhattan The_Bronx
#1 1 1
#2 1 1
#3 0 0
#4 1 0
#5 1 0
#6 1 0
#7 1 0
#8 0 0
#...
#....
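A minimal base-R sketch, assuming you also want to know which requested names were skipped. The helper select_present is a hypothetical name (not a base R function); it combines intersect() with setdiff() and uses drop = FALSE so that a single surviving column still comes back as a data.frame. The data frame toy is made-up example data. Note that intersect(cols, names(data)) keeps the order of cols, whereas intersect(names(df), cols) above keeps the column order of df.

```r
# Hypothetical helper (not part of base R): keep the requested columns that
# exist in `data` and report the ones that do not.
select_present <- function(data, cols) {
  present <- intersect(cols, names(data))  # order follows `cols`
  absent  <- setdiff(cols, names(data))    # requested but not in the data
  if (length(absent) > 0) {
    message("Skipping missing columns: ", paste(absent, collapse = ", "))
  }
  # drop = FALSE keeps a data.frame even when only one column survives
  data[, present, drop = FALSE]
}

# Made-up example data
toy <- data.frame(Manhattan = c(1, 0), The_Bronx = c(1, 1), Brooklyn = c(0, 1))
toy2 <- select_present(toy, c("Manhattan", "Queens", "The_Bronx"))
names(toy2)  # "Manhattan" "The_Bronx"
```

The message goes to stderr, so it does not interfere with the returned data; replace message() with warning() if a missing column should be more visible.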
Or use any_of() in dplyr:
library(dplyr)
df %>% select(tidyselect::any_of(cols))
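To illustrate why any_of() fits here, a small sketch contrasting it with its strict counterpart all_of() (toy is made-up example data): any_of() silently skips names that are not in the data, while all_of() raises an error, which is what you want when a missing column should stop the script.

```r
library(dplyr)

# Made-up example data
toy <- data.frame(Manhattan = c(1, 0), The_Bronx = c(1, 1), Brooklyn = c(0, 1))
cols <- c("Manhattan", "Queens", "The_Bronx")

# any_of(): names absent from the data are skipped; order follows `cols`
kept <- toy %>% select(any_of(cols))
names(kept)  # "Manhattan" "The_Bronx"

# all_of(): strict version, a missing name ("Queens") is an error
res <- tryCatch(
  toy %>% select(all_of(cols)),
  error = function(e) conditionMessage(e)
)
is.character(res)  # TRUE: the call failed and we captured the error message
```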
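Answer 2:
We could also use a regular expression with matches():
cols <- c("Manhattan", "Queens", "The_Bronx")
library(dplyr)
library(stringr)  # str_c() comes from stringr, which dplyr does not attach
df %>%
  select(matches(str_c(cols, collapse = "|")))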
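One caveat with the regex approach: matches() performs a regex search and ignores case by default (ignore.case = TRUE), so "Manhattan" would also pick up columns such as "Manhattan_North" or "manhattan". A sketch of an anchored, case-sensitive pattern, using made-up example data (toy) and base paste()/paste0() instead of stringr:

```r
library(dplyr)

# Made-up example data with near-miss column names
toy <- data.frame(Manhattan = 1, Manhattan_North = 2, The_Bronx = 3, manhattan = 4)
cols <- c("Manhattan", "Queens", "The_Bronx")

# Unanchored, case-insensitive (the default): also matches the near misses
loose <- toy %>% select(matches(paste(cols, collapse = "|")))

# Anchored with ^...$ and case-sensitive: only exact column names match
pattern <- paste0("^(", paste(cols, collapse = "|"), ")$")
strict <- toy %>% select(matches(pattern, ignore.case = FALSE))

names(loose)   # "Manhattan" "Manhattan_North" "The_Bronx" "manhattan"
names(strict)  # "Manhattan" "The_Bronx"
```

If your column names can contain regex metacharacters such as "." or "(", they would need escaping before being pasted into a pattern; any_of() avoids the issue because it compares names exactly.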
Answer 3:
A tidyverse implementation (with cols defined as above and dplyr loaded) would be:
df2 <- select(df, any_of(cols))
Source: https://stackoverflow.com/questions/61152518/r-dplyr-subset-with-missing-columns