R dplyr subset with missing columns

若如初见. Submitted on 2020-05-28 08:28:02

Question


I have the following code and would like to select columns into a new data.frame.

library(dplyr)
df = data.frame(
    Manhattan=c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0), 
    Brooklyn=c(0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0), 
    The_Bronx=c(1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0), 
    Staten_Island=c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), 
    "2012"=c("P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q"), 
    "2013"=c("P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q"), 
    "2014"=c("P", "P", "P", "Q", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "P", "Q", "P", "P", "P", "Q", "Q"), 
    "2015"=c("P", "P", "P", "P", "P", "Q", "Q", "Q", "P", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q"), check.names=FALSE)
df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx"))

This throws the error:

Error in `[.data.frame`(x, r, vars, drop = drop) : 
   undefined columns selected

This happens because the column "Queens" is missing from df. How can I override the error so that R proceeds to create df2 with only the columns "Manhattan" and "The_Bronx"?

Very important: my real data have hundreds of columns, so manually removing names like "Queens" from the command df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx")) is not feasible (unless there is a trick for that). Is there a way to solve this? Thank you.
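As a diagnostic, base R's setdiff() lists the requested names that are not columns of the data frame, which is exactly what triggers the error. The sketch below uses a small hypothetical data frame (`toy`) and illustrative names (`wanted`, `missing_cols`, `msg`) rather than the real data, and it assumes an English locale for the error text.

```r
# Small hypothetical data frame standing in for the real data
toy <- data.frame(Manhattan = c(1, 0), Brooklyn = c(0, 1), The_Bronx = c(1, 1))
wanted <- c("Manhattan", "Queens", "The_Bronx")

# Requested names that are not columns of toy
missing_cols <- setdiff(wanted, names(toy))
print(missing_cols)

# subset() stops on the absent name; capture the message instead of failing
msg <- tryCatch(
  subset(toy, select = wanted),
  error = function(e) conditionMessage(e)
)
print(msg)
```

With hundreds of columns, printing missing_cols is a quick way to check whether the absent names are expected or are typos.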


Answer 1:


In base R, you can use intersect() to keep only the requested names that are actually present in df.

cols <- c("Manhattan", "Queens", "The_Bronx")
subset(df, select = intersect(names(df), cols))

#   Manhattan The_Bronx
#1          1         1
#2          1         1
#3          0         0
#4          1         0
#5          1         0
#6          1         0
#7          1         0
#8          0         0
#...
#....
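One detail worth knowing: intersect() keeps the order of its first argument. With names(df) first, the selected columns follow the data frame's order; swapping the arguments keeps the order in which the columns were requested. The sketch below shows both with base R's `[` and drop = FALSE (so a single surviving column still comes back as a data frame); the `toy` data frame is hypothetical.

```r
# Hypothetical data frame; "Queens" is deliberately absent
toy <- data.frame(Manhattan = 1:2, Brooklyn = 3:4, The_Bronx = 5:6)
cols <- c("The_Bronx", "Queens", "Manhattan")

# Follows the data frame's column order: Manhattan, The_Bronx
by_df_order <- toy[, intersect(names(toy), cols), drop = FALSE]

# Follows the requested order: The_Bronx, Manhattan
by_request_order <- toy[, intersect(cols, names(toy)), drop = FALSE]

print(names(by_df_order))
print(names(by_request_order))
```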

Or use any_of() with dplyr's select():

library(dplyr)
df %>% select(tidyselect::any_of(cols))



Answer 2:


We could also select by regular expression with matches():

cols <- c("Manhattan", "Queens", "The_Bronx")
library(dplyr)
library(stringr)  # str_c() comes from stringr, not dplyr
df %>%
   select(matches(str_c(cols, collapse="|")))
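A caveat with the regex approach: an unanchored alternation such as "Manhattan|Queens|The_Bronx" also matches longer names that merely contain one of the words. The base R sketch below applies grepl() to a hypothetical vector of column names (`nms`) to show the difference that ^(...)$ anchors make. Because the names are treated as regular expressions, any name containing metacharacters such as "." would also need escaping.

```r
# Hypothetical column names, including one that only partially matches
nms <- c("Manhattan", "Manhattan_North", "Brooklyn", "The_Bronx")
cols <- c("Manhattan", "Queens", "The_Bronx")

# Unanchored pattern: "Manhattan|Queens|The_Bronx"
loose <- paste(cols, collapse = "|")
# Anchored pattern: "^(Manhattan|Queens|The_Bronx)$"
strict <- paste0("^(", paste(cols, collapse = "|"), ")$")

print(nms[grepl(loose, nms)])   # also picks up "Manhattan_North"
print(nms[grepl(strict, nms)])  # exact names only
```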



Answer 3:


A tidyverse implementation would be:

df2 <- select(df, any_of(cols))
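If you want any_of()-style behaviour without dplyr but would still like to know which names were skipped (handy when the list of hundreds of columns comes from elsewhere), a small base R wrapper is enough. select_present() below is a hypothetical helper written for illustration, not a function from base R or dplyr; it builds on the intersect() idea from Answer 1 and adds setdiff() to report what was dropped.

```r
# Hypothetical helper: keep the requested columns that exist (in the
# requested order) and optionally warn about the ones that do not
select_present <- function(data, cols, warn = TRUE) {
  present <- intersect(cols, names(data))
  absent <- setdiff(cols, names(data))
  if (warn && length(absent) > 0) {
    warning("Dropping missing columns: ", paste(absent, collapse = ", "))
  }
  data[, present, drop = FALSE]
}

toy <- data.frame(Manhattan = 1:3, Brooklyn = 4:6, The_Bronx = 7:9)
cols <- c("Manhattan", "Queens", "The_Bronx")
res <- select_present(toy, cols)  # warns about "Queens"
print(res)
```

Setting warn = FALSE makes it drop absent names silently, which is closer to what any_of() does.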



Source: https://stackoverflow.com/questions/61152518/r-dplyr-subset-with-missing-columns
