Subset a list of multiple data frames based on either a row or a column match

夙愿已清 submitted on 2020-03-23 15:39:12

Question


library(tidyverse)
library(dplyr)

A list of data frames, where some have matches to the vector and some don't:

lsdf <- list(
  list1 = head(mtcars),
  list2 = as.data.frame(t(head(mtcars))) %>%
    rownames_to_column(., var = "ID"),
  list3 = head(starwars)
)

The vector of names that match some of the data frames:

vec <- c("mpg", "wt", "am")

If I had to do it manually, step by step, it would look like this:

df1 <- lsdf$list1 %>% select(all_of(vec)) # column names match vec
df2 <- lsdf$list2 %>% filter(ID %in% vec) # values in the ID column match vec
df3 <- lsdf$list3 # no match found, keep as is

out_lsdf <- list(
  df1 = df1,
  df2 = df2,
  df3 = df3
)

Is there a better, faster way to get the desired output shown below?

out_lsdf
#> $df1
#>                    mpg    wt am
#> Mazda RX4         21.0 2.620  1
#> Mazda RX4 Wag     21.0 2.875  1
#> Datsun 710        22.8 2.320  1
#> Hornet 4 Drive    21.4 3.215  0
#> Hornet Sportabout 18.7 3.440  0
#> Valiant           18.1 3.460  0
#> 
#> $df2
#>    ID Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
#> 1 mpg     21.00        21.000      22.80         21.400             18.70
#> 2  wt      2.62         2.875       2.32          3.215              3.44
#> 3  am      1.00         1.000       1.00          0.000              0.00
#>   Valiant
#> 1   18.10
#> 2    3.46
#> 3    0.00
#> 
#> $df3
#> # A tibble: 6 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Luke~    172    77 blond      fair       blue            19   male   Tatooine 
#> 2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>   Tatooine 
#> 3 R2-D2     96    32 <NA>       white, bl~ red             33   <NA>   Naboo    
#> 4 Dart~    202   136 none       white      yellow          41.9 male   Tatooine 
#> 5 Leia~    150    49 brown      light      brown           19   female Alderaan 
#> 6 Owen~    178   120 brown, gr~ light      blue            52   male   Tatooine 
#> # ... with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>

Created on 2020-02-07 by the reprex package (v0.3.0)


Answer 1:


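We can do this by checking a condition for each data frame: filter the rows if an ID column exists, otherwise select the columns whose names appear in vec, otherwise return the data frame unchanged.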

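library(purrr)
library(dplyr)
map(lsdf, ~ {
  nm1 <- names(.x)
  if ("ID" %in% nm1) {
    # rows: keep those whose ID value is in vec
    .x %>% filter(ID %in% vec)
  } else if (any(nm1 %in% vec)) {
    # columns: keep those whose names are in vec
    .x %>% select(intersect(nm1, vec))
  } else {
    .x # no match: return unchanged
  }
})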

Alternatively, the if/else conditions can be placed inside select() and filter():

map(lsdf, ~ .x %>%
  select(if (any(names(.) %in% vec)) intersect(names(.), vec) else everything()) %>%
  filter(if ("ID" %in% names(.)) ID %in% vec else TRUE))


Source: https://stackoverflow.com/questions/60122187/subset-list-of-multiple-dataframe-based-on-either-row-or-column-match
