Question
Sample of dataset:
diag01 <- as.factor(c("S7211","J47","J47","K729","M2445","Z509","Z488","R13","L893","N318","L0311","S510","A047","D649"))
diag02 <- as.factor(c("K590","D761","J961","T501","M8580","R268","T831","G8240","B9688","G550","E162","T8902","E86","I849"))
diag03 <- as.factor(c("F058","M0820","E877","E86","G712","R32","A408","E888","G8220","C794","T68","L0310","M1094","D469"))
diag04 <- as.factor(c("E86","C845","R790","I420","G4732","R600","L893","R509","T913","C795","M8412","G8212","L891","L0311"))
diag05 <- as.factor(c("R001","N289","E876","E871","H659","R4589","N508","B99","I209","C773","T921","Q070","H919","L033"))
diag06 <- as.factor(c("I951","E877","S7240","I500","H901","E119","Z223","K590","I959","C509","G819","F719","Z290","R13"))
df <- data.frame(diag01, diag02, diag03, diag04, diag05, diag06)
I want to keep the entire rows that have a partial string match anywhere in a given list of columns (e.g. diag01, diag02, ...). I can achieve this on a single column, e.g.:
junk <- filter(df, grepl(pattern="^E11|^E16|^E86|^E87|^E88", diag02))
but I need to apply this to multiple columns (the original dataset has 216 columns and >1,000,000 rows). Among other options, I have tried
junk <- filter(df, grepl(pattern="^E11|^E16|^E86|^E87|^E88", df[,c(1:6)]))
junk <- apply(df, 1, function(r) any(r %in% grepl(pattern="^E11|^E16|^E86|^E87|^E88")))
I need the entire row and ideally I would like the filtering criteria to be restricted to a given list of columns as it is likely values in other columns may begin with the declared partial strings.
I have made a genuine effort to search for a solution, but obviously my knowledge of R is lacking.
Answer 1:
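Perhaps we need filter_all() with any_vars(), which keeps a row when the pattern matches in at least one column:
library(dplyr)
# filter_all() is superseded in dplyr >= 1.0.0 but still works
df %>%
  filter_all(any_vars(grepl(pattern="^(E11|E16|E86|E87|E88)", .)))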
Or with purrr and dplyr:
library(dplyr)
library(purrr)
df %>%
  map(~grepl(pattern="^E11|^E16|^E86|^E87|^E88", .)) %>%
  reduce(`|`) %>%
  df[.,]
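Both snippets above test every column, while the question asks for the match to be restricted to a given list of columns. A minimal sketch of one way to do that follows, assuming dplyr >= 1.0.4 (for if_any()). The toy data frame, its `other` column, and the names `cols` and `pattern` are hypothetical and only there for illustration. A base R equivalent is included for comparison.

```r
library(dplyr)

# Toy data (hypothetical): two diagnosis columns plus an unrelated column
# `other` whose first value starts with "E11" but must not be searched.
df <- data.frame(
  diag01 = c("S7211", "J47", "L0311", "A047"),
  diag02 = c("K590", "D761", "E162", "E86"),
  other  = c("E110", "x", "y", "z"),
  stringsAsFactors = TRUE
)

pattern <- "^(E11|E16|E86|E87|E88)"
cols <- c("diag01", "diag02")   # columns allowed to take part in the match

# dplyr: keep a row if the pattern matches in ANY of the selected columns
res <- df %>%
  filter(if_any(all_of(cols), ~ grepl(pattern, .x)))

# Base R: one logical vector per selected column, combined with `|`
keep <- Reduce(`|`, lapply(df[cols], grepl, pattern = pattern))
res_base <- df[keep, ]

# In the toy data only rows 3 and 4 match; row 1 is dropped because
# `other` is not among the searched columns.
print(res)
```

With 216 columns, `cols` could also be built programmatically, for example with grep("^diag", names(df), value = TRUE), if the relevant columns share a prefix; that naming is an assumption about the real dataset.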
Source: https://stackoverflow.com/questions/46215672/r-filter-rows-based-on-multiple-partial-strings-applied-to-multiple-columns