Delete rows containing specific words with additional conditions in R

Posted by 不羁岁月 on 2019-12-11 07:03:55


I would like to remove rows where keyword contains the word "plan", unless it also contains "advertising" or "marketing". Specifically, in the sample dataset, the rows whose keyword is "hr plan" or "operation plan" should be deleted.

keyword <- c("advertising plan",
               "advertising budget",
               "marketing plan",
               "marketing budget",
               "hr plan",
               "hr budget",
               "operation plan",
               "operation budget")
indicator <- c(1,0,0,1,1,1,0,1)
sample <- cbind(keyword,indicator)


Without using fancy regular expressions, I'd probably just go for combining your two rules:

sample[!(grepl("plan", sample[,"keyword"]) &
        (!grepl("marketing|advertising", sample[,"keyword"]))),]
#     keyword              indicator
#[1,] "advertising plan"   "1"      
#[2,] "advertising budget" "0"      
#[3,] "marketing plan"     "0"      
#[4,] "marketing budget"   "1"      
#[5,] "hr budget"          "1"      
#[6,] "operation budget"   "1" 
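
The same two rules can also be sketched on a data.frame, which keeps indicator numeric instead of coercing it to character the way cbind() does. This is only a minimal variant of the answer above; the names sample_df, has_plan, is_exempt and kept are assumptions introduced for illustration.

```r
keyword <- c("advertising plan", "advertising budget",
             "marketing plan", "marketing budget",
             "hr plan", "hr budget",
             "operation plan", "operation budget")
indicator <- c(1, 0, 0, 1, 1, 1, 0, 1)

# sample_df is a hypothetical name; a data.frame keeps indicator numeric
sample_df <- data.frame(keyword, indicator, stringsAsFactors = FALSE)

# Rule 1: the keyword mentions "plan"
has_plan <- grepl("plan", sample_df$keyword)
# Rule 2: the keyword mentions "marketing" or "advertising"
is_exempt <- grepl("marketing|advertising", sample_df$keyword)

# Drop rows that mention "plan" without being exempt
kept <- sample_df[!(has_plan & !is_exempt), ]
kept
```

Naming the two logical vectors makes it easier to check each rule separately before combining them.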


Here is a possible solution using regex with the dplyr and stringr packages. As mentioned in the comments, I expanded the indicator by 2 more values. Basically, you want the regular expression to detect which keywords don't have "plan" in them or start with either "advertising" or "marketing". Hope this helps.


keyword <- c("advertising plan",
             "advertising budget",
             "marketing plan",
             "marketing budget",
             "hr plan",
             "hr budget",
             "operation plan",
             "operation budget")

indicator <- c(1,0,1,0,0,1,1,1)

library(dplyr)

df <- tibble(keyword, indicator)

# Keep rows that lack "plan", or that start with "advertising" or "marketing"
df %>%
  filter(!stringr::str_detect(keyword, "plan") |
           stringr::str_detect(keyword, "^(advertising|marketing)"))

# A tibble: 6 × 2
             keyword indicator
               <chr>     <dbl>
1   advertising plan         1
2 advertising budget         0
3     marketing plan         1
4   marketing budget         0
5          hr budget         1
6   operation budget         1
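
One detail worth checking when an anchor is combined with alternation: in a pattern written as "^advertising|marketing", the ^ applies only to the first alternative, so "marketing" matches anywhere in the string. Grouping the alternatives anchors both. The short base R sketch below illustrates the difference; the keyword "digital marketing plan" is a hypothetical value that does not appear in the sample data.

```r
# Third element is a hypothetical keyword, not part of the sample data
x <- c("advertising plan", "marketing plan", "digital marketing plan")

# ^ binds only to "advertising"; "marketing" may match anywhere
unanchored <- grepl("^advertising|marketing", x)

# Grouping makes ^ apply to both alternatives
anchored <- grepl("^(advertising|marketing)", x)

unanchored  # TRUE TRUE TRUE
anchored    # TRUE TRUE FALSE
```

On the sample data both patterns keep the same rows, so the distinction only matters if keywords can contain "marketing" in a later position.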

