Delete rows containing specific words with additional conditions in R

Question

I would like to remove rows whose keyword contains the word "plan", unless it also contains "advertising" or "marketing". Specifically, in the sample dataset below, the rows with keyword "hr plan" and "operation plan" should be deleted.

keyword <- c("advertising plan",
             "advertising budget",
             "marketing plan",
             "marketing budget",
             "hr plan",
             "hr budget",
             "operation plan",
             "operation budget")
indicator <- c(1,0,0,1,1,1,0,1)
sample <- cbind(keyword,indicator)
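A note on the sample data: cbind() with a character vector and a numeric vector returns a single matrix, and a matrix holds only one type, so indicator is stored as strings ("1", "0"). That is why the output in Answer 1 shows quoted values. If a numeric indicator matters, a data frame keeps each column's own type. A minimal sketch; the name sample_df is illustrative, not from the question:

```r
keyword <- c("advertising plan", "advertising budget",
             "marketing plan", "marketing budget",
             "hr plan", "hr budget",
             "operation plan", "operation budget")
indicator <- c(1, 0, 0, 1, 1, 1, 0, 1)

# cbind() puts both vectors into one matrix, so they share a single type:
# the numbers are coerced to character strings
sample <- cbind(keyword, indicator)
typeof(sample)          # "character"
sample[1, "indicator"]  # "1" (a string, not a number)

# A data frame stores each column with its own type
# (stringsAsFactors = FALSE only matters for R versions before 4.0)
sample_df <- data.frame(keyword, indicator, stringsAsFactors = FALSE)
sapply(sample_df, class)
```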

Answer 1:

Without resorting to fancy regular expressions, I'd simply combine your two rules: drop a row if its keyword contains "plan" and does not contain "marketing" or "advertising".

sample[!(grepl("plan", sample[,"keyword"]) &
        (!grepl("marketing|advertising", sample[,"keyword"]))),]
#     keyword              indicator
#[1,] "advertising plan"   "1"      
#[2,] "advertising budget" "0"      
#[3,] "marketing plan"     "0"      
#[4,] "marketing budget"   "1"      
#[5,] "hr budget"          "1"      
#[6,] "operation budget"   "1"
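The same two rules carry over unchanged to a data frame. A minimal base R sketch, assuming the sample is held in a data frame rather than the character matrix produced by cbind(); the names sample_df, has_plan, has_exempt and result are illustrative only:

```r
keyword <- c("advertising plan", "advertising budget",
             "marketing plan", "marketing budget",
             "hr plan", "hr budget",
             "operation plan", "operation budget")
indicator <- c(1, 0, 0, 1, 1, 1, 0, 1)
sample_df <- data.frame(keyword, indicator, stringsAsFactors = FALSE)

# Compute each condition once as a logical vector
has_plan   <- grepl("plan", sample_df$keyword)
has_exempt <- grepl("advertising|marketing", sample_df$keyword)

# Keep a row unless it mentions "plan" without an exempting word.
# !(has_plan & !has_exempt) simplifies (De Morgan) to !has_plan | has_exempt
result <- sample_df[!has_plan | has_exempt, ]
rownames(result) <- NULL  # renumber rows 1..n after filtering
result
```

Rows 5 ("hr plan") and 7 ("operation plan") are dropped, and indicator stays numeric instead of being turned into strings.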

Answer 2:

Here is a possible solution using regular expressions and the stringr package. As mentioned in the comments, I expanded indicator by two more values. Basically, you use a regular expression to keep the rows whose keyword either does not contain "plan" or starts with "advertising" or "marketing". Hope this helps.

library("tidyverse")
library("stringr")

keyword <- c("advertising plan",
             "advertising budget",
             "marketing plan",
             "marketing budget",
             "hr plan",
             "hr budget",
             "operation plan",
             "operation budget")

indicator <- c(1,0,1,0,0,1,1,1)

df <- tibble(keyword, indicator)  # data_frame() is deprecated; tibble() replaces it

df %>%
  # keep rows without "plan", or whose keyword starts with advertising/marketing
  filter(!str_detect(keyword, "plan") |
           str_detect(keyword, "^(advertising|marketing)"))

# A tibble: 6 × 2
             keyword indicator
               <chr>     <dbl>
1   advertising plan         1
2 advertising budget         0
3     marketing plan         1
4   marketing budget         0
5          hr budget         1
6   operation budget         1
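A note on anchoring: in a regular expression, alternation (|) binds more loosely than anything else, so a pattern written as "^advertising|marketing" means "starts with advertising" OR "contains marketing anywhere". To require that the keyword starts with either word, the alternatives must be grouped. A quick base R check; the third test string is made up for illustration, and stringr::str_detect() follows the same precedence rule:

```r
x <- c("advertising plan", "marketing plan", "hr marketing plan")

# Ungrouped: ^ applies only to "advertising"; "marketing" may appear anywhere
ungrouped <- grepl("^advertising|marketing", x)
ungrouped  # c(TRUE, TRUE, TRUE)

# Grouped: both alternatives must occur at the start of the string
grouped <- grepl("^(advertising|marketing)", x)
grouped    # c(TRUE, TRUE, FALSE)
```

For the eight sample keywords both patterns select the same rows, so the printed result is unaffected. If, as the question's wording ("also included") suggests, the exempting word may appear anywhere in the keyword, the unanchored pattern "advertising|marketing" is the closer match.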

Source: https://stackoverflow.com/questions/41623805/delete-rows-containing-specific-words-with-additional-conditions-in-r

Tags

data-manipulation