Question
I would like to get rid of rows where the word "plan" is included in keyword unless "advertising" or "marketing" is also included. Specifically in the sample dataset, the rows with keyword containing "hr plan" and "operation plan" should be deleted.
keyword <- c("advertising plan",
"advertising budget",
"marketing plan",
"marketing budget",
"hr plan",
"hr budget",
"operation plan",
"operation budget")
indicator <- c(1,0,0,1,1,1,0,1)
sample <- cbind(keyword,indicator)
Answer 1:
Without using fancy regular expressions, I'd probably just go for combining your two rules:
sample[!(grepl("plan", sample[,"keyword"]) &
(!grepl("marketing|advertising", sample[,"keyword"]))),]
# keyword indicator
#[1,] "advertising plan" "1"
#[2,] "advertising budget" "0"
#[3,] "marketing plan" "0"
#[4,] "marketing budget" "1"
#[5,] "hr budget" "1"
#[6,] "operation budget" "1"
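As the quoted "1" and "0" in the output show, cbind() on a character and a numeric vector yields a character matrix. A minimal sketch of the same rule on a data.frame, assuming you would rather keep indicator numeric (the names df, has_plan, has_exempt and kept are illustrative, not from the original post):

```r
# Same data as in the question, stored as a data.frame so that
# indicator stays numeric instead of being coerced to character.
keyword <- c("advertising plan", "advertising budget",
             "marketing plan", "marketing budget",
             "hr plan", "hr budget",
             "operation plan", "operation budget")
indicator <- c(1, 0, 0, 1, 1, 1, 0, 1)
df <- data.frame(keyword, indicator, stringsAsFactors = FALSE)

# Rows that mention "plan" ...
has_plan <- grepl("plan", df$keyword, fixed = TRUE)
# ... and rows that mention one of the exempting words.
has_exempt <- grepl("marketing|advertising", df$keyword)

# Drop a row only when it has "plan" and no exempting word.
kept <- df[!(has_plan & !has_exempt), ]
print(kept)
```

Naming the two logical vectors separately makes it easier to inspect each condition before combining them.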
Answer 2:
Here is a possible solution using regex and the stringr package. As mentioned in the comments, I expanded the indicator by 2 more values. Basically, you use regular expressions to detect which keywords either don't contain "plan" or start with "advertising" or "marketing". Hope that helps.
library("tidyverse")
library("stringr")
keyword <- c("advertising plan",
"advertising budget",
"marketing plan",
"marketing budget",
"hr plan",
"hr budget",
"operation plan",
"operation budget")
indicator <- c(1,0,1,0,0,1,1,1)
# tibble() replaces the deprecated data_frame()
df <- tibble(keyword, indicator)
df %>%
# keep rows without "plan", or rows starting with either exempting word;
# the group makes ^ apply to both alternatives
filter(!str_detect(keyword, "plan") |
str_detect(keyword, "^(advertising|marketing)"))
# A tibble: 6 × 2
#   keyword            indicator
#   <chr>                  <dbl>
# 1 advertising plan           1
# 2 advertising budget         0
# 3 marketing plan             1
# 4 marketing budget           0
# 5 hr budget                  1
# 6 operation budget           1
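One detail about anchors and alternation is worth checking on your own data: in a pattern such as "^advertising|marketing", the ^ binds only to the first alternative, so "marketing" matches anywhere in the string. A small sketch using base grepl() to avoid extra dependencies (the keywords below are hypothetical and not part of the original dataset):

```r
# Hypothetical keywords used only to illustrate the anchoring difference.
x <- c("advertising plan", "marketing plan", "digital marketing plan")

# ^ applies to "advertising" only; "marketing" may appear anywhere.
loose <- grepl("^advertising|marketing", x)

# Grouping makes ^ apply to both alternatives.
strict <- grepl("^(advertising|marketing)", x)

print(data.frame(x, loose, strict))
```

If the intent is "contains" rather than "starts with", as the question's wording suggests, dropping the anchor altogether ("advertising|marketing") is the simpler choice.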
Source: https://stackoverflow.com/questions/41623805/delete-rows-containing-specific-words-with-additional-conditions-in-r