Delete rows containing specific words with additional conditions in R

Submitted by 不羁岁月 on 2019-12-11 07:03:55

Question


I would like to remove the rows whose keyword contains the word "plan", unless it also contains "advertising" or "marketing". In the sample dataset below, this means the rows with the keywords "hr plan" and "operation plan" should be deleted.

keyword <- c("advertising plan",
             "advertising budget",
             "marketing plan",
             "marketing budget",
             "hr plan",
             "hr budget",
             "operation plan",
             "operation budget")
indicator <- c(1,0,0,1,1,1,0,1)
sample <- cbind(keyword,indicator)
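One side effect of building `sample` with `cbind()` is that a character vector and a numeric vector are combined into a single character matrix, so `indicator` is stored as the strings "0" and "1". This is why the output in the first answer shows `"1"` in quotes. The short sketch below shows the coercion and how `data.frame()` keeps each column's own type. The name `sample_df` is only for illustration.

```r
keyword <- c("advertising plan", "advertising budget",
             "marketing plan", "marketing budget",
             "hr plan", "hr budget",
             "operation plan", "operation budget")
indicator <- c(1, 0, 0, 1, 1, 1, 0, 1)

# cbind() coerces both vectors to one common type: a character matrix
m <- cbind(keyword, indicator)
class(m[, "indicator"])      # "character"

# data.frame() keeps keyword as character and indicator as numeric
sample_df <- data.frame(keyword, indicator, stringsAsFactors = FALSE)
class(sample_df$indicator)   # "numeric"
```

Matrix-style indexing such as `sample_df[, "keyword"]` also works on a data frame, so the filtering code in the answers can be applied to either object.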

Answer 1:


Without using fancy regular expressions, I'd just combine your two rules: drop a row if its keyword contains "plan" and does not contain "marketing" or "advertising".

sample[!(grepl("plan", sample[,"keyword"]) &
        (!grepl("marketing|advertising", sample[,"keyword"]))),]
#     keyword              indicator
#[1,] "advertising plan"   "1"      
#[2,] "advertising budget" "0"      
#[3,] "marketing plan"     "0"      
#[4,] "marketing budget"   "1"      
#[5,] "hr budget"          "1"      
#[6,] "operation budget"   "1" 
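The condition above, "not (contains plan and lacks an exempt word)", is equivalent by De Morgan's law to "lacks plan or contains an exempt word", which is the form the second answer uses. A minimal sketch checking that both forms select the same rows on the question's keywords; the names `has_plan`, `has_exempt`, `keep_a` and `keep_b` are illustrative only.

```r
keyword <- c("advertising plan", "advertising budget",
             "marketing plan", "marketing budget",
             "hr plan", "hr budget",
             "operation plan", "operation budget")

has_plan   <- grepl("plan", keyword)
has_exempt <- grepl("marketing|advertising", keyword)

# Answer 1's form: drop rows that contain "plan" AND lack an exempt word
keep_a <- !(has_plan & !has_exempt)

# De Morgan equivalent: keep rows without "plan" OR with an exempt word
keep_b <- !has_plan | has_exempt

identical(keep_a, keep_b)   # TRUE
keyword[keep_a]             # every keyword except "hr plan" and "operation plan"
```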



Answer 2:


Here is a possible solution using regular expressions and the stringr package. As mentioned in the comments, I extended `indicator` by two values (note that the values below differ from those in the question). Basically, you want a regular expression that keeps the keywords that either don't contain "plan" or start with "advertising" or "marketing". Hope this helps.

library("tidyverse")
library("stringr")

keyword <- c("advertising plan",
             "advertising budget",
             "marketing plan",
             "marketing budget",
             "hr plan",
             "hr budget",
             "operation plan",
             "operation budget")

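indicator <- c(1,0,1,0,0,1,1,1)

# tibble() replaces the deprecated data_frame()
df <- tibble(keyword, indicator)

# Keep rows without "plan", or rows starting with "advertising" or "marketing".
# The parentheses make "^" apply to both alternatives.
df %>%
  filter(!str_detect(keyword, "plan") |
           str_detect(keyword, "^(advertising|marketing)"))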

# A tibble: 6 × 2
             keyword indicator
               <chr>     <dbl>
1   advertising plan         1
2 advertising budget         0
3     marketing plan         1
4   marketing budget         0
5          hr budget         1
6   operation budget         1
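A subtle point about the anchored pattern: in a regular expression, `|` has the lowest precedence, so `^advertising|marketing` means "starts with advertising" OR "contains marketing anywhere", not "starts with either word". Grouping with `^(advertising|marketing)` applies the anchor to both alternatives. On the question's sample data both patterns give the same result; the difference only appears with a keyword such as the hypothetical "digital marketing plan" used below. If you want the question's literal rule ("also contains advertising or marketing"), drop the anchor and use `advertising|marketing`.

```r
# "digital marketing plan" is a hypothetical keyword, not from the question
x <- c("advertising plan", "marketing plan", "digital marketing plan", "hr plan")

# "^" binds only to the first alternative
grepl("^advertising|marketing", x)     # -> TRUE TRUE TRUE FALSE

# Grouping anchors both alternatives at the start of the string
grepl("^(advertising|marketing)", x)   # -> TRUE TRUE FALSE FALSE
```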


Source: https://stackoverflow.com/questions/41623805/delete-rows-containing-specific-words-with-additional-conditions-in-r
