Regular expression that both includes and excludes certain strings in R

傲寒 2020-12-10 14:22

I am trying to use R to parse through a number of entries. I have two requirements for the entries I want back. I want all the entries that contain the word apple<

3 answers
  •  囚心锁ツ
    2020-12-10 14:57

    Using a regular expression, you could do the following.

    x <- c('I like apples', 'I really like apples', 
           'I like apples and oranges', 'I like oranges and apples',
           'I really like oranges and apples but oranges more')
    
    x[grepl('^((?!.*orange).)*apple.*$', x, perl=TRUE)]
    # [1] "I like apples"        "I really like apples"
    

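    The pattern is anchored at the start of the line with ^. The group ((?!.*orange).)* consumes one character at a time, and before each character the negative lookahead (?!.*orange) asserts that orange does not occur anywhere later on the line. The dot matches any character except a line break, and the group repeats zero or more times. The pattern then requires the literal apple, followed by .* for any remaining characters up to the end of the line, which $ anchors. Because the lookahead at the first position scans the whole line, a string containing orange fails immediately, whether orange comes before or after apple.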
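
    To see how the two conditions contribute, the pattern can be split into its parts. This is a minimal sketch, not part of the original answer; the variable names no_orange, has_apple and combined are made up for illustration.

```r
x <- c('I like apples',
       'I like apples and oranges',
       'I like oranges and apples')

# Negative lookahead alone: TRUE where "orange" occurs nowhere on the line.
no_orange <- grepl('^(?!.*orange)', x, perl = TRUE)

# Plain substring test: TRUE where "apple" occurs.
has_apple <- grepl('apple', x, fixed = TRUE)

# The combined pattern from the answer.
combined <- grepl('^((?!.*orange).)*apple.*$', x, perl = TRUE)

# Side-by-side view of the three logical vectors.
print(data.frame(x, no_orange, has_apple, combined))
```

    For these inputs the combined column agrees with no_orange & has_apple, which is the behaviour the explanation describes.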


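    UPDATE: If performance is an issue, a single lookahead at the start of the line is enough, because it already scans the whole line for orange. The pattern still has to require apple.

    x[grepl('^(?!.*orange).*apple', x, perl=TRUE)]
    # [1] "I like apples"        "I really like apples"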
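
    If the lookahead is hard to read, the same filter can be sketched without one by combining two plain substring tests. This is an alternative under the assumption that both words are literal strings rather than patterns; it is not the answerer's method, and the names keep and result are illustrative.

```r
x <- c('I like apples', 'I really like apples',
       'I like apples and oranges', 'I like oranges and apples',
       'I really like oranges and apples but oranges more')

# fixed = TRUE treats the pattern as a literal string, not a regex.
# Keep entries that contain "apple" and do not contain "orange".
keep <- grepl('apple', x, fixed = TRUE) & !grepl('orange', x, fixed = TRUE)

result <- x[keep]
print(result)
```

    Splitting the include and exclude conditions this way keeps each test independent, so further words can be added with another & clause.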
    
