regex match substring unless another substring matches

忘掉有多难 2021-01-18 23:19

I'm trying to dig deeper into regexes and want to match a condition unless some substring is also found in the same string. I know I can use two grepl statem

2 Answers
  • 2021-01-18 23:49

    You can use the anchored look-ahead solution (requiring Perl-style regexp):

    grepl("^(?!.*park)(?=.*dog.*man|.*man.*dog)", x, ignore.case=TRUE, perl=TRUE)
    

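    • ^ - anchors the pattern at the start of the string, so the look-aheads are checked only once
    • (?!.*park) - fails the match if park occurs anywhere in the string
    • (?=.*dog.*man|.*man.*dog) - requires both dog and man, in either order; the match fails if either one is missing.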

    Another, more scalable version uses 3 look-aheads, one per condition, so the possible word orders do not have to be listed:

    ^(?!.*park)(?=.*dog)(?=.*man)
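
    A minimal sketch for checking that the two patterns agree. The sample vector `x` and the names `p1`, `p2`, `r1`, `r2` are assumptions made up for illustration; they do not come from the question:

    ```r
    # Sample input, made up for illustration
    x <- c("The dog and the man play in the park.",
           "Man I love that dog!",
           "I'm dog tired",
           "The dog park is no place for man.")

    # Version 1: one look-ahead that lists both orders of dog/man
    p1 <- "^(?!.*park)(?=.*dog.*man|.*man.*dog)"
    # Version 2: one look-ahead per required word
    p2 <- "^(?!.*park)(?=.*dog)(?=.*man)"

    r1 <- grepl(p1, x, ignore.case = TRUE, perl = TRUE)
    r2 <- grepl(p2, x, ignore.case = TRUE, perl = TRUE)

    print(r1)
    # [1] FALSE  TRUE FALSE FALSE
    print(identical(r1, r2))
    # [1] TRUE
    ```

    Only the second string contains both words and no park. Adding a further required word to the second version only takes one more (?=.*word) group.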
    
  • 2021-01-19 00:06

    stribizhev has already answered this question as it should be approached: with a negative lookahead.

    I'll contribute to this particular question:

    What is wrong with my understanding of (*SKIP)(*FAIL)?

    (*SKIP) and (*FAIL) are regex control verbs.

    1. (*FAIL) or (*F)
      This is the easiest to understand. (*FAIL) is exactly the same as a negative lookahead with an empty subpattern: (?!). As soon as the regex engine gets to that verb in the pattern it forces an immediate backtrack.
    2. (*SKIP)
      When the regex engine first encounters this verb, nothing happens, because it only acts when it is reached on backtracking. If a later failure makes the engine backtrack into (*SKIP) from the right, the backtracking cannot pass it. This causes:

      • A match failure.
      • The next match won't be attempted from the next character. Instead, it will start from the position in the text where the engine was when it reached (*SKIP).

      That is why these two control verbs are usually used together as (*SKIP)(*FAIL).
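
    A small sketch of both verbs in R, assuming PCRE via perl = TRUE. The string `s` and the quoted-text idiom are illustrative assumptions, not part of the question:

    ```r
    # (*FAIL) and the empty negative look-ahead (?!) both force a failure,
    # so neither pattern can ever match.
    fail_verb <- grepl("dog(*FAIL)", "dog", perl = TRUE)
    fail_look <- grepl("dog(?!)", "dog", perl = TRUE)

    # Illustrative idiom: consume a quoted chunk, then skip past it and fail,
    # so only the numbers outside the quotes are returned.
    s <- 'id 12 "room 34" 56'
    outside <- regmatches(s, gregexpr('"[^"]*"(*SKIP)(*FAIL)|[0-9]+', s, perl = TRUE))[[1]]

    print(c(fail_verb, fail_look))
    # [1] FALSE FALSE
    print(outside)
    # [1] "12" "56"
    ```

    The left alternative describes what to discard and the right one what to keep. (*SKIP) is what stops the engine from retrying inside the discarded chunk, where it would otherwise find 34.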

    Let's consider the following example:

    • Pattern: .*park(*SKIP)(*FAIL)|.*dog
    • Subject: "That park has too many dogs"
    • Matches: " has too many dog"

    Internals:

    1. First attempt.
        That park has too many dogs              ||  .*park(*SKIP)(*FAIL)|.*dog
                /\                                        /\
              (here) we have a match for park
                     the engine passes (*SKIP) -no action
                     it then encounters (*FAIL) -backtrack
                     Now it reaches (*SKIP) from the right -FAIL!
    
    2. Second attempt.
      Normally, the engine would retry from the second character of the subject. Because of (*SKIP), the 2nd attempt instead starts at the position reached when (*SKIP) was encountered, right after "park":
        That park has too many dogs              ||  .*park(*SKIP)(*FAIL)|.*dog
                /\                                                       /\
              (here)
              Now, there's no match for .*park
              And of course it matches .*dog
    
        That park has too many dogs              ||  .*park(*SKIP)(*FAIL)|.*dog
                 ^               ^                                        -----
                 |    (MATCH!)   |
                 +---------------+
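
    The walkthrough above can be reproduced with a short sketch (the variable names are assumptions; the pattern without (*SKIP) is added only for contrast):

    ```r
    subject <- "That park has too many dogs"

    # With (*SKIP): the retry starts right after "park"
    with_skip <- regmatches(subject,
                            regexpr(".*park(*SKIP)(*FAIL)|.*dog", subject, perl = TRUE))

    # Without (*SKIP): the first alternative fails, and .*dog is then
    # tried from the very first character
    without_skip <- regmatches(subject,
                               regexpr(".*park(*FAIL)|.*dog", subject, perl = TRUE))

    print(with_skip)
    # [1] " has too many dog"
    print(without_skip)
    # [1] "That park has too many dog"
    ```

    Note that neither pattern rejects the string: (*SKIP)(*FAIL) only moves the starting point, which is why it does not express "dog unless park".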
    

    How can I match the logic of find "dog" & "man" but not "park" with 1 regex?

    Use stribizhev's solution! Avoid control verbs where you can, for the sake of compatibility: they are not implemented in all regex flavours. But if you're interested in these regex oddities, there is another, stronger control verb: (*COMMIT). Like (*SKIP), it acts only when it is reached on backtracking, but it causes the entire match to fail (no further attempt is made from any later position). For example:

    +-----------------------------------------------+
    |Pattern:                                       |
    |^.*park(*COMMIT)(*FAIL)|dog                    |
    +-------------------------------------+---------+
    |Subject                              | Matches |
    +-------------------------------------+---------+
    |The dog and the man play in the park.|  FALSE  |
    |Man I love that dog!                 |  TRUE   |
    |I'm dog tired                        |  TRUE   |
    |The dog park is no place for man.    |  FALSE  |
    |park next to this dog's man.         |  FALSE  |
    +-------------------------------------+---------+
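
    A sketch that should reproduce the table with grepl, assuming PCRE via perl = TRUE (the names `pattern`, `x` and `res` are made up for illustration):

    ```r
    pattern <- "^.*park(*COMMIT)(*FAIL)|dog"

    # The five subjects from the table above
    x <- c("The dog and the man play in the park.",
           "Man I love that dog!",
           "I'm dog tired",
           "The dog park is no place for man.",
           "park next to this dog's man.")

    # Any string containing "park" reaches (*COMMIT) and then (*FAIL),
    # which aborts the whole match before the "dog" alternative is tried.
    res <- grepl(pattern, x, perl = TRUE)

    print(res)
    # [1] FALSE  TRUE  TRUE FALSE FALSE
    ```

    This pattern only tests for dog without park. It does not require man, which is why "I'm dog tired" is TRUE here.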
    
