Perl regex matching optional phrase in longer sentence

北海茫月 2021-01-01 05:28

I'm trying to match an optional (possibly present) phrase in a sentence:

perl -e '$_="word1 word2 word3"; print "1:$1 2:$2 3:$3\n" if m/(word1).*(word


        
3 answers
  •  礼貌的吻别
    2021-01-01 06:18

    To solve your issue, observe that the catch-all subexpressions in your regex match material that you do not want them to match:

     (word1).*(word2)?.*(word3)
            --
             ^--- this subexpression matches _all_ material between `word1` and `word3` in the test string, in particular `word2` if it is present
    
     (word1).*? (word2)? .*(word3)
            ---+--------+--
             ^       ^   ^-- this subexpression matches _all_ material between `word1` and `word3` in the test string, in particular `word2` if it is present
             |       |
             |       +------ this subexpression is empty, even if `word2` is present:
             |               - the preceding subexpression `.*?` matches minimally (i.e. the empty string)
             |               - `(word2)?` cannot match because a blank precedes `word2` in the test string
             |               - the following subexpression `.*` matches everything up to `word3`, including `word2`
             |
             |               -> the pattern matches _as desired_ for test strings
             |                  where `word2` immediately follows `word1` without an intervening blank
             |
             +-------------- this subexpression will always be empty
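
    The two diagrams can be checked with a small script. This is only a sketch: the helper `group2` is a hypothetical name introduced here, and the patterns are written without the blanks that the second diagram uses for layout.

```perl
use strict;
use warnings;

# Hypothetical helper: report what the second capture group holds after a match.
# Returns "-" when the group did not participate, "no match" when the regex fails.
sub group2 {
    my ($pattern, $text) = @_;
    return "no match" unless $text =~ $pattern;
    return defined $2 ? $2 : "-";
}

my $text = "word1 word2 word3";

# Greedy catch-all: `.*` swallows word2, so the optional group stays empty.
print group2(qr/(word1).*(word2)?.*(word3)/, $text), "\n";                # prints -

# Lazy first catch-all: `.*?` is empty, the blank blocks `(word2)?`,
# and the second `.*` swallows word2.
print group2(qr/(word1).*?(word2)?.*(word3)/, $text), "\n";               # prints -

# Only when word2 directly follows word1 does the optional group capture it.
print group2(qr/(word1).*?(word2)?.*(word3)/, "word1word2 word3"), "\n";  # prints word2
```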
    

    What you need is a construction that prevents the catch-all from matching strings that contain `word2`. Luckily, Perl's regex syntax offers the negative lookbehind, which serves the purpose: for each character in the match of the catch-all subexpression, make sure that it is not preceded by `word2`.

    In perl:

    /(word1).*(word2).*(word3)|word1((?

    Caveats

    1. This might be a performance hog.
    2. Note that `word2` must be a literal, as the regex engine only supports lookbehind patterns whose match length is known a priori.
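
    As a rough illustration of the lookbehind idea described above, here is a minimal sketch. The pattern in it is an assumption that follows the prose description, not a reconstruction of the truncated one-liner, and `match_words` is a hypothetical helper name. The first alternative handles strings that contain `word2`. The second one only lets the catch-all consume characters that are not preceded by `word2`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a "1:.. 2:.. 3:.." summary, or undef on no match.
sub match_words {
    my ($text) = @_;
    if ($text =~ /(word1).*(word2).*(word3)|(word1)(?:(?<!word2).)*(word3)/) {
        # Groups 1-3 belong to the first alternative, groups 4-5 to the second.
        return defined $2 ? "1:$1 2:$2 3:$3" : "1:$4 2:- 3:$5";
    }
    return;
}

for my $text ("word1 word2 word3", "word1 foo word3", "word1 foo") {
    my $result = match_words($text);
    print defined $result ? $result : "no match", "\n";
}
# prints:
# 1:word1 2:word2 3:word3
# 1:word1 2:- 3:word3
# no match
```

    Because the two alternatives number their groups separately, the caller has to check which branch matched before reading the captures.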

    Alternative solution

    Given the caveats, you might instead alter the control logic:

    my $teststring = $_;
    if ($teststring =~ m/(word1).*(word2).*(word3)/) {
        # word2 occurs somewhere between word1 and word3
        print "1:$1 2:$2 3:$3\n";
    }
    else {
        # You know by now that there is no word2 between any word1, word3 occurrences
        if ($teststring =~ m/(word1).*(word3)/) {
            print "1:$1 2:- 3:$2\n";
        }
    }
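
    One possible benefit of the two-step logic is that the optional part no longer has to be a literal, since no lookbehind is involved. The sketch below assumes a hypothetical helper `match_optional` that takes the three parts as precompiled patterns. The variable-length pattern `word2+` is only an illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): try the pattern with the optional
# part first, then fall back to the pattern without it.
sub match_optional {
    my ($text, $first, $optional, $last) = @_;
    if ($text =~ /($first).*($optional).*($last)/) {
        return "1:$1 2:$2 3:$3";
    }
    if ($text =~ /($first).*($last)/) {
        return "1:$1 2:- 3:$2";
    }
    return;    # neither form matched
}

my @parts = (qr/word1/, qr/word2+/, qr/word3/);
print match_optional("word1 word22 word3", @parts), "\n";   # prints 1:word1 2:word22 3:word3
print match_optional("word1 foo word3",    @parts), "\n";   # prints 1:word1 2:- 3:word3
```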
    
