Regular expressions: find string without substring

渐次进展 2020-12-03 01:11

I have a big text:

"Big piece of text. This sentence includes 'regexp' word. And this
sentence doesn't include that word"

I need to fi

2 answers
  • 2020-12-03 01:48

    Use lookahead assertions.

    When you want to check if a string does not contain another substring, you can write:

    /^(?!.*substring)/
    

    You must also anchor 'this' at the beginning of the line and 'word' at the end:

    /^this(?!.*substring).*word$/
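
To make the two patterns above concrete, here is a minimal sketch using Python's re module. Python is chosen only for illustration, and the function names are hypothetical. It assumes single-line input, since '.' does not match a newline by default.

```python
import re

# Matches at position 0 only if "substring" does not occur later on the line.
NO_SUBSTRING = re.compile(r"^(?!.*substring)")

# Starts with "this", ends with "word", and "substring" does not follow "this".
THIS_TO_WORD = re.compile(r"^this(?!.*substring).*word$")


def lacks_substring(text: str) -> bool:
    """Return True if single-line text does not contain 'substring'."""
    return NO_SUBSTRING.search(text) is not None


def is_this_to_word(text: str) -> bool:
    """Return True if text runs from 'this' to 'word' without 'substring'."""
    return THIS_TO_WORD.search(text) is not None
```

The lookahead consumes no characters, so the rest of the pattern still starts matching right after 'this'.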
    

    Another problem here is that you don't want to match whole strings; you want to find sentences (if I understand your task correctly).

    So the solution looks like this:

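    perl -e '
      local $/;                        # slurp the whole input at once
      $_ = <>;
      while ($_ =~ /(.*?[.])/gs) {     # /s lets a sentence span line breaks
        $s = $1;
        # \s* allows the space left over from the previous sentence
        print $s if $s =~ /^\s*this(?!.*substring).*word[.]$/is
      };'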
    

    Example of usage:

    $ cat 1.pl
    local $/;
    $_=<>;
    while($_ =~ /(.*?[.])/g) {
        $s=$1;
        print $s if $s =~ /^\s*this(?!.*regexp).*word[.]/i;
    };
    
    $ cat 1.txt
    This sentence has the "regexp" word. This sentence doesn't have the word. This sentence does have the "regexp" word again.
    
    $ cat 1.txt | perl 1.pl 
     This sentence doesn't have the word.
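
For comparison, the same split-then-filter idea can be sketched in Python. This is an illustrative port, not part of the original answer. The names SENTENCE, WANTED and matching_sentences are made up for the example, and it assumes every sentence ends with a period.

```python
import re

# One sentence = the shortest run of characters ending in a period.
# DOTALL lets a sentence continue across line breaks.
SENTENCE = re.compile(r".*?[.]", re.DOTALL)

# Starts with "this", ends with "word.", with no "regexp" after "this".
WANTED = re.compile(r"^\s*this(?!.*regexp).*word[.]", re.IGNORECASE | re.DOTALL)


def matching_sentences(text: str) -> list:
    """Return the sentences of text that satisfy WANTED."""
    return [s for s in SENTENCE.findall(text) if WANTED.search(s)]


sample = (
    'This sentence has the "regexp" word. '
    "This sentence doesn't have the word. "
    'This sentence does have the "regexp" word again.'
)
print(matching_sentences(sample))
```

As in the Perl output, the kept sentence still carries the leading space left over from the previous sentence.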
    
  • 2020-12-03 02:10

    With an ignore case option, the following should work:

    \bthis\b(?:(?!\bregexp\b).)*?\bword\b
    

    Example: http://www.rubular.com/r/g6tYcOy8IT

    Explanation:

    \bthis\b           # match the word 'this', \b is for word boundaries
    (?:                # start group, repeated zero or more times, as few as possible
       (?!\bregexp\b)    # fail if 'regexp' can be matched (negative lookahead)
       .                 # match any single character
    )*?                # end group
    \bword\b           # match 'word'
    

    The \b surrounding each word makes sure that you aren't matching on substrings, like matching the 'this' in 'thistle', or the 'word' in 'wordy'.

    This works by checking, at each character between your start word and your end word, that the excluded word does not begin at that position.
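
A quick way to try the pattern outside Rubular is the following sketch in Python. It is used only for illustration, the sample text is adapted from the question, and it assumes the text sits on one line.

```python
import re

# The pattern from the answer, compiled with the ignore-case option.
PATTERN = re.compile(
    r"\bthis\b(?:(?!\bregexp\b).)*?\bword\b",
    re.IGNORECASE,
)

text = (
    "This sentence includes 'regexp' word. "
    "And this sentence doesn't include that word."
)

# The first "This" cannot reach "word" without crossing "regexp",
# so only the second sentence yields a match.
matches = [m.group(0) for m in PATTERN.finditer(text)]
print(matches)
```

If the text wraps across lines, as it does in the question, the '.' in the group will not cross the line break unless a dot-matches-newline option (re.DOTALL in Python) is added.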
