I have a big text:
"Big piece of text. This sentence includes 'regexp' word. And this
sentence doesn't include that word"
I need to fi
Use lookahead assertions.
When you want to check if a string does not contain another substring, you can write:
/^(?!.*substring)/
You must also check the beginning and the end of the line for 'this' and 'word':
/^this(?!.*substring).*word$/
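As a quick sanity check of the pattern above, here is a minimal sketch in Python, whose re module accepts the same lookahead syntax. The helper name and the sample lines are made up for illustration.

```python
import re

# Same pattern as above: the line starts with "this", ends with "word",
# and "substring" appears nowhere after the leading "this".
PATTERN = re.compile(r"^this(?!.*substring).*word$")

def matches(line):
    """Return True if the line satisfies the pattern (hypothetical helper)."""
    return PATTERN.search(line) is not None

print(matches("this line ends with word"))            # True
print(matches("this substring line ends with word"))  # False
print(matches("that line ends with word"))            # False
```

The lookahead is placed right after 'this', so it scans the rest of the line once before the remainder of the pattern is tried.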
Another problem here is that you don't want to find strings, you want to find sentences (if I understand your task correctly).
So the solution looks like this:
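perl -e '
local $/;                  # slurp mode: read the whole input at once
$_ = <>;
while (/(.*?[.])/gs) {     # take one sentence (up to the next period) at a time
    my $s = $1;
    print $s if $s =~ /^\s*this(?!.*substring).*word[.]$/is;
}'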
Example of usage:
$ cat 1.pl
local $/;
$_=<>;
while($_ =~ /(.*?[.])/g) {
$s=$1;
print $s if $s =~ /^\s*this(?!.*regexp).*word[.]/i;
};
$ cat 1.txt
This sentence has the "regexp" word. This sentence doesn't have the word. This sentence does have the "regexp" word again.
$ cat 1.txt | perl 1.pl
This sentence doesn't have the word.
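For readers who are not using Perl, the same idea can be sketched in Python. This is an assumed translation, not the original script: the function name and its parameters are hypothetical, and re.DOTALL is an addition so that a sentence broken across lines is still treated as one sentence.

```python
import re

def sentences_without(text, start="this", end="word", excluded="regexp"):
    """Return sentences that begin with `start`, end with `end` plus a period,
    and do not contain `excluded`. Hypothetical helper for illustration."""
    check = re.compile(
        r"^\s*" + re.escape(start)
        + r"(?!.*" + re.escape(excluded) + r")"
        + r".*" + re.escape(end) + r"[.]$",
        re.IGNORECASE | re.DOTALL,
    )
    # Cut the text into chunks that each end at the next period.
    chunks = re.findall(r".*?[.]", text, re.DOTALL)
    return [c.strip() for c in chunks if check.search(c)]

text = ('This sentence has the "regexp" word. '
        "This sentence doesn't have the word. "
        'This sentence does have the "regexp" word again.')
print(sentences_without(text))
# ["This sentence doesn't have the word."]
```

Like the Perl version, this treats every period as a sentence boundary, so abbreviations or decimal numbers would split a sentence early.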
With an ignore case option, the following should work:
\bthis\b(?:(?!\bregexp\b).)*?\bword\b
Example: http://www.rubular.com/r/g6tYcOy8IT
Explanation:
\bthis\b # match the word 'this', \b is for word boundaries
(?: # start group, repeated zero or more times, as few as possible
(?!\bregexp\b) # fail if 'regexp' can be matched (negative lookahead)
. # match any single character
)*? # end group
\bword\b # match 'word'
The \b surrounding each word makes sure that you aren't matching on substrings, like matching the 'this' in 'thistle', or the 'word' in 'wordy'.
This works by checking at each character between your start word and your end word to make sure that the excluded word doesn't occur.
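To see this character-by-character check in action outside of Rubular, here is a small Python sketch using the same pattern with the ignore-case flag. The sample text is the one from the Perl example earlier in the thread; the variable names are only illustrative.

```python
import re

# The pattern from above, compiled with the ignore-case option.
TEMPERED = re.compile(r"\bthis\b(?:(?!\bregexp\b).)*?\bword\b", re.IGNORECASE)

text = ('This sentence has the "regexp" word. '
        "This sentence doesn't have the word. "
        'This sentence does have the "regexp" word again.')

# Collect every span that runs from 'this' to 'word' without crossing 'regexp'.
found = [m.group(0) for m in TEMPERED.finditer(text)]
print(found)
# ["This sentence doesn't have the word"]
```

Note that the match stops at 'word', so the trailing period is not part of the result, and that '.' does not match a newline unless a dot-all option is also enabled.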