I'm trying to match an optional (possibly present) phrase in a sentence:
perl -e '$_="word1 word2 word3"; print "1:$1 2:$2 3:$3\n" if m/(word1).*(word
To solve your issue, observe that the catch-all subexpressions in your regex match material that you do not want them to match:
(word1).*(word2)?.*(word3)
       --
       ^--- this subexpression matches _all_ material between `word1` and `word3` in the test string, in particular `word2` if it is present
(word1).*? (word2)? .*(word3)
       ---+--------+--
        ^     ^     ^-- this subexpression matches _all_ material between `word1` and `word3` in the test string, in particular `word2` if it is present
        |     |
        |     +------ this subexpression is empty, even if `word2` is present:
        |               - the preceding subexpression `.*?` matches minimally (i.e. the empty string)
        |               - `(word2)?` cannot match because the next character in the test string is a blank
        |               - the following subexpression `.*` matches everything up to `word3`, including `word2`
        |
        |             -> the pattern matches _as desired_ for test strings
        |                where `word2` immediately follows `word1` without an intervening blank
        |
        +-------------- this subexpression will always be empty
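The behaviour sketched in the two diagrams can be checked with a short script. This is only an illustration: the helper `captures` is a hypothetical name introduced here, and the string `(undef)` simply marks a group that did not take part in the match.

```perl
use strict;
use warnings;

# Hypothetical helper for this illustration: apply a pattern and return the
# three captures, with '(undef)' standing in for a group that did not match.
sub captures {
    my ($re, $s) = @_;
    return unless $s =~ $re;
    return map { defined $_ ? $_ : '(undef)' } ($1, $2, $3);
}

my $greedy = qr/(word1).*(word2)?.*(word3)/;    # first diagram
my $lazy   = qr/(word1).*?(word2)?.*(word3)/;   # second diagram

# In both cases a catch-all swallows word2, so group 2 stays unset.
print join(' ', captures($greedy, 'word1 word2 word3')), "\n";
print join(' ', captures($lazy,   'word1 word2 word3')), "\n";

# Only when word2 directly follows word1 does the lazy variant capture it.
print join(' ', captures($lazy,   'word1word2 word3')), "\n";
```

The first two lines print `word1 (undef) word3`; the third prints `word1 word2 word3`.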
What you need is a construction that prevents the catch-all from matching strings that contain `word2`. Luckily, Perl's regex syntax offers the negative lookbehind, which serves the purpose: for each character in the match of the catch-all subexpression, make sure that it is not preceded by `word2`.
In perl:
/(word1).*(word2).*(word3)|word1((?
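As a minimal sketch of the lookbehind idea, assume the restricted catch-all is written as `(?:(?<!word2).)*`. That spelling, the helper name `match_optional`, and the `-` placeholder are assumptions made for illustration and may differ from the pattern intended above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): returns
# (word1, word2-or-"-", word3), or an empty list when nothing matches.
sub match_optional {
    my ($s) = @_;
    # First alternative: word2 is present between word1 and word3.
    # Second alternative: the catch-all may only consume characters that
    # are not directly preceded by the literal "word2".
    if ($s =~ /(word1).*(word2).*(word3)|(word1)(?:(?<!word2).)*(word3)/) {
        return defined $2 ? ($1, $2, $3) : ($4, '-', $5);
    }
    return;
}

for my $s ('word1 word2 word3', 'word1 word3', 'no match here') {
    my @m = match_optional($s);
    print @m ? "1:$m[0] 2:$m[1] 3:$m[2]\n" : "no match: $s\n";
}
```

Because the two alternatives number their groups separately, the second branch reports its captures through `$4` and `$5`.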
Caveats
`word2` must be a literal, as the regex engine only supports lookbehind patterns whose match length is known a priori.
Alternative solution
Given the caveats, you might alter the control logic instead:
my $teststring = $_;
if ($teststring =~ m/(word1).*(word2).*(word3)/) {
    # word2 occurs between word1 and word3
    print "1:$1 2:$2 3:$3\n";
}
else {
    # You know by now that there is no word2 between any word1, word3 occurrences
    if ($teststring =~ m/(word1).*(word3)/) {
        print "1:$1 2:- 3:$2\n";
    }
}
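Since the two-step logic uses no lookbehind, the optional phrase need not be a fixed-length literal. The sketch below wraps the same control flow in a hypothetical helper, `match_two_step`, and passes in a variable-length pattern (`qr/wo+rd2/`) chosen purely for illustration. It assumes the supplied pattern contains no capture groups of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: $opt is any compiled pattern for the optional phrase,
# including ones whose match length is not fixed. It must not contain
# capture groups, or the numbering of $1..$3 would shift.
sub match_two_step {
    my ($s, $opt) = @_;
    if ($s =~ /(word1).*($opt).*(word3)/) {
        return ($1, $2, $3);        # optional phrase found
    }
    if ($s =~ /(word1).*(word3)/) {
        return ($1, '-', $2);       # optional phrase absent
    }
    return;                         # no match at all
}

my $opt = qr/wo+rd2/;   # variable-length pattern, for illustration only
print join(' ', match_two_step('word1 wooord2 word3', $opt)), "\n";
print join(' ', match_two_step('word1 word3',         $opt)), "\n";
```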