Question
I'm writing a free plugin for Google Docs and processing paragraphs of text.
I need a regular expression to match everything except a phrase (i.e. multiple words separated with spaces).
For example, when searching the text "The quick brown fox jumped over the lazy dog", I want to match everything except "quick brown" and "lazy", with the expected result being "The fox jumped over the dog".
\b((?!(lazy)\b).)+
This works; it matches all text except "lazy", and I get "The quick brown fox jumped over the dog".
\b((?!(quick brown|lazy)\b).)+
This does not work; it leaves in "brown", and I get "The brown fox jumped over the dog" when I should get "The fox jumped over the dog".
I've searched the web for hours on this and haven't had any luck. The regex is missing something and I don't know what it is.
Thanks for reading!
RegEx Example: https://regex101.com/r/3HGiff/1
Javascript Example: https://jsfiddle.net/g85je2aj/16/
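A likely explanation for the leftover "brown" is the leading \b. The lookahead only rejects the position where "quick brown" starts, but a multi-word phrase has a word boundary inside it (right after "quick"), so the next match can restart there, where the lookahead no longer sees the whole phrase. A single word such as "lazy" has no internal boundary, which is why the first pattern appears to work. The sketch below simply lists the matches to make this visible; the helper name listMatches is made up for illustration.

```javascript
// Hypothetical helper: return every match of a global regex as an array.
function listMatches(text, regExp) {
  return text.match(regExp) || [];
}

var text = "The quick brown fox jumped over the lazy dog";

// Single-word phrase: no word boundary inside "lazy", so no match restarts mid-phrase.
var single = listMatches(text, /\b((?!(lazy)\b).)+/g);

// Multi-word phrase: \b also matches right after "quick", so a match restarts at " brown".
var multi = listMatches(text, /\b((?!(quick brown|lazy)\b).)+/g);

console.log(single); // ["The quick brown fox jumped over the ", " dog"]
console.log(multi);  // ["The ", " brown fox jumped over the ", " dog"]
```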
EDIT/update: I developed another solution, but it relies on a positive lookbehind, which is only supported by Chrome.
((?<=(quick brown|lazy)+(?=[\s]))|^(?!(quick brown|lazy))).+?((?=(quick brown|lazy))|$)
RegEx Example: https://regex101.com/r/3HGiff/3
Javascript Example: https://jsfiddle.net/g85je2aj/19/
Since that only works in Chrome, I don't think it's a real solution. Any thoughts on how to modify that regex to not use a lookbehind, or is that impossible?
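One possible way to drop the lookbehind, offered as a sketch rather than a definitive fix: let the regex match either a phrase (which is then ignored) or a captured run of characters in which no phrase starts. The helper name textOutsidePhrases is hypothetical, and the sketch assumes the phrases contain no regex metacharacters.

```javascript
// Hypothetical helper: collect the text between phrases without using a lookbehind.
// Assumes the phrases contain no regex metacharacters.
function textOutsidePhrases(text, phrases) {
  var alternation = "\\b(?:" + phrases.join("|") + ")\\b";
  // Alternative 1 consumes a phrase (ignored below); alternative 2 captures
  // a run of characters in which no phrase starts.
  var regExp = new RegExp(alternation + "|((?:(?!" + alternation + ").)+)", "g");
  var parts = [];
  var match;
  while ((match = regExp.exec(text)) !== null) {
    if (match[1] !== undefined) {
      parts.push(match[1]);
    }
  }
  return parts;
}

var outside = textOutsidePhrases(
  "The quick brown fox jumped over the lazy dog",
  ["quick brown", "lazy"]
);
console.log(outside); // ["The ", " fox jumped over the ", " dog"]
```

Because the phrase alternative consumes the whole phrase, the next match can only start after it, which is the job the lookbehind was doing.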
Answer 1:
Instead of matching all the text that does not match some string(s), you may use a splitting approach: build an alternation-based regex from the list of phrases you want to exclude and use it with String#split():
var regExp = new RegExp("\\b(?:" + phrasesToSearchFor + ")\\b","i");
var results = textToSearchIn.split(regExp);
All you need to do afterwards is iterate over the items in the results array.
Here is the JS demo:
$(document).ready(function() {
  $("#button").click(function () {
    // the text to search in
    var textToSearchIn = "The quick brown fox jumped over the lazy dog.";
    // phrases to exclude, in a regex-friendly format
    // please note: this string varies in length and number of phrases,
    // as it is built from an array of phrases using array.join('|')
    var phrasesToSearchFor = "quick brown|lazy";
    // build a regular expression that matches any of the phrases as whole words
    var regExp = new RegExp("\\b(?:" + phrasesToSearchFor + ")\\b", "i");
    // split the text on the phrases; what remains is everything except them
    var results = textToSearchIn.split(regExp);
    for (var result of results) {
      // format each remaining chunk as a list item (.text() escapes any HTML)
      var item = $('<li>').text(result);
      // add the item to the html list
      $('#output').append(item);
    }
    /* expected output:
     * The
     * fox jumped over the
     * dog.
     */
  });
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<button id="button">Click to test</button>
<ul id="output"></ul>
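The same splitting idea can be sketched without jQuery, which may be more convenient inside a Docs plugin. This is a minimal sketch under assumptions: the helper names escapeRegExp and splitOnPhrases are made up here, and the phrases are taken as an array rather than a pre-joined string so that each one can be escaped before it goes into the alternation.

```javascript
// Escape regex metacharacters so a phrase is matched literally.
function escapeRegExp(phrase) {
  return phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: split text on any of the given phrases (case-insensitive).
function splitOnPhrases(text, phrases) {
  var alternation = phrases.map(escapeRegExp).join("|");
  var regExp = new RegExp("\\b(?:" + alternation + ")\\b", "i");
  return text.split(regExp);
}

var pieces = splitOnPhrases(
  "The quick brown fox jumped over the lazy dog.",
  ["quick brown", "lazy"]
);
console.log(pieces); // ["The ", " fox jumped over the ", " dog."]
```

Note that the \b anchors assume each phrase starts and ends with a word character; a phrase that begins or ends with punctuation would need different boundary handling.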
Answer 2:
Or you could use capturing groups instead:
(.*)(one|two words)\s(.*)
Then you could get your text without the specified words by using $1$3 as the replacement.
Example: regex101.com
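As a hedged illustration of how this plays out on the question's sentence: with the question's phrases substituted into the pattern, the greedy (.*) runs to the last phrase, so a single replace removes only one of them. The second call is an assumed variant, not part of this answer, that removes every phrase together with its trailing whitespace using a global replace.

```javascript
var sentence = "The quick brown fox jumped over the lazy dog";

// The pattern from this answer, with the question's phrases substituted in.
// The greedy (.*) runs to the last phrase, so only "lazy" is removed.
var once = sentence.replace(/(.*)(quick brown|lazy)\s(.*)/, "$1$3");

// Assumed variant: drop every phrase plus trailing whitespace in a single pass.
var all = sentence.replace(/\b(?:quick brown|lazy)\b\s*/gi, "");

console.log(once); // "The quick brown fox jumped over the dog"
console.log(all);  // "The fox jumped over the dog"
```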
Source: https://stackoverflow.com/questions/48456625/regex-match-everything-in-text-paragraph-except-specific-phrases