Regex: Match everything in text paragraph except specific phrases

烂漫一生 submitted on 2021-01-27 20:05:02

Question


I'm writing a free plugin for Google Docs and processing paragraphs of text.

I need a regular expression to match everything except a phrase (i.e. multiple words separated with spaces).

For example, when searching the text "The quick brown fox jumped over the lazy dog", I want to match everything except "quick brown" and "lazy", so the expected result is "The fox jumped over the dog".

\b((?!(lazy)\b).)+
This works: it matches all text except "lazy", and I get "The quick brown fox jumped over the dog".

\b((?!(quick brown|lazy)\b).)+
This does not work: it leaves in "brown", and I get "The brown fox jumped over the dog" when I should get "The fox jumped over the dog".
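A quick reproduction in JavaScript (assuming the patterns are applied with the g flag) shows where "brown" comes from. The negative lookahead only tests whether a phrase *starts* at the current position, so each match stops just before "quick brown", but the next match is free to begin at the word boundary right after "quick". With the single word "lazy" there is no word boundary inside the phrase, so that problem never appears.

```javascript
const text = "The quick brown fox jumped over the lazy dog";

// Single-word phrase: the next match can only start at the boundary after "lazy".
const single = text.match(/\b((?!(lazy)\b).)+/g);
// single → ["The quick brown fox jumped over the ", " dog"]

// Multi-word phrase: the first match stops before "quick brown", but the next
// match starts at the word boundary between "quick" and " brown".
const multi = text.match(/\b((?!(quick brown|lazy)\b).)+/g);
// multi → ["The ", " brown fox jumped over the ", " dog"]
```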

I've searched the web for hours on this and haven't had any luck. The regex is missing something and I don't know what it is.

Thanks for reading!

RegEx Example: https://regex101.com/r/3HGiff/1
Javascript Example: https://jsfiddle.net/g85je2aj/16/

EDIT/update: I developed another solution, but it relies on a positive lookbehind, which at the time of writing was only supported by Chrome.

((?<=(quick brown|lazy)+(?=[\s]))|^(?!(quick brown|lazy))).+?((?=(quick brown|lazy))|$)

RegEx Example: https://regex101.com/r/3HGiff/3
Javascript Example: https://jsfiddle.net/g85je2aj/19/

Since that only works in Chrome, I don't think it's a real solution. Any thoughts on how to modify that regex to not use a lookbehind, or is that impossible?
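For comparison, here is the lookbehind pattern from the edit run in Node.js (V8, the same engine as Chrome; other major browsers have since added lookbehind support), next to a lookbehind-free sketch that is not from the thread. The sketch consumes the phrase (or the start of the text) instead of looking behind it, and captures the following non-phrase text in group 1. The phrase alternative is listed before ^ so that text which begins with a phrase does not get that phrase captured.

```javascript
const text = "The quick brown fox jumped over the lazy dog";

// The lookbehind pattern from the question, run with the g flag.
const withLookbehind =
  /((?<=(quick brown|lazy)+(?=[\s]))|^(?!(quick brown|lazy))).+?((?=(quick brown|lazy))|$)/g;
const lookbehindPieces = text.match(withLookbehind);
// lookbehindPieces → ["The ", " fox jumped over the ", " dog"]

// Sketch without lookbehind: match "phrase or start of text", then lazily
// capture everything up to the next phrase or the end of the text.
const noLookbehind =
  /(?:\b(?:quick brown|lazy)\b|^)(.+?)(?=\b(?:quick brown|lazy)\b|$)/g;
const capturedPieces = Array.from(text.matchAll(noLookbehind), (m) => m[1]);
// capturedPieces → ["The ", " fox jumped over the ", " dog"]
```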


Answer 1:


Instead of matching all text that does not match some string(s), you can use a splitting approach: build an alternation-based regex from the list of phrases you want to exclude, and pass it to String#split():

var regExp = new RegExp("\\b(?:" + phrasesToSearchFor + ")\\b","i");
var results =  textToSearchIn.split(regExp);

After that, all you need to do is iterate over the items in the results array.
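A self-contained version of the split approach that runs in Node.js without jQuery. The phrases array and the escapeRegExp helper are assumptions for illustration: the answer builds the phrase string with array.join('|') but does not escape it, which only matters if a phrase can contain characters such as . or +.

```javascript
// Hypothetical helper (not part of the answer): escape regex metacharacters
// so each phrase is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const phrases = ["quick brown", "lazy"]; // assumed input list
const textToSearchIn = "The quick brown fox jumped over the lazy dog.";

// (?:...) is non-capturing on purpose: if the group captured, split() would
// also insert the matched phrases into the result array.
const regExp = new RegExp(
  "\\b(?:" + phrases.map(escapeRegExp).join("|") + ")\\b",
  "i"
);
const results = textToSearchIn.split(regExp);
// results → ["The ", " fox jumped over the ", " dog."]
```

If the phrase list can be empty, it is worth guarding against that case: the pattern would become \b(?:)\b, which matches the empty string at every word boundary and splits the text there.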

Here is the JS demo:

$(document).ready(function () {
  $("#button").click(function () {
    // the text to search for phrases in, then inverse-highlight
    var textToSearchIn = "The quick brown fox jumped over the lazy dog.";
    // phrases to exclude, in a regex-friendly format
    // note: this string varies in length and number of phrases,
    //   as it is built from an array of phrases using array.join('|')
    var phrasesToSearchFor = "quick brown|lazy";
    // match any of the phrases as whole words (non-capturing group, so the
    // phrases themselves are not added to the split results)
    var regExp = new RegExp("\\b(?:" + phrasesToSearchFor + ")\\b", "i");
    // split the text on the phrases; the pieces are everything else
    var results = textToSearchIn.split(regExp);
    for (var result of results) {
      // format each piece as a list item; .text() inserts it as plain text
      var item = $("<li>").text(result);
      // add the item inside the HTML list
      $("#output").append(item);
    }
    /* output:
         * The
         * fox jumped over the
         * dog.
    */
  });
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<button id="button">Click to test</button>
<ul id="output"></ul>



Answer 2:


Or you could use capturing groups instead:

(.*)(one|two words)\s(.*)

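Then you could get your text without the specified words by using the replacement $1$3. Note that one replacement removes only one phrase (the last one, because the leading (.*) is greedy), so with several phrases it has to be applied repeatedly.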
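A JavaScript version of this, with the question's phrases substituted for the placeholder one|two words. The removeAll function is a hypothetical helper, not from the answer: it reapplies the replacement until the text stops changing, which terminates because every successful replacement makes the string shorter.

```javascript
const text = "The quick brown fox jumped over the lazy dog";
const captured = /(.*)(quick brown|lazy)\s(.*)/;

// One replacement: the greedy (.*) backtracks to the LAST phrase, so only "lazy " goes.
const once = text.replace(captured, "$1$3");
// once → "The quick brown fox jumped over the dog"

// Hypothetical helper: repeat the replacement until nothing changes.
function removeAll(input, re) {
  let previous;
  do {
    previous = input;
    input = input.replace(re, "$1$3");
  } while (input !== previous);
  return input;
}
const all = removeAll(text, captured);
// all → "The fox jumped over the dog"
```

Because the pattern requires \s after the phrase, a phrase at the very end of the text (with no trailing whitespace) is not removed.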

Example: regex101.com



Source: https://stackoverflow.com/questions/48456625/regex-match-everything-in-text-paragraph-except-specific-phrases
