How can I find everything BUT certain phrases with a regular expression?

Posted by 孤街浪徒 on 2019-12-19 04:32:11

Question


Ok, so I have a phrase "foo bar" and I want to find everything BUT "foo bar".
Here's my text.

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar

There's a way to do this purely within regex, right? I don't have to fall back on string functions etc., do I?

RESULT:

NOTE: I can't do nice highlighting, but the bold gives you an idea (the spaces before and after "foo bar" would also be selected, but including them breaks the bolding).

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar

Assume PCRE nomenclature.


UPDATE 7/29/2013: it may be better to use a search and replace function in your language of choice to just 'remove' the phrases that you don't want so that you are then left with the info you do want.
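
The search-and-replace approach from the update can be sketched in a few lines. This is a minimal illustration using Python's `re` module rather than PCRE; the helper name `remove_phrases` is hypothetical and not from the original post.

```python
import re

def remove_phrases(text, phrases):
    """Delete every occurrence of each phrase and keep the rest of the text."""
    # re.escape keeps any regex metacharacters inside a phrase literal.
    pattern = "|".join(re.escape(p) for p in phrases)
    return re.sub(pattern, "", text)

line = "ipsum dolor foo bar Lorem ipsum dolor sit amet,"
# The spaces on either side of the removed phrase are left in place.
print(repr(remove_phrases(line, ["foo bar"])))
```

What remains after the replacement is exactly the "everything but" text, including the spaces that surrounded the phrase.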


Answer 1:


Try:

^(?!.*foo bar).*$

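This should select every line that does not contain "foo bar" ((?! opens a negative lookahead). For ^ and $ to anchor at each line rather than at the whole text, multiline mode has to be on.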
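
To see what this pattern selects on the sample text, here is a small sketch in Python's `re` module (an assumption; the question asks about PCRE, but lookaheads behave the same way for this pattern). Note that it keeps or rejects whole lines; it does not return the parts of a line around the phrase.

```python
import re

TEXT = """ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar"""

# re.MULTILINE makes ^ and $ anchor at every line, not only at the whole string.
LINE_WITHOUT_PHRASE = re.compile(r"^(?!.*foo bar).*$", re.MULTILINE)

matching_lines = LINE_WITHOUT_PHRASE.findall(TEXT)
print(matching_lines)  # only the second line contains no "foo bar"
```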




Answer 2:


In general, if foobar matches itself, then (?s:(?!foobar).)* matches anything that is not foobar, including nothing at all.

You could use that to find lines that don’t have foobar in them, for example, using

^(?:(?!foobar).)*$
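
A short sketch of both forms, written for Python's `re` module as an assumption and using the question's phrase "foo bar" in place of foobar. The unanchored form consumes one character at a time and stops right before the phrase; the anchored form accepts a line only if the phrase starts nowhere in it.

```python
import re

text = "ipsum dolor foo bar Lorem ipsum"

# Unanchored: each "." is guarded by the lookahead, so matching stops
# at the first position where "foo bar" begins.
prefix = re.match(r"(?s:(?!foo bar).)*", text).group()
print(repr(prefix))

# Anchored per line: the guarded dot must be able to reach the end of the line.
clean_line = re.compile(r"^(?:(?!foo bar).)*$", re.MULTILINE)
clean = clean_line.findall("keep me\nhas foo bar here\nkeep me too")
print(clean)
```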

You could also use your language’s split() function to split on foobar, which will give you all the pieces that do not include the split pattern.
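
As a rough illustration of the split() suggestion, here is how it might look with Python's `re.split` (one possible choice of language; the helper name `pieces_between` is hypothetical). Empty strings appear wherever the phrase sits at the start or end of the input, so the sketch filters them out.

```python
import re

def pieces_between(text, phrase="foo bar"):
    """Return the non-empty stretches of text that lie between occurrences of phrase."""
    # Splitting on the phrase yields exactly the parts that do not contain it.
    return [piece for piece in re.split(re.escape(phrase), text) if piece]

print(pieces_between("eiusmod tempor foo bar incididunt ut labore et"))
print(pieces_between("dolore foo bar"))
```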

Regarding the nasty little-known backtracking control verbs like (*FAIL) and (*COMMIT), I haven’t yet had much occasion to use them in ‘non-toy’ programs. I find that independent subexpressions via (?>...) and the possessive quantifiers *+, ++, ?+ etc. give me more than enough rope, so to speak.

That said, I do have one toy example of using (*FAIL) in this answer; it’s the very first regex solution. The reason for its being there was I wanted to force the regex engine to backtrack through all possible permutations; the real goal was merely to count how many ways it tried things.

Please understand that my two regexes there, along with the many, many incredibly creative answers from others, are all meant to be fun, tongue-in-cheek things. Still, one can learn a lot from them — once one recovers from shock. ☺




Answer 3:


"remove everything except foo bar" is equivalent to "find only foo bar", which PCRE allows quite easily. Conversely, "find everything except foo bar" is equivalent to "find and remove only foo bar". So, complementation is easily done from your tools.

Aside from that, PCRE has a nasty little feature known as (*FAIL), which immediately forces a backtrack when it's encountered. So, I suppose inserting something like (*COMMIT)foo bar(*FAIL) into your regular expression could help. It's neither friendly nor very safe, though.




Answer 4:


Okay, so you want to remove everything except foo bar using UltraEdit's "Advanced" (Perl-regex style) search feature. The easiest way to do that is to match everything, but only capture foo bar, like this:

(?:(?!foo bar).)+(foo bar|$)

...and replace it with $1 or \1 (whichever style UltraEdit accepts).

I don't use UltraEdit, but in EditPad Pro it converts this:

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar 

...to this:

foo bar

foo bar
foo bar

...which is the result you showed in your original message.
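
The same replacement can be tried outside an editor. The sketch below uses Python's `re.sub` as a stand-in for the editor's search-and-replace (an assumption; editors may differ in how they treat line anchors), with multiline mode so that $ matches at the end of every line.

```python
import re

TEXT = """ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar"""

# Match a run of characters that does not start "foo bar", then capture
# either the phrase itself or the (empty) end of the line.
KEEP_ONLY_PHRASE = re.compile(r"(?:(?!foo bar).)+(foo bar|$)", re.MULTILINE)

# Replacing each match with group 1 keeps the phrase and drops the rest.
result = KEEP_ONLY_PHRASE.sub(r"\1", TEXT)
print(result)
```

The newlines survive because the dot does not match them, which is why the second line comes out empty rather than disappearing.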




Answer 5:


Here: perl -pe 's{.*?(foo bar)?}{$1}g' <text

I want to find everything BUT "foo bar"

A match-only pattern without using substitution by $1 (that is usable with the empty replacement as in s{pattern}{})... not sure that is possible. You would have to gobble up chars up until foo bar, e.g. with .*?(?=foo bar). But then the matching algorithm continues on and sees "oo bar", and would match again as there is no f.

Continuing the quest, here is a piece of perl code that gobbles up the requested parts, only with the drawback that empty captures may be returned if foo bar happens to be at the start of the line:

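use strict;
use warnings;

# Print every piece of each input line that is not "foo bar".
while (my $line = <>) {
        chomp $line;
        # Lazily capture the text up to the next "foo bar" or the end of the line.
        my @parts = $line =~ m{(.*?)(?:foo bar|$)}gs;
        print "[[ $_ ]]\n" for @parts;
}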

There is no substitution involved, and running this on the Lorem ipsum test file will show everything but the foo bar parts. It is PCRE compatible, but there is no guarantee that $EDITOR will do what you envision.
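
For comparison, a match-only version of the same idea might look like this in Python (an assumption; the answer itself uses Perl, and the function name `pieces` is hypothetical). Here the empty captures are filtered out instead of being printed.

```python
import re

# Lazily capture text up to the next "foo bar" or the end of the input.
PIECE = re.compile(r"(.*?)(?:foo bar|$)", re.DOTALL)

def pieces(line):
    """Return the non-empty parts of line that are not "foo bar"."""
    # findall returns the capture group of every match; some of them are empty.
    return [part for part in PIECE.findall(line) if part]

print(pieces("ipsum dolor foo bar Lorem ipsum dolor sit amet,"))
```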




Answer 6:


To show every line except those containing "foo bar" or "fad bad", this worked for me:

^(?!.*foo bar)(?!.*fad bad).*$
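
The chained lookaheads generalize to any number of phrases. A minimal sketch, assuming Python's `re` module and a hypothetical helper named `lines_without`, builds one negative lookahead per phrase:

```python
import re

def lines_without(text, phrases):
    """Return the lines of text that contain none of the given phrases."""
    # One negative lookahead per phrase, each checked from the start of the line.
    guards = "".join(f"(?!.*{re.escape(p)})" for p in phrases)
    return re.findall(rf"^{guards}.*$", text, flags=re.MULTILINE)

sample = "a foo bar\nclean line\nfad bad b"
print(lines_without(sample, ["foo bar", "fad bad"]))
```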



Source: https://stackoverflow.com/questions/4109147/how-can-i-find-everything-but-certain-phrases-with-a-regular-expression