Regular expression negative lookahead

China☆狼群 提交于 2019-11-26 03:55:38

问题


In my home directory I have a folder drupal-6.14 that contains the Drupal platform.

From this directory I use the following command:

find drupal-6.14 -type f -iname \'*\' | grep -P \'drupal-6.14/(?!sites(?!/all|/default)).*\' | xargs tar -czf drupal-6.14.tar.gz

What this command does is gzips the folder drupal-6.14, excluding all subfolders of drupal-6.14/sites/ except sites/all and sites/default, which it includes.

My question is on the regular expression:

grep -P \'drupal-6.14/(?!sites(?!/all|/default)).*\'

The expression works to exclude all the folders I want excluded, but I don\'t quite understand why.

It is a common task using regular expressions to

Match all strings, except those that don\'t contain subpattern x. Or in other words, negating a subpattern.

I (think) I understand that the general strategy to solve these problems is the use of negative lookaheads, but I\'ve never understood to a satisfactory level how positive and negative look(ahead/behind)s work.

Over the years, I\'ve read many websites on them. The PHP and Python regex manuals, other pages like http://www.regular-expressions.info/lookaround.html and so forth, but I\'ve never really had a solid understanding of them.

Could someone explain, how this is working, and perhaps provide some similar examples that would do similar things?

-- Update One:

Regarding Andomar\'s response: can a double negative lookahead be more succinctly expressed as a single positive lookahead statement:

i.e Is:

\'drupal-6.14/(?!sites(?!/all|/default)).*\'

equivalent to:

\'drupal-6.14/(?=sites(?:/all|/default)).*\'

???

-- Update Two:

As per @andomar and @alan moore - you can\'t interchange double negative lookahead for positive lookahead.


回答1:


A negative lookahead says, at this position, the following regex can not match.

Let's take a simplified example:

a(?!b(?!c))

a      Match: (?!b) succeeds
ac     Match: (?!b) succeeds
ab     No match: (?!b(?!c)) fails
abe    No match: (?!b(?!c)) fails
abc    Match: (?!b(?!c)) succeeds

The last example is a double negation: it allows a b followed by c. The nested negative lookahead becomes a positive lookahead: the c should be present.

In each example, only the a is matched. The lookahead is only a condition, and does not add to the matched text.




回答2:


Lookarounds can be nested.

So this regex matches "drupal-6.14/" that is not followed by "sites" that is not followed by "/all" or "/default".

Confusing? Using different words, we can say it matches "drupal-6.14/" that is not followed by "sites" unless that is further followed by "/all" or "/default"




回答3:


If you revise your regular expression like this:

drupal-6.14/(?=sites(?!/all|/default)).*
             ^^

...then it will match all inputs that contain drupal-6.14/ followed by sites followed by anything other than /all or /default. For example:

drupal-6.14/sites/foo
drupal-6.14/sites/bar
drupal-6.14/sitesfoo42
drupal-6.14/sitesall

Changing ?= to ?! to match your original regex simply negates those matches:

drupal-6.14/(?!sites(?!/all|/default)).*
             ^^

So, this simply means that drupal-6.14/ now cannot be followed by sites followed by anything other than /all or /default. So now, these inputs will satisfy the regex:

drupal-6.14/sites/all
drupal-6.14/sites/default
drupal-6.14/sites/all42

But, what may not be obvious from some of the other answers (and possibly your question) is that your regex will also permit other inputs where drupal-6.14/ is followed by anything other than sites as well. For example:

drupal-6.14/foo
drupal-6.14/xsites

Conclusion: So, your regex basically says to include all subdirectories of drupal-6.14 except those subdirectories of sites whose name begins with anything other than all or default.



来源:https://stackoverflow.com/questions/1749437/regular-expression-negative-lookahead

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!