Expected outcome in group capture?

僤鯓⒐⒋嵵緔 submitted on 2019-12-04 18:53:28

It is due to the first (.*) being greedy: it eats up as much as possible while still allowing (\d+)(.*) to match the rest of the string.
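To reproduce the behaviour being explained, here is a minimal sketch. It assumes the question used Java's java.util.regex, since the second answer writes the pattern as a Java string literal with doubled backslashes. The class GreedyCapture and its capture method are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: prints the three groups captured by the greedy pattern.
class GreedyCapture {
    // Returns groups 1..3 of the first match of regex in input, or null if nothing matches.
    // Assumes the regex defines at least three capturing groups.
    static String[] capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Greedy first group: (.*) grabs as much as it can.
        String[] g = capture("(.*)(\\d+)(.*)", line);
        System.out.println("group 1: " + g[0]); // This order was placed for QT300
        System.out.println("group 2: " + g[1]); // 0
        System.out.println("group 3: " + g[2]); // ! OK?
    }
}
```

Only the final "0" ends up in the digit group, because the first group has already claimed "300".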

Basically, the match goes like this. At the beginning, the first .* will gobble up the whole string:

This order was placed for QT3000! OK?
                                     ^

However, since we can't find a match for \d+ here, we backtrack:

This order was placed for QT3000! OK?
                                    ^
This order was placed for QT3000! OK?
                                   ^
...

This order was placed for QT3000! OK?
                               ^

At this position, \d+ can be matched, so we proceed:

This order was placed for QT3000! OK?
                                ^

and .* will match the rest of the string.

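That explains the output you see: group 1 captures "This order was placed for QT300", group 2 captures only the final "0", and group 3 captures "! OK?".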
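The backtracking walk above can be imitated by hand. The sketch below is a simplified model, not the regex engine itself: it starts with everything consumed by the greedy (.*) and gives back one character at a time until \d+ can begin at the current position. It assumes the input has no line breaks, so the trailing (.*) always succeeds. The class GreedyBacktrack and method splitPoint are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper that imitates the greedy (.*) backing off one character at a time.
class GreedyBacktrack {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the length of the text the greedy (.*) ends up keeping,
    // or -1 if \d+ can start nowhere (then the whole pattern fails).
    static int splitPoint(String s) {
        for (int k = s.length(); k >= 0; k--) {
            // (.*) currently holds s[0..k); can \d+ start exactly at k?
            if (DIGITS.matcher(s).region(k, s.length()).lookingAt()) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "This order was placed for QT3000! OK?";
        int k = splitPoint(s);
        System.out.println(k);                 // 31
        System.out.println(s.substring(0, k)); // This order was placed for QT300
    }
}
```

Position 31 is the last "0", which matches the caret in the walkthrough where \d+ first succeeds.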


You can fix this problem by making the first (.*) lazy:

(.*?)(\d+)(.*)

The search for a match for (.*?) begins with the empty string; each time the rest of the pattern fails, the engine backtracks and lets (.*?) take one more character:

This order was placed for QT3000! OK?
^
This order was placed for QT3000! OK?
 ^
...

This order was placed for QT3000! OK?
                            ^

At this point, \d+ can match (greedily taking all of "3000") and .* can match the rest ("! OK?"), which completes the match, so the output will be as you expected.
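The lazy walk above can be imitated by hand as well: start with (.*?) consuming nothing and grow it one character at a time until \d+ can begin. This is a simplified model under the assumption that the input has no line breaks (so the trailing (.*) always takes the remainder); the class LazyExpansion and method split are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that imitates how the lazy (.*?) grows, one character at a time.
class LazyExpansion {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns {group 1, group 2, group 3} as (.*?)(\d+)(.*) would capture them,
    // or null if the input contains no digit.
    static String[] split(String s) {
        for (int k = 0; k <= s.length(); k++) {
            // (.*?) has consumed s[0..k); can \d+ start right here?
            Matcher m = DIGITS.matcher(s).region(k, s.length());
            if (m.lookingAt()) {
                // \d+ itself is still greedy, so it takes the whole run of digits.
                return new String[] { s.substring(0, k), m.group(), s.substring(m.end()) };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] g = split("This order was placed for QT3000! OK?");
        System.out.println("group 1: " + g[0]); // This order was placed for QT
        System.out.println("group 2: " + g[1]); // 3000
        System.out.println("group 3: " + g[2]); // ! OK?
    }
}
```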

The .* matches (and consumes) as many characters as it can before \\d+ gets a chance. By the time it reaches \\d+, only one digit is left, and one digit is enough for a match.

So, you need to make the .* lazy:

(.*?)(\\d+)(.*)
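A quick side-by-side check, assuming java.util.regex as the engine: the sketch compiles both the original greedy pattern and the lazy one and prints what the digit group captures. The class GreedyVsLazy and method digitsCaptured are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of the original greedy pattern and the lazy fix.
class GreedyVsLazy {
    // Returns the text captured by group 2, or null if the pattern does not match.
    static String digitsCaptured(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String[] patterns = { "(.*)(\\d+)(.*)", "(.*?)(\\d+)(.*)" };
        for (String regex : patterns) {
            System.out.println(regex + " -> " + digitsCaptured(regex, line));
        }
        // (.*)(\d+)(.*) -> 0
        // (.*?)(\d+)(.*) -> 3000
    }
}
```

The only difference between the two patterns is the ? after the first .*, which switches that quantifier from greedy to reluctant.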

Well, if you want to go into the details, .* first matches the whole string, then backtracks one character at a time so that the regex can also match (\\d+)(.*), which comes later. Once it has backtracked to this point:

This order was placed for QT300

the rest of the regex, (\\d+)(.*), is satisfied (\\d+ matches the final "0" and .* matches "! OK?"), so the matching ends.
