UB: C#'s Regex.Match returns whole string instead of part when matching

小蘑菇 2021-01-19 07:21

Attention! This is NOT related to the question "Regex problem, matches the whole string instead of a part".


Hi all. I am trying to do:

Match y = Reg         


        
3 Answers
  • 2021-01-19 08:01

    There is one capture group in the pattern; that is group 1.

    There is always group 0, which is the entire match.

    Therefore there are a total of 2 groups.
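The group counting described in this answer can be checked with a small sketch. The input string, the pattern, and the `GroupCountDemo` / `CountGroups` names are made up for illustration, since the asker's own code is cut off above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCountDemo
{
    // Hypothetical helper: number of groups in a match, including the
    // implicit group 0 that always holds the entire match.
    public static int CountGroups(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Groups.Count;
    }

    public static void Main()
    {
        // Illustrative input and pattern, not the asker's own.
        Match m = Regex.Match("key=value", @"key=(\w+)");

        Console.WriteLine(m.Groups.Count);    // 2
        Console.WriteLine(m.Groups[0].Value); // key=value (entire match)
        Console.WriteLine(m.Groups[1].Value); // value     (the one capture group)
    }
}
```

Reading `m.Value` is the same as reading `m.Groups[0].Value`, which is why the "whole" match shows up when group 1 was the intended result.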

  • 2021-01-19 08:06

    I hit this problem when I first started using .NET regexes, too. The way to understand it is that the Groups member of Match is the nesting member: you have to traverse Groups to get down to the lower-level captures, and each Group in turn has a Captures member. The Match is like the top-level "Group" (it is also Groups[0]) in that it represents the successful match of your whole expression against the input. A single input string can contain multiple matches. The Captures member of the Match represents the match of your full expression.

    Whenever you have a single capture group, as you do, Groups[1] will always be the data you are interested in. Look at this page. The source code in examples 2 and 3 is hardcoded to print out Groups[1].

    Remember that a single capture group can capture multiple substrings in a single match operation, for example when it sits under a quantifier. In that case Match.Groups[1].Captures.Count is greater than 1, and Groups[1].Value holds the last capture. If the input contains several stretches of text that match, a single Match call still returns only the first one: Match.Captures.Count stays 1, each entry being the full string matched by your full expression, and the remaining matches come from Match.NextMatch() or Regex.Matches.
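A rough sketch of how captures and matches nest, assuming a made-up pattern `(\w)+` and made-up inputs; `CaptureDemo` and its helper names are hypothetical. As far as I know, `Match.Captures` holds a single entry for a successful match, and additional matches in the same input are enumerated with `Regex.Matches`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Group 1 sits under '+', so it captures once per repetition.
    public static int RepeatedCaptureCount(string input)
    {
        Match m = Regex.Match(input, @"(\w)+");
        return m.Groups[1].Captures.Count;
    }

    // The match itself (group 0) is captured once per successful match.
    public static int TopLevelCaptureCount(string input)
    {
        return Regex.Match(input, @"(\w)+").Captures.Count;
    }

    // Several matches in one input are returned by Regex.Matches.
    public static int MatchCount(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        Match m = Regex.Match("abc", @"(\w)+");

        Console.WriteLine(m.Value);                    // abc
        Console.WriteLine(m.Groups[1].Value);          // c (last capture of group 1)
        Console.WriteLine(m.Groups[1].Captures.Count); // 3 (a, b, c)
        Console.WriteLine(m.Captures.Count);           // 1

        Console.WriteLine(MatchCount("a1 b2 c3", @"\w\d")); // 3
    }
}
```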

  • 2021-01-19 08:13

    My test regex was different from all the others in the project (that's what happens when a Perl guy comes to C#), as it had no lookaheads/lookbehinds. So this discovery took some time.

    Now, here is why we should call this Regex behaviour undocumented, not undefined.

    Let's run some matches against "1.234567890".

    • PCRE-like syntax: (.)\.2345678
    • lookahead syntax: (.)(?=\.\d)

    When you're doing a normal match, the result is copied from the whole matched part of the line, no matter where you've put the parentheses. When lookaheads are present, the text they check is not part of the match, so only what does not belong to them is copied.

    So, the matches will return:

    • PCRE: 1.2345678 (at 23:00, this looks like the original string, and I start yelling here at SO)
    • lookahead: 1
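The two results above can be reproduced with a short sketch using the same input and patterns; `LookaheadDemo` and its helper names are hypothetical. It also prints `Groups[1]`, which holds the parenthesized part in both cases.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    public const string Input = "1.234567890";

    // Entire matched text (same as Groups[0].Value).
    public static string WholeMatch(string pattern)
    {
        return Regex.Match(Input, pattern).Value;
    }

    // Text captured by the first pair of parentheses.
    public static string FirstGroup(string pattern)
    {
        return Regex.Match(Input, pattern).Groups[1].Value;
    }

    public static void Main()
    {
        string plain = @"(.)\.2345678";
        string lookahead = @"(.)(?=\.\d)";

        Console.WriteLine(WholeMatch(plain));     // 1.2345678
        Console.WriteLine(FirstGroup(plain));     // 1

        // The lookahead is zero-width, so ".2" is checked but not consumed.
        Console.WriteLine(WholeMatch(lookahead)); // 1
        Console.WriteLine(FirstGroup(lookahead)); // 1
    }
}
```

With either pattern, `Groups[1].Value` gives the single character, so the lookahead is not needed just to extract the parenthesized part.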