Regex including what is supposed to be non-capturing group in result

耶瑟儿~ 2020-12-12 04:35

I have the following simple test where I'm trying to write a Regex pattern that extracts the executable name without the ".exe" suffix.
 
It appears my n

3 Answers
  • 2020-12-12 04:58

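    The regex still matches the text inside a non-capturing group; it just doesn't store that text as a separate group. If you want only the part outside the non-capturing group, wrap that part in a capturing group and read the group instead of the whole match.

    You can access groups through the Groups property of the Match object:

    var match = Regex.Match(testEcl, @"([^\\]+)(?:\.exe)", RegexOptions.IgnoreCase);
    var asmName = match.Groups[1].Value;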
    


  • 2020-12-12 05:11

    You're using a non-capturing group. The emphasis is on the word group here; the group does not capture the .exe, but the overall match still includes it.

    You probably want a positive lookahead, which only asserts that the text following the match meets a criterion; the text that satisfies that criterion is not included in the match.

    In other words, you want (?=, not (?:, at the start of your group.

    The (?: form only makes a difference if you are enumerating the Groups property of the Match object; in your case, you're just using the Value property, so there's no distinction between a normal group (\.exe) and a non-capturing group (?:\.exe).

    To see the distinction, consider this test program:

    static void Main(string[] args)
    {
        var positiveInput = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
        Test(positiveInput, @"[^\\]+(\.exe)");
        Test(positiveInput, @"[^\\]+(?:\.exe)");
        Test(positiveInput, @"[^\\]+(?=\.exe)");
    
        var negativeInput = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.dll\" /?";
        Test(negativeInput, @"[^\\]+(?=\.exe)");
    }
    
    static void Test(String input, String pattern)
    {
        Console.WriteLine($"Input: {input}");
        Console.WriteLine($"Regex pattern: {pattern}");
    
        var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
    
        if (match.Success)
        {
            Console.WriteLine("Matched: " + match.Value);
            for (int i = 0; i < match.Groups.Count; i++)
            {
                Console.WriteLine($"Groups[{i}]: {match.Groups[i]}");
            }
        }
        else
        {
            Console.WriteLine("No match.");
        }
        Console.WriteLine("---");
    }
    

    The output of this is:

    Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
    Regex pattern: [^\\]+(\.exe)
    Matched: MyApp.exe
    Groups[0]: MyApp.exe
    Groups[1]: .exe
    ---
    Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
    Regex pattern: [^\\]+(?:\.exe)
    Matched: MyApp.exe
    Groups[0]: MyApp.exe
    ---
    Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
    Regex pattern: [^\\]+(?=\.exe)
    Matched: MyApp
    Groups[0]: MyApp
    ---
    Input: "D:\src\repos\myprj\bin\Debug\MyApp.dll" /?
    Regex pattern: [^\\]+(?=\.exe)
    No match.
    ---
    

    The first regex (@"[^\\]+(\.exe)") has \.exe as just a normal group. When we enumerate the Groups property, we see that .exe is indeed a group captured in our input. (Note that the entire regex is itself a group, hence Groups[0] is equal to Value).

    The second regex (@"[^\\]+(?:\.exe)") is the one provided in your question. The only difference compared to the previous scenario is that the Groups property doesn't contain .exe as one of its entries.

    The third regex (@"[^\\]+(?=\.exe)") is the one I'm suggesting you use. Now the .exe part of the input isn't included in the match at all, but the regex still won't match unless the matched text is immediately followed by .exe, as the fourth scenario illustrates.

  • 2020-12-12 05:16

    A (?:...) is a non-capturing group that matches and still consumes the text. It means the part of text this group matches is still added to the overall match value.

    In general, if you want to match something but not consume it, you need to use lookarounds. So, if you need to match something that is followed by a specific string, use a positive lookahead, the (?=...) construct:

    some_pattern(?=specific string) // if specific string comes immediately after pattern
    some_pattern(?=.*specific string) // if specific string comes anywhere after pattern
    

    If you need to match but "exclude from match" some specific text before, use a positive lookbehind:

    (?<=specific string)some_pattern // if specific string comes immediately before pattern
    (?<=specific string.*?)some_pattern // if specific string comes anywhere before pattern
    

    Note that quantified patterns such as .*? or .* (that is, patterns using *, +, ?, {2,} or even {1,3}) are not supported inside lookbehinds by every regex engine. Fortunately, the .NET regex engine used by C# supports them. They are also supported by the Python PyPI regex module, Vim, JGSoft software, and ECMAScript 2018 compliant JavaScript environments.
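
    As a minimal sketch of the lookaround forms above, the following C# applies a lookahead and a variable-length lookbehind to the sample path from the question. The class and method names are hypothetical, and requiring a preceding \bin\ directory is an assumption made only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundSketch
{
    // Sample input taken from the question.
    const string Input = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";

    // Lookahead: ".exe" must follow, but it is not part of the match value.
    public static string NameBeforeExe(string s) =>
        Regex.Match(s, @"[^\\]+(?=\.exe)", RegexOptions.IgnoreCase).Value;

    // Variable-length lookbehind: "\bin\" must occur somewhere earlier
    // in the input, but it is not part of the match value either.
    public static string NameAfterBin(string s) =>
        Regex.Match(s, @"(?<=\\bin\\.*?)[^\\]+(?=\.exe)", RegexOptions.IgnoreCase).Value;

    public static void Main()
    {
        Console.WriteLine(NameBeforeExe(Input)); // MyApp
        Console.WriteLine(NameAfterBin(Input));  // MyApp
    }
}
```

    When nothing matches, Match.Value is an empty string, so both helpers return "" for input that lacks the required context.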

    In this case, you may capture what you need to get and just match the context without capturing:

    var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
    var asmName = string.Empty; 
    var m = Regex.Match(testEcl, @"([^\\]+)\.exe", RegexOptions.IgnoreCase);
    if (m.Success)
    {
        asmName = m.Groups[1].Value;
    }
    Console.WriteLine(asmName);
    


    Details

    • ([^\\]+) - Capturing group 1: one or more chars other than \
    • \. - a literal dot
    • exe - a literal exe substring.

    Since we are only interested in capturing group #1 contents, we grab m.Groups[1].Value, and not the whole m.Value (that contains .exe).
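
    The same capture-and-ignore-the-context approach can also be sketched with a named group, which avoids depending on the group's numeric index. The group name "name" and the helper ExeBaseName are hypothetical choices for this example, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupSketch
{
    // Returns the file name before ".exe", or an empty string when nothing matches.
    public static string ExeBaseName(string commandLine)
    {
        var m = Regex.Match(commandLine, @"(?<name>[^\\]+)\.exe", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["name"].Value : string.Empty;
    }

    public static void Main()
    {
        // The whole match would be "MyApp.exe"; the named group holds only "MyApp".
        Console.WriteLine(ExeBaseName("\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?"));
    }
}
```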
