Regex - Capturing a Repeated Group

死守一世寂寞 2021-01-15 20:49

Alright, I've read the tutorials and scrambled my head too much to be able to see clearly now.

I'm trying to capture parameters and their type info from a function

3 Answers
  • 2021-01-15 21:01

    Generally, you'd need two steps to get all data.
    First, match/validate the whole function:

    function\((?<parameters>((\/\*[a-zA-Z]+\*\/)?[0-9a-zA-Z_$]+,?)*)\)
    

    Note that you now have a parameters group holding the whole parameter list. You can run part of the pattern over it again to pick out each parameter or, in this case, simply split on commas.
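    The two steps above can be sketched in Python's standard re module, purely as an illustration for readers outside .NET. The helper name split_parameters and the sample input are made up for this example, and Python spells named groups (?P<name>...) instead of (?<name>...).

```python
import re

# Step 1 pattern: validates the whole signature and captures the raw list.
# Python's re writes named groups as (?P<name>...), not (?<name>...).
FUNCTION_RE = re.compile(
    r"function\((?P<parameters>(?:(?:/\*[a-zA-Z]+\*/)?[0-9a-zA-Z_$]+,?)*)\)"
)


def split_parameters(source):
    """Hypothetical helper: raw parameter strings, or None if no match."""
    match = FUNCTION_RE.search(source)
    if match is None:
        return None
    raw = match.group("parameters")
    # Step 2: the list has already been validated, so a plain split is
    # enough; the filter drops the empty string produced by "function()".
    return [item for item in raw.split(",") if item]


print(split_parameters("function(/*int*/a,b,/*string*/c)"))
```

    Each returned item still carries its optional /*type*/ prefix, so a second, smaller pattern can be applied per item if the type is needed separately.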

    If you're using .NET, by any chance, you're in luck. .NET keeps a full record of all captures of each group, so you can use the collection:

    match.Groups["param"].Captures
    

    Some notes:

    • If you want to capture types as well as parameters, you definitely want empty matches, so the two capture lists line up one-to-one (you could instead pair them up by position afterwards, but a 1-to-1 capture is neater). In that case, you want the optional group inside your captured group: (?<type>(\/\*[a-zA-Z]+\*\/)?)
    • You don't have to escape slashes in .NET patterns - / has no special meaning there (C#/.NET doesn't have regex delimiters).

    Here's an example of using the captures. Again, the main point is maintaining the relation between type and param: you want to capture empty types, so you don't lose count.
    Pattern:

    function
    \(
    (?:
        (?:
            /\*(?<type>[a-zA-Z]+)\*/    # type within /* */
            |                           # or
            (?<type>)                   # capture an empty type.
        )
        (?<param>
            [0-9a-zA-Z_$]+
        )
        (?:,|(?=\s*\)))     # mandatory comma, unless before the last ')'
    )*
    \)
    

    Code:

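    // Needs: using System; using System.Text.RegularExpressions;
    // s is the text to search; pattern is the pattern shown above.
    Match match = Regex.Match(s, pattern, RegexOptions.IgnorePatternWhitespace);
    // Both groups capture once per parameter, so index i pairs them up.
    CaptureCollection types = match.Groups["type"].Captures;
    CaptureCollection parameters = match.Groups["param"].Captures;
    for (int i = 0; i < parameters.Count; i++)
    {
        string parameter = parameters[i].Value;
        string type = types[i].Value;
        if (String.IsNullOrEmpty(type))
            type = "NO TYPE";   // the empty (?<type>) alternative matched
        Console.WriteLine("Parameter: {0}, Type: {1}", parameter, type);
    }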
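    For engines without capture collections, the same type-to-parameter pairing can be approximated by matching one parameter at a time. This is a rough Python sketch, not a port of the C# code: typed_parameters is a hypothetical helper, and it relies on an optional type group returning None when absent, so every parameter still produces exactly one pair.

```python
import re

# Validates the whole signature and captures the raw parameter list.
FUNCTION_RE = re.compile(
    r"function\((?P<parameters>(?:(?:/\*[a-zA-Z]+\*/)?[0-9a-zA-Z_$]+,?)*)\)"
)
# One parameter with an optional /*type*/ prefix; a missing type is None.
PARAM_RE = re.compile(r"(?:/\*(?P<type>[a-zA-Z]+)\*/)?(?P<param>[0-9a-zA-Z_$]+)")


def typed_parameters(source):
    """Hypothetical helper: list of (parameter, type) pairs, in order."""
    match = FUNCTION_RE.search(source)
    if match is None:
        return []
    pairs = []
    for item in PARAM_RE.finditer(match.group("parameters")):
        # "or" substitutes the placeholder when the type group is None.
        pairs.append((item.group("param"), item.group("type") or "NO TYPE"))
    return pairs


for parameter, type_name in typed_parameters("function(/*int*/a,b,/*string*/c)"):
    print("Parameter: {0}, Type: {1}".format(parameter, type_name))
```

    Because the type and the parameter come from the same match object, they cannot drift out of step the way two independent lists could.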
    
  • 2021-01-15 21:02

    It's been a while since this question was active, but I think I finally found an answer.

    I think I was looking for the same situation as you, but for use with PHP, and there is an answer in another post I found that works really well, using the \G anchor and the \K escape from PCRE. See Alan Moore's answer here: PHP Regular Expression - Repeating Match of a Group

    My issue was trying to pull out all the cell values in a table, where each row contained a 6-digit number, twenty 1- or 2-digit numbers, and an unrelated 1- or 2-digit number. The solution was:

    <tr class="[^"]*">\s+<td>(\d{6})<\/td>|\G<\/td>[^<>]*+<td>\K\d{1,6}|<td>(\d{1,2})<\/td>
    

    Very nice solution if I do say so myself!
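    Python's standard re module supports neither \G nor \K, so the pattern above will not run there as written. A loosely similar effect can be had by matching the row header once and then calling Pattern.match(string, pos) in a loop, since that call only succeeds exactly at pos. The sketch below is a swapped-in technique rather than the PCRE solution; read_row, the two patterns, and the shortened sample row are assumptions made for illustration.

```python
import re

# Row header: the opening <tr> and the six-digit first cell.
HEAD_RE = re.compile(r'<tr class="[^"]*">\s+<td>(\d{6})</td>')
# One following cell holding a 1- or 2-digit number.
CELL_RE = re.compile(r'[^<>]*<td>(\d{1,2})</td>')


def read_row(html):
    """Hypothetical helper: (six_digit_id, [cell values]) or None."""
    head = HEAD_RE.search(html)
    if head is None:
        return None
    values = []
    position = head.end()
    while True:
        # Pattern.match(string, pos) must match starting exactly at pos,
        # which mimics "continue where the previous match ended".
        cell = CELL_RE.match(html, position)
        if cell is None:
            break
        values.append(cell.group(1))
        position = cell.end()
    return head.group(1), values


sample = '<tr class="odd">\n  <td>123456</td><td>7</td><td>42</td><td>3</td></tr>'
print(read_row(sample))
```

    The loop stops at the first position where no cell follows directly, so cells in a later row are never attributed to this one.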

  • 2021-01-15 21:21

    The page you referenced mentions using ?: for non-capturing groups, then surrounding the repeating capture in its own group. I am guessing they are suggesting something like this: function\(((?:(\/\*(?<type>[a-zA-Z]+)\*\/)?(?<param>[0-9a-zA-Z_$]+),?)*)\)

    I like to use http://gskinner.com/RegExr/ to test my expressions, but it won't show repeated captures. In non-.NET languages, you may have to loop through the results in whatever return structure you get back to see the other values.

    Sorry I couldn't test more thoroughly...
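    To illustrate why a loop is needed outside .NET, here is a small Python check of a pattern shaped like the one in this answer (rewritten with Python's (?P<name>...) syntax, an adaptation made for this example). In Python's re, a group inside a repetition keeps only the text from its last successful iteration.

```python
import re

# The outer group 1 wraps the whole repetition; type and param repeat inside.
PATTERN = re.compile(
    r"function\(((?:(?:/\*(?P<type>[a-zA-Z]+)\*/)?(?P<param>[0-9a-zA-Z_$]+),?)*)\)"
)

match = PATTERN.search("function(/*int*/a,b,/*string*/c)")

whole_list = match.group(1)         # everything between the parentheses
last_param = match.group("param")   # only the final iteration survives
last_type = match.group("type")

print(whole_list)
print(last_param, last_type)
```

    The outer group still holds the complete list, so the earlier parameters can be recovered by running a second match over that text.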
