Regex - Capturing a Repeated Group

Submitted by 旧巷老猫 on 2020-01-21 10:10:31

Question


Alright, I've read the tutorials and scrambled my head too much to be able to see clearly now.

I'm trying to capture parameters and their type info from a function signature. So given a signature like this:

function(/*string*/a,b,c)

I want to get the parts like this:

type: string
param:a
param:b
param:c

This is OK too:

type: string
param:a
type: null (or whitespace)
param:b
type: null (or whitespace)
param:c

So I came up with this regex, which makes the common mistake of repeating a capturing group (I have explicit capture turned on):

function\(((\/\*(?<type>[a-zA-Z]+)\*\/)?(?<param>[0-9a-zA-Z_$]+),?)*\)

Problem is, I can't correct the mistake. :(. Please help!


Answer 1:


Generally, you need two steps to get all the data.
First, match/validate the whole function:

function\((?<parameters>((\/\*[a-zA-Z]+\*\/)?[0-9a-zA-Z_$]+,?)*)\)

Note that you now have a parameters group holding the whole parameter list. As a second step, you can run a per-parameter pattern over that group to get each parameter, or, in this simple case, just split it on ,.
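For an engine that keeps only the last capture of a repeated group, the two steps might look like the sketch below. It is written in Python purely for illustration (the question does not say which non-.NET engine is in play), and the names SIGNATURE, PARAMETER and parse_signature are made up for this example. The patterns are the ones above, rewritten with Python's (?P<name>...) syntax and without the escaped slashes.

```python
import re

# Step 1: validate the whole signature and grab the raw parameter list.
SIGNATURE = re.compile(
    r"function\((?P<parameters>(?:(?:/\*[a-zA-Z]+\*/)?[0-9a-zA-Z_$]+,?)*)\)"
)

# Step 2: one parameter per match, with its optional /*type*/ comment.
PARAMETER = re.compile(r"(?:/\*(?P<type>[a-zA-Z]+)\*/)?(?P<param>[0-9a-zA-Z_$]+)")


def parse_signature(text):
    """Return a list of (type, param) pairs, or None if text is not a signature."""
    whole = SIGNATURE.fullmatch(text)
    if whole is None:
        return None
    # Re-scan only the captured parameter list, never the raw input.
    return [
        (m.group("type"), m.group("param"))
        for m in PARAMETER.finditer(whole.group("parameters"))
    ]


print(parse_signature("function(/*string*/a,b,c)"))
# [('string', 'a'), (None, 'b'), (None, 'c')]
```

Because the second pattern only ever runs on text the first one has already validated, it can stay loose without accepting malformed signatures.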

If you're using .NET, by any chance, you're in luck. .NET keeps a full record of all captures of each group, so you can use the collection:

match.Groups["param"].Captures

Some notes:

  • If more than one parameter can carry a type, you definitely want an empty capture for each untyped parameter, so that the type and param captures pair up one-to-one (you could instead pair them up by sorting on capture position, but a 1-to-1 capture is neater). In that case, you want the optional part inside your captured group: (?<type>(\/\*[a-zA-Z]+\*\/)?)
  • You don't have to escape slashes in .NET patterns: / has no special meaning there (C#/.NET doesn't have regex delimiters).

Here's an example of using the captures. Again, the main point is maintaining the relation between type and param: you want to capture empty types, so you don't lose count.
Pattern:

function
\(
(?:
    (?:
        /\*(?<type>[a-zA-Z]+)\*/    # type within /* */
        |                           # or
        (?<type>)                   # capture an empty type.
    )
    (?<param>
        [0-9a-zA-Z_$]+
    )
    (?:,|(?=\s*\)))     # mandatory comma, unless before the last ')'
)*
\)

Code:

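// Requires: using System; using System.Text.RegularExpressions;
// s is the signature to parse; pattern is the pattern above.
Match match = Regex.Match(s, pattern, RegexOptions.IgnorePatternWhitespace);
if (match.Success)
{
    // Both groups record exactly one capture per parameter, so the indexes line up.
    CaptureCollection types = match.Groups["type"].Captures;
    CaptureCollection parameters = match.Groups["param"].Captures;
    for (int i = 0; i < parameters.Count; i++)
    {
        string parameter = parameters[i].Value;
        string type = types[i].Value;
        if (String.IsNullOrEmpty(type))
            type = "NO TYPE";
        Console.WriteLine("Parameter: {0}, Type: {1}", parameter, type);
    }
}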



Answer 2:


The page you referenced mentions using ?: for a non-capturing group and then wrapping the repeated group in a capturing group of its own. I am guessing they are suggesting something like this:

function\(((?:(\/\*(?<type>[a-zA-Z]+)\*\/)?(?<param>[0-9a-zA-Z_$]+),?)*)\)

I like to use http://gskinner.com/RegExr/ to test my expressions, but it won't show repeated captures. Most non-.NET engines report only the last capture of a repeated group, so you may have to loop through the results in whatever return structure you get back to see the other values.
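To see what "won't show repeated captures" means in practice, here is a small sketch using Python's built-in re module as a stand-in for a non-.NET engine (my choice for illustration, not something the question specifies). The pattern is the one suggested above, with Python's (?P<name>...) syntax and unescaped slashes; the variable names are arbitrary.

```python
import re

REPEATED = re.compile(
    r"function\(((?:(?:/\*(?P<type>[a-zA-Z]+)\*/)?(?P<param>[0-9a-zA-Z_$]+),?)*)\)"
)

m = REPEATED.fullmatch("function(/*string*/a,b,c)")

# The param group is overwritten on every pass through the repetition,
# so only the value from the final pass is still there afterwards.
print(m.group("param"))  # c

# The outer capturing group still holds the complete parameter list,
# which is what you would loop over or re-match to get every value.
print(m.group(1))  # /*string*/a,b,c
```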

Sorry I couldn't test more thoroughly...




Answer 3:


It's been a while since this question was active, but I think I finally found an answer.

I think I was in the same situation as you, but with PHP, and an answer I found in another post works really well. It uses two PCRE features: \G, which anchors a match at the point where the previous match ended, and \K, which drops everything matched so far from the reported match. See Alan Moore's answer here: PHP Regular Expression - Repeating Match of a Group

My issue was trying to pull out all the cell values in a table, where each row contained a 6-digit number, twenty 1- or 2-digit numbers, and an unrelated 1- or 2-digit number. The solution was:

<tr class="[^"]*">\s+<td>(\d{6})<\/td>|\G<\/td>[^<>]*+<td>\K\d{1,6}|<td>(\d{1,2})<\/td>

Very nice solution if I do say so myself!
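For engines without \G, the "each match must start where the previous one ended" idea can be approximated by hand. The sketch below does that in Python, whose built-in re module supports neither \G nor \K, by passing a start position to Pattern.match. It is applied to the function signature from the question rather than to my HTML table, and HEAD, ITEM and scan_parameters are names invented for this example.

```python
import re

HEAD = re.compile(r"function\(")
# One parameter, its optional /*type*/ comment, and the separator after it
# (a comma, or nothing if the closing parenthesis comes next).
ITEM = re.compile(
    r"(?:/\*(?P<type>[a-zA-Z]+)\*/)?(?P<param>[0-9a-zA-Z_$]+)(?:,|(?=\)))"
)


def scan_parameters(text):
    """Yield (type, param) pairs; each match must begin where the last one ended."""
    head = HEAD.match(text)
    if head is None:
        return
    pos = head.end()
    while True:
        # Pattern.match(text, pos) only succeeds exactly at pos, which gives
        # the same contiguity guarantee that \G provides in PCRE.
        item = ITEM.match(text, pos)
        if item is None:
            break
        yield item.group("type"), item.group("param")
        pos = item.end()


print(list(scan_parameters("function(/*string*/a,b,c)")))
# [('string', 'a'), (None, 'b'), (None, 'c')]
```

Note that this sketch stops at the first position where no parameter matches and does not check what follows, so it does not validate the closing parenthesis the way a full-signature pattern would.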



Source: https://stackoverflow.com/questions/5982451/regex-capturing-a-repeated-group
