How do you capture a group with regex?

臣服心动 2021-02-01 16:47

I'm trying to extract a string from another string using a regex. I'm using the POSIX regex functions (regcomp, regexec, ...), but I can't manage to capture a group ...

2 Answers
  • 2021-02-01 17:20

    Here's a code example that demonstrates capturing multiple groups.

    You can see that group '0' is the whole match, and subsequent groups are the parts within parentheses.

    Note that this only captures the first match in the source string. To get the groups of later matches, call regexec again on the rest of the string, starting just after the end of the previous match (groupArray[0].rm_eo).

    #include <stdio.h>
    #include <regex.h>
    
    int main (void)
    {
      const char * source = "___ abc123def ___ ghi456 ___";
      const char * regexString = "[a-z]*([0-9]+)([a-z]*)";
      size_t maxGroups = 3;  // group 0 (whole match) + 2 parenthesized groups
    
      regex_t regexCompiled;
      regmatch_t groupArray[maxGroups];
    
      if (regcomp(&regexCompiled, regexString, REG_EXTENDED))
        {
          fprintf(stderr, "Could not compile regular expression.\n");
          return 1;
        }
    
      if (regexec(&regexCompiled, source, maxGroups, groupArray, 0) == 0)
        {
          for (size_t g = 0; g < maxGroups; g++)
            {
              if (groupArray[g].rm_so == -1)
                continue;  // This group did not take part in the match
    
              // rm_so/rm_eo are regoff_t (signed), so cast them for printf.
              // %.*s prints only the matched bytes, so no copy is needed.
              int start = (int) groupArray[g].rm_so;
              int end = (int) groupArray[g].rm_eo;
              printf("Group %zu: [%2d-%2d]: %.*s\n",
                     g, start, end, end - start, source + start);
            }
        }
    
      regfree(&regexCompiled);
    
      return 0;
    }
    

    Output:

    Group 0: [ 4-13]: abc123def
    Group 1: [ 7-10]: 123
    Group 2: [10-13]: def
    
  • 2021-02-01 17:22

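    As you have noticed, element 0 of the pmatch array of regmatch_t structs holds the boundaries of the whole matched string. In your example you are interested in the regmatch_t at index 1, not index 0: that element describes the string matched by the first parenthesized subexpression. For it to be filled in, pass an nmatch of at least 2 to regexec, and do not compile the pattern with REG_NOSUB (with that flag, regexec ignores nmatch and pmatch).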

    If you need more help, try editing your question to include an actual small code sample so that people can more easily spot the problem.
