Regexp in C - match group

[愿得一人] 2021-01-15 04:25

I've been struggling with regular expressions in C (just /usr/include/regex.h).


I have (let's say) hundreds of regexps and one of them can match

3 answers
  • 2021-01-15 04:39

    I assume your regex_match is some combination of regcomp and regexec. To use the ( ) and | syntax in the pattern below, call regcomp with the REG_EXTENDED flag; to get group offsets back from regexec, do not pass the REG_NOSUB flag. Both flags go in the third argument.

    regex_t compiled;
    regcomp(&compiled, "(match1)|(match2)|(match3)", REG_EXTENDED);
    

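    Then allocate space for the groups. The number of parenthesized groups is stored in compiled.re_nsub; allocate one entry more than that, because element 0 of the array describes the whole match. Pass that count and the array to regexec: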

    size_t ngroups = compiled.re_nsub + 1;
    regmatch_t *groups = malloc(ngroups * sizeof(regmatch_t));
    regexec(&compiled, str, ngroups, groups, 0);
    

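    An unused group has -1 in both its rm_so and rm_eo fields. This loop stops at the first unused entry:

    size_t nmatched;
    for (nmatched = 0; nmatched < ngroups; nmatched++)
        if (groups[nmatched].rm_so == -1)  /* rm_so is a signed regoff_t */
            break;
    

    nmatched is then the index of the first unused entry, counting the whole match at index 0. With an alternation like the pattern above, only one of the groups takes part in a match, so to learn which alternative matched, look for the index i >= 1 whose rm_so is not -1. Add your own error checking, in particular on the return values of regcomp, malloc and regexec.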
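
    Putting the steps above together, here is a minimal sketch that reports which alternative matched. The helper name first_matching_group and its return convention are assumptions made for illustration; they are not part of regex.h.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: returns the 1-based index of the first parenthesized
 * group that took part in the match, 0 if the pattern matched but no group
 * did, and -1 on no match or on any error. */
int first_matching_group(const char *pattern, const char *str)
{
    assert(pattern != NULL && str != NULL);

    regex_t compiled;
    if (regcomp(&compiled, pattern, REG_EXTENDED) != 0)
        return -1;                            /* pattern failed to compile */

    size_t ngroups = compiled.re_nsub + 1;    /* +1 for the whole match */
    regmatch_t *groups = malloc(ngroups * sizeof *groups);
    int result = -1;

    if (groups != NULL && regexec(&compiled, str, ngroups, groups, 0) == 0) {
        result = 0;
        for (size_t i = 1; i < ngroups; i++) {
            if (groups[i].rm_so != -1) {      /* this group participated */
                result = (int)i;
                break;
            }
        }
    }

    free(groups);
    regfree(&compiled);
    return result;
}
```

    For example, first_matching_group("(match1)|(match2)|(match3)", "xx match2 yy") returns 2. Compiling on every call keeps the sketch short; with hundreds of alternatives you would probably compile once and reuse the regex_t.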

  • 2021-01-15 04:43

    "I have (let's say) hundreds of regexps ..."

    It looks like you are trying to compare the quad parts of IP addresses. In general, when using regular expressions, it's usually a red flag to run that many regexes against a single target and stop after the first match.

    Example: which group will correctly match first?
    target ~ 'American', pattern ~ /(Ame)|(Ameri)|(American)/
    And this does not even include quantifiers in the subgroups.

    If the regexes are all built from one constant form, for instance a fixed data layout, it might be better to use C's string functions to split the data out of that form into an array and then compare the array items with the target. Plain C string handling is usually much faster for this than regexes.
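
    As a rough illustration of the split-then-compare idea, here is a sketch that assumes the data is a dotted quad such as "192.168.1.10". The helper names split_quad and quad_matches are hypothetical, and the sketch only covers this one fixed form.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: split "a.b.c.d" into four ints in the range 0..255.
 * Returns 1 on success and 0 on malformed input. */
int split_quad(const char *s, int out[4])
{
    assert(s != NULL && out != NULL);

    for (int i = 0; i < 4; i++) {
        char *end;
        if (*s < '0' || *s > '9')
            return 0;                 /* strtol alone would accept signs and spaces */
        long v = strtol(s, &end, 10);
        if (v > 255)
            return 0;
        /* the first three parts end in '.', the last one ends the string */
        if (*end != (i < 3 ? '.' : '\0'))
            return 0;
        out[i] = (int)v;
        s = (i < 3) ? end + 1 : end;
    }
    return 1;
}

/* Hypothetical helper: 1 if s is a valid quad whose parts all equal wanted. */
int quad_matches(const char *s, const int wanted[4])
{
    int parts[4];
    if (!split_quad(s, parts))
        return 0;
    for (int i = 0; i < 4; i++)
        if (parts[i] != wanted[i])
            return 0;
    return 1;
}
```

    Once the parts are plain integers, each comparison is a handful of integer tests, and range checks or wildcards per part are easy to add without writing another pattern.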

  • 2021-01-15 04:46

    You could take an array of strings that contain your regexps and test each one of them in turn.

    // count is the number of regexps provided
    // returns the index of the first regexp that matches needle, or -1
    int give_me_number_of_regex_group(const char *needle, const char **regexps, int count) {
      for (int i = 0; i < count; ++i) {
        if (regex_match(needle, regexps[i])) {  // regex_match is the question's own helper
          return i;
        }
      }
      return -1; // didn't match any
    }
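
    Since the question's regex_match is not shown, here is a self-contained sketch of the same loop with a stand-in built on regcomp and regexec. The stand-in's behavior (extended syntax, match anywhere in the string, invalid pattern counts as no match) and the name first_matching_regex are assumptions.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Stand-in for the question's regex_match, whose real definition is not
 * shown: returns 1 if pattern matches anywhere in needle, else 0. */
int regex_match(const char *needle, const char *pattern)
{
    regex_t compiled;
    /* REG_NOSUB: only a yes/no answer is needed here, no group offsets */
    if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                     /* treat an invalid pattern as no match */
    int matched = (regexec(&compiled, needle, 0, NULL, 0) == 0);
    regfree(&compiled);
    return matched;
}

/* Returns the index of the first regexp that matches needle, or -1. */
int first_matching_regex(const char *needle, const char **regexps, int count)
{
    assert(needle != NULL && regexps != NULL);

    for (int i = 0; i < count; ++i) {
        if (regex_match(needle, regexps[i]))
            return i;
    }
    return -1;                        /* didn't match any */
}
```

    This recompiles every pattern on each call, which is simple but wasteful; if the same set of regexps is tested against many inputs, compiling them once up front is likely the better trade.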
    

    Or am I overlooking something?
