Generate All Possible Matches of a Regular Expression [closed]

Submitted by 北慕城南 on 2019-11-28 13:06:45

Question


How can I derive all possible matches of a regular expression?

For example:

((a,b,c)o(m,v)p,b)

The strings generated from the above expression would be:

aomp

bomp

comp

aovp

bovp

covp

b


Answer 1:


Your steps are pretty straightforward, though implementing them may take a bit of work:

  1. Create a recursive function that extracts the string between the first set of parentheses it comes to: https://stackoverflow.com/a/28863720/2642059
  2. In the function, split this string on ',' into a vector<string> and return it: https://stackoverflow.com/a/28880605/2642059
  3. Before returning, test whether it is necessary to recurse because of nested parentheses; one string must be added to the result for each possible combination returned from the recursive calls.
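Steps 1 and 2 can be sketched roughly as follows. This is a minimal C++ sketch, not code from the linked answers; `findClosing` and `splitTopLevel` are hypothetical names. One assumption worth flagging: instead of a plain split on ',', it only splits on commas that are outside any nested parentheses, which keeps a group like "bl(ah,eck,le)" intact so step 3 can recurse into it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Step 1 (sketch): given the index of a '(' in s, return the index of the
// matching ')', or std::string::npos if the group is never closed.
std::size_t findClosing(const std::string& s, std::size_t open) {
    assert(open < s.size() && s[open] == '(');
    int depth = 0;
    for (std::size_t i = open; i < s.size(); ++i) {
        if (s[i] == '(') {
            ++depth;
        } else if (s[i] == ')' && --depth == 0) {
            return i;
        }
    }
    return std::string::npos;
}

// Step 2 (sketch): split s on commas at nesting depth 0 only.
std::vector<std::string> splitTopLevel(const std::string& s) {
    std::vector<std::string> parts;
    std::string current;
    int depth = 0;
    for (char c : s) {
        if (c == ',' && depth == 0) {
            parts.push_back(current);  // finish the current piece
            current.clear();
            continue;
        }
        if (c == '(') ++depth;
        else if (c == ')') --depth;
        current += c;
    }
    parts.push_back(current);  // the last piece has no trailing comma
    return parts;
}
```

For example, on "(bl(ah,eck,le),yap)" `findClosing(input, 0)` returns the index of the final ')', and splitting the contents with `splitTopLevel("bl(ah,eck,le),yap")` gives {"bl(ah,eck,le)", "yap"}.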

EDIT:

Say my input string was "(bl(ah,eck,le),yap)"

  • The first function would extract the string: "bl(ah,eck,le),yap"
  • Before returning, it would search for nested parentheses, and this would cause it to recurse:
    • The second function would extract the string: "ah,eck,le"
    • Before returning, it would search for nested parentheses and find none
    • It would return a vector<string>: ["ah","eck","le"]
  • The first function would now contain: "bl["ah","eck","le"],yap"
  • It would not find any more parentheses to extract, so it would go on to expanding all internal combinations: "["blah","bleck","blle"],yap"
  • It could now split the string and return: ["blah","bleck","blle","yap"]
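The walkthrough above can also be written as a small recursive-descent parser. This is a different technique from the pointer-based code further down, offered only as a minimal C++ sketch under the assumption that the notation is exactly the one in the question: parentheses group comma-separated alternatives, everything else is literal text, and there are no escapes. The names `expand`, `parseAlternatives`, `parseSequence`, and `product` are hypothetical. Unlike a version that walks raw pointers, it throws `std::invalid_argument` on unbalanced parentheses.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Strings = std::vector<std::string>;

Strings parseAlternatives(const std::string& s, std::size_t& pos);

// Every prefix followed by every suffix; the prefix varies fastest.
Strings product(const Strings& prefixes, const Strings& suffixes) {
    Strings out;
    out.reserve(prefixes.size() * suffixes.size());
    for (const auto& suffix : suffixes)
        for (const auto& prefix : prefixes)
            out.push_back(prefix + suffix);
    return out;
}

// A sequence is literal characters and parenthesized groups, read up to the
// next top-level ',' or ')' (or the end of the input).
Strings parseSequence(const std::string& s, std::size_t& pos) {
    Strings result{""};
    while (pos < s.size() && s[pos] != ',' && s[pos] != ')') {
        if (s[pos] == '(') {
            ++pos;                                    // consume '('
            Strings group = parseAlternatives(s, pos);
            if (pos >= s.size() || s[pos] != ')')
                throw std::invalid_argument("missing ')'");
            ++pos;                                    // consume ')'
            result = product(result, group);
        } else {
            for (auto& r : result) r += s[pos];       // literal character
            ++pos;
        }
    }
    return result;
}

// Alternatives are sequences separated by commas at the current depth.
Strings parseAlternatives(const std::string& s, std::size_t& pos) {
    Strings all = parseSequence(s, pos);
    while (pos < s.size() && s[pos] == ',') {
        ++pos;                                        // consume ','
        Strings more = parseSequence(s, pos);
        all.insert(all.end(), more.begin(), more.end());
    }
    return all;
}

// Hypothetical entry point: expand a whole pattern.
Strings expand(const std::string& pattern) {
    std::size_t pos = 0;
    Strings result = parseAlternatives(pattern, pos);
    if (pos != pattern.size())
        throw std::invalid_argument("unexpected ')'");
    assert(!result.empty());  // every sequence yields at least one string
    return result;
}
```

Called as `expand("((a,b,c)o(m,v)p,b)")`, it returns the seven strings listed in the question, in the same order, and `expand("(bl(ah,eck,le),yap)")` returns {"blah", "bleck", "blle", "yap"}.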

The return value of your first function is your result.

EDIT:

Glad you solved it. I wrote up a two-state machine to solve it as well, so I figured I would post it here for your comparison:

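#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
using namespace std;

// split() and interpolate() call each other, so one of them must be forward declared.
vector<string> interpolate(const char* start, const char* finish);

// Given start pointing at '(', returns a pointer to its matching ')' (or finish if there is none).
const char* extractParenthesis(const char* start, const char* finish){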
    int count = 0;

    return find_if(start, finish, [&](char i){
        if (i == '('){
            count++;
        }
        else if (i == ')'){
            count--;
        }
        return count <= 0; });
}

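// Splits [start, finish) on commas that are outside parentheses and expands each piece.
vector<string> split(const char* start, const char* finish){
    const char delimiters[] = { ',', '(' }; // no trailing '\0' in the delimiter set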
    const char* it;
    vector<string> result;

    do{
        for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
            it != finish && *it == '(';
            it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
        auto&& temp = interpolate(start, it);
        result.insert(result.end(), temp.begin(), temp.end());
        start = ++it;
    } while (it <= finish);
    return result;
}

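// Expands the literal text and parenthesized groups in [start, finish) into every combination.
vector<string> interpolate(const char* start, const char* finish){
    vector<string> result(1, string(start, find(start, finish, '('))); // text before the first '('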

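    // Each pass expands one group; the loop increment then appends the literal text
    // between that group's ')' and the next '(' to every partial result.
    for (auto it = start + result[0].size();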
        it != finish;
        it = find(++start, finish, '('),
        for_each(result.begin(), result.end(), [&](string& i){ i += string{ start, it }; })){
        start = extractParenthesis(it, finish);

        auto temp = split(next(it), start);
        const auto size = result.size();

        result.resize(size * temp.size());

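        // Fill from the back so slots [0, size) still hold the old prefixes when read:
        // i % size picks the prefix, i / size picks the alternative from temp.
        for (int i = static_cast<int>(result.size()) - 1; i >= 0; --i){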
            result[i] = result[i % size] + temp[i / size];
        }
    }
    return result;
}
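The least obvious part of interpolate is the resize-and-fill loop that forms every combination in place. The sketch below isolates just that step so it can be tried on its own; `appendAll` is a hypothetical name, and it uses an unsigned countdown instead of an int loop, otherwise following the same indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper isolating the loop from interpolate(): replace every
// string in `result` by one copy per suffix, in place. Walking i downward
// means slots [0, size) still hold the original prefixes when they are read.
void appendAll(std::vector<std::string>& result,
               const std::vector<std::string>& suffixes) {
    assert(!suffixes.empty());
    const std::size_t size = result.size();
    result.resize(size * suffixes.size());
    for (std::size_t i = result.size(); i-- > 0;) {
        // i % size selects the prefix, i / size selects the suffix.
        result[i] = result[i % size] + suffixes[i / size];
    }
}
```

For example, starting from {"a", "b", "c"} and appending {"x", "y"} yields {"ax", "bx", "cx", "ay", "by", "cy"}. Because the prefix varies fastest, the code above lists aomp, bomp, comp before aovp, matching the order in the question.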

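Because split and interpolate call each other, interpolate has to be declared before split uses it; standard C++ requires this forward declaration regardless of compiler. This will also crash fantastically if the input string is malformed (for example, if its parentheses are unbalanced), and it can't handle escaped control characters.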
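One cheap way to avoid the crash on the most obvious kind of malformed input is to reject unbalanced parentheses before calling interpolate. This is a minimal sketch with a hypothetical `isBalanced` helper; it does not add escape handling and is not a full validator, it only catches a '(' that is never closed or a ')' that closes nothing.

```cpp
#include <cassert>
#include <string>

// Hypothetical pre-check: true if every ')' closes an earlier '(' and every
// '(' is eventually closed. Backslash escapes are not recognized.
bool isBalanced(const std::string& pattern) {
    int depth = 0;
    for (char c : pattern) {
        if (c == '(') {
            ++depth;
        } else if (c == ')') {
            if (--depth < 0) return false;  // ')' with no matching '('
        }
    }
    assert(depth >= 0);
    return depth == 0;  // a leftover '(' means an unclosed group
}
```

For example, `isBalanced("((a,b,c)o(m,v)p,b)")` is true, while `isBalanced("((a,b,c)o(m,v)p,b")` is false.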

Anyway, you can call it like this:

const char test[] = "((a,b,c)o(m,v)p,b)";
auto foo = interpolate(begin(test), end(test) - 1); // end(test) - 1 excludes the terminating '\0'

for (auto& i : foo){
    cout << i << endl;
}


Source: https://stackoverflow.com/questions/28862347/generate-all-possible-matches-of-a-regular-expression
