问题
How can I derive all possible matches of a regular expression
For example:
((a,b,c)o(m,v)p,b)
The strings generated from above expression would be:
aomp
bomp
comp
aovp
bovp
covp
b
回答1:
Your steps are pretty straight forward though implementing them may take a bit of work:
- Create a recursive function which extracts the string between the first set of parenthesis it comes to: https://stackoverflow.com/a/28863720/2642059
- In the function split this strings on
','
into avector<string>
and return it: https://stackoverflow.com/a/28880605/2642059 - Before returning test if it is necessary to recurse because of a nested parenthesis, one string must be added to the return for each possible combination returned from recursed functions
EDIT:
Say my input string was "(bl(ah,eck,le),yap)"
- The first function would extract the
string
: "bl(ah,eck,le),yap" - Before returning it would search for nested parenthesis, this would cause it to recurse:
- The second function would extract the
string
: "ah,eck,le" - Before returning it would search for nested parenthesis and find none
- It would return an
vector<string>
: ["ah","eck","le"]
- The second function would extract the
- The first function would now contain: "bl["ah","eck","le"],yap"
- It would not find anymore parenthesis to extract, so it would go to expanding all internal combinations: "["blah","bleck","blle"],yap"
- It could now split the string and return: ["blah","bleck","blle","yap"]
The return from your first function is your result.
EDIT:
Glad you solved it I wrote up a two state machine to solve it as well so I figured I could post it here for your comparison:
const char* extractParenthesis(const char* start, const char* finish){
int count = 0;
return find_if(start, finish, [&](char i){
if (i == '('){
count++;
}
else if (i == ')'){
count--;
}
return count <= 0; });
}
vector<string> split(const char* start, const char* finish){
const char delimiters[] = ",(";
const char* it;
vector<string> result;
do{
for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
it != finish && *it == '(';
it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
auto&& temp = interpolate(start, it);
result.insert(result.end(), temp.begin(), temp.end());
start = ++it;
} while (it <= finish);
return result;
}
vector<string> interpolate(const char* start, const char* finish){
vector<string> result{ 1, string{ start, find(start, finish, '(') } };
for (auto it = start + result[0].size();
it != finish;
it = find(++start, finish, '('),
for_each(result.begin(), result.end(), [&](string& i){ i += string{ start, it }; })){
start = extractParenthesis(it, finish);
auto temp = split(next(it), start);
const auto size = result.size();
result.resize(size * temp.size());
for (int i = result.size() - 1; i >= 0; --i){
result[i] = result[i % size] + temp[i / size];
}
}
return result;
}
Depending upon your compiler you'll need to forward declare these since they call each other. This will also crash fantastically if the input string is malformed. And it can't handle escaped control characters.
Anyway you can call it like this:
const char test[] = "((a,b,c)o(m,v)p,b)";
auto foo = interpolate(begin(test), end(test));
for (auto& i : foo){
cout << i << endl;
}
来源:https://stackoverflow.com/questions/28862347/generate-all-possible-matches-of-a-regular-expression