So i have a regex pattern, and I want to generate all the text permutations that would be allowed from that pattern.
Example:
var pattern = \"^My (?:
One method that might be a bit weird would be to put the possible choices into an array first, and then generate the regex based on the array and then use the same array to generate the permutations.
Here's a sketch of a function I wrote to take a List of Strings and return a list of all the permutated possibilities: (taking on char from each)
public static List<string> Calculate(List<string> strings) {
List<string> returnValue = new List<string>();
int[] numbers = new int[strings.Count];
for (int x = 0; x < strings.Count; x++) {
numbers[x] = 0;
}
while (true) {
StringBuilder value = new StringBuilder();
for (int x = 0; x < strings.Count; x++) {
value.Append(strings[x][numbers[x]]);
//int absd = numbers[x];
}
returnValue.Add(value.ToString());
numbers[0]++;
for (int x = 0; x < strings.Count-1; x++) {
if (numbers[x] == strings[x].Length) {
numbers[x] = 0;
numbers[x + 1] += 1;
}
}
if (numbers[strings.Count-1] == strings[strings.Count-1].Length)
break;
}
return returnValue;
}
If you restrict yourself to the subset of regular expressions that are anchored at both ends, and involve only literal text, single-character wildcards, and alternation, the matching strings should be pretty easy to enumerate. I'd probably rewrite the regex as a BNF grammar and use that to generate an exhaustive list of matching strings. For your example:
<lang> -> <begin> <middle> <end>
<begin> -> "My "
<middle> -> "" | "real" | "biological"
<end> -> " name is Steve"
Start with the productions that have only terminal symbols on the RHS, and enumerate
all the possible values that the nonterminal on the LHS could take. Then work your
way up to the productions with nonterminals on the RHS. For concatenation of nonterminal symbols, form the Cartesian product of the sets represented by each RHS nonterminal.
For alternation, take the union of the sets represented by each option. Continue
until you've worked your way up to <lang>
, then you're done.
However, once you include the '*' or '+' operators, you have to contend with infinite numbers of matching strings. And if you also want to handle advanced features like backreferences...you're probably well on your way to something that's isomorphic to the Halting Problem!