So I have a regex pattern, and I want to generate all the strings that the pattern would match.
Example:
var pattern = "^My (?:
If you restrict yourself to the subset of regular expressions that are anchored at both ends and involve only literal text, single-character wildcards, and alternation, the matching strings should be pretty easy to enumerate. I'd probably rewrite the regex as a BNF grammar and use that to generate an exhaustive list of matching strings. For your example:
->
-> "My "
-> "" | "real" | "biological"
-> " name is Steve"
Start with the productions that have only terminal symbols on the RHS, and enumerate all the possible values that the nonterminal on the LHS could take. Then work your way up to the productions with nonterminals on the RHS. For concatenation of nonterminal symbols, form the Cartesian product of the sets represented by each RHS nonterminal and join each combination into a string. For alternation, take the union of the sets represented by each option. Continue until you've worked your way up to the start symbol (the top-level production), and then you're done.
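Here's a minimal Python sketch of that bottom-up procedure. It assumes the grammar is finite and non-recursive (the restricted subset described above) and is encoded as a dict mapping each nonterminal to a list of alternatives, where each alternative is a sequence of symbols. The nonterminal names (`<sentence>`, `<prefix>`, `<adjective>`, `<suffix>`) and the `enumerate_language` helper are made up for illustration; they aren't from any library.

```python
from itertools import product


def enumerate_language(grammar: dict[str, list[list[str]]], start: str) -> set[str]:
    """Bottom-up enumeration of a finite, non-recursive grammar.

    A symbol is a nonterminal if it is a key in `grammar`; any other
    string is treated as a terminal.
    """
    if start not in grammar:
        raise KeyError(start)
    resolved: dict[str, set[str]] = {}
    while start not in resolved:
        progress = False
        for lhs, alternatives in grammar.items():
            if lhs in resolved:
                continue
            # Only handle a production once every nonterminal on its RHS is known.
            if any(sym in grammar and sym not in resolved
                   for alt in alternatives for sym in alt):
                continue
            values: set[str] = set()
            for alt in alternatives:  # alternation: union of the options
                # Concatenation: Cartesian product of each symbol's set;
                # a terminal contributes a one-element set.
                parts = [resolved[sym] if sym in grammar else {sym} for sym in alt]
                values.update("".join(combo) for combo in product(*parts))
            resolved[lhs] = values
            progress = True
        if not progress:
            raise ValueError("grammar is recursive; its language may be infinite")
    return resolved[start]


# Hypothetical nonterminal names, mirroring the grammar in this answer.
GRAMMAR = {
    "<sentence>": [["<prefix>", "<adjective>", "<suffix>"]],
    "<prefix>": [["My "]],
    "<adjective>": [[""], ["real"], ["biological"]],
    "<suffix>": [[" name is Steve"]],
}

for s in sorted(enumerate_language(GRAMMAR, "<sentence>")):
    print(repr(s))
```

Each pass of the `while` loop resolves every nonterminal whose right-hand side is already fully known, which is the "work your way up" step. If a pass makes no progress, some nonterminal depends on itself, which is the kind of recursion that `*` and `+` introduce.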
However, once you include the '*' or '+' operators, you have to contend with infinitely many matching strings. And if you also want to handle advanced features like backreferences... you're probably well on your way to something that's isomorphic to the Halting Problem!
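To make the infinite case concrete, here's a hedged Python sketch that lazily enumerates what something like `(?:a|b)*` matches, in order of repetition count. The `star` helper is hypothetical, not from any library. It never finishes on its own, so the most you can do is take a finite prefix, e.g. with `itertools.islice`.

```python
from collections.abc import Iterator
from itertools import count, islice, product


def star(options: list[str]) -> Iterator[str]:
    """Lazily enumerate the strings matched by (?:opt1|opt2|...)*.

    Yields concatenations of 0, 1, 2, ... options, so the generator never
    terminates on its own. It may yield duplicates when different sequences
    of options concatenate to the same string (e.g. options "a" and "aa").
    """
    if not options:
        # (?:)* with no options can only match the empty string.
        yield ""
        return
    for n in count(0):
        for combo in product(options, repeat=n):
            yield "".join(combo)


# Only a finite prefix of the language can ever be materialized.
first = list(islice(star(["a", "b"]), 7))
print(first)
```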