Can we compute some kind of distance between regular expressions?
The idea is to measure how similar two regular expressions are.
There's an answer hidden in an earlier question here on SO: Generating strings from regexes. You can calculate an (asymmetric) distance measure by generating strings using one regex and checking what fraction of them also match the other regex.
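A minimal sketch of that measure in Python, under one swapped-in assumption: rather than a regex-driven string generator (the standard library has none), it enumerates every string over a small, caller-chosen `alphabet` up to `max_len` characters and keeps the ones regex `a` fully matches. The distance is the fraction of those that regex `b` does not match. The function name and parameters are hypothetical, and the 0.0 result for an empty sample is an arbitrary convention.

```python
import re
from itertools import product


def asymmetric_distance(a: str, b: str, alphabet: str, max_len: int) -> float:
    """Fraction of sampled strings fully matched by `a` that `b` rejects.

    Hypothetical helper: brute-force enumeration over `alphabet` stands in
    for a proper regex-driven string generator.
    """
    ra, rb = re.compile(a), re.compile(b)
    in_a = in_both = 0
    # Enumerate every string of length 0..max_len over the alphabet.
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if ra.fullmatch(s):
                in_a += 1
                if rb.fullmatch(s):
                    in_both += 1
    if in_a == 0:
        return 0.0  # convention: `a` matched nothing in this sample
    return 1.0 - in_both / in_a


if __name__ == "__main__":
    digits = "a0123456789"
    print(asymmetric_distance("a[0-9]*", "a[0-7]*", digits, 3))  # about 0.342
    print(asymmetric_distance("a[0-7]*", "a[0-9]*", digits, 3))  # 0.0
```

The two calls show the asymmetry: every string of `a[0-7]*` also matches `a[0-9]*`, but not the other way round. Enumeration grows as |alphabet|^max_len, so for anything beyond toy cases you would sample strings randomly instead.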
This can be optimized by stripping out shared prefixes/suffixes. For example, `a[0-9]*` and `a[0-7]*` share the `a` prefix, so you can calculate the distance between `[0-9]*` and `[0-7]*` instead.
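The stripping step is easy to get wrong if done on the raw pattern text: the longest common character prefix of `a[0-9]*` and `a[0-7]*` is `a[0-`, which is not a valid regex piece. Below is a conservative sketch with a hypothetical helper `strip_shared_literals`. It assumes the patterns use no alternation (`|`), no backslash escapes and no `(?...)` constructs, and it returns them unchanged otherwise. It only strips plain alphanumeric literals, and never one that a quantifier applies to.

```python
QUANTIFIERS = set("*+?{")


def strip_shared_literals(p: str, q: str) -> tuple[str, str]:
    """Remove a shared literal prefix/suffix from two regex patterns.

    Hypothetical helper; bails out on '|', '\\' and '(?' because stripping
    around those can change what the remaining pattern means.
    """
    for pat in (p, q):
        if "|" in pat or "\\" in pat or "(?" in pat:
            return p, q

    # Prefix: identical alphanumeric literal not followed by a quantifier.
    i = 0
    while (i < min(len(p), len(q)) and p[i] == q[i] and p[i].isalnum()
           and p[i + 1:i + 2] not in QUANTIFIERS
           and q[i + 1:i + 2] not in QUANTIFIERS):
        i += 1
    p, q = p[i:], q[i:]

    # Suffix: a trailing alphanumeric char is a plain literal here, since
    # quantifiers, classes, groups and {m,n} all end in a symbol.
    j = 0
    while (j < min(len(p), len(q)) and p[-1 - j] == q[-1 - j]
           and p[-1 - j].isalnum()):
        j += 1
    return p[:len(p) - j], q[:len(q) - j]


if __name__ == "__main__":
    print(strip_shared_literals("a[0-9]*", "a[0-7]*"))  # ('[0-9]*', '[0-7]*')
    print(strip_shared_literals("ab*", "ab"))           # ('b*', 'b')
```

The second call shows why the quantifier check matters: the `b` in `ab*` is repeated while the `b` in `ab` is not, so only the `a` is removed.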