Distance between regular expressions

春和景丽 2021-02-14 14:46

Can we compute some sort of distance between regular expressions?

The idea is to measure how similar two regular expressions are.

6 answers
  •  花落未央
    2021-02-14 14:59

    If you have two regular expressions and a set of example inputs, you could try matching every input against each regex. For each input:

    • If they both match or both don't match, score 0.
    • If one matches and the other doesn't, score 1.

    Sum this score over all inputs to get a 'distance' between the regular expressions: an estimate of how often the two will disagree on typical input. It will be very slow to calculate if your sample input set is large. It also breaks down if both regexes fail to match almost all random strings and your expected input is entirely random. For example, the regexes 'sgjlkwren' and 'ueuenwbkaalf' would probably never match anything when tested on random input, so this metric would say the distance between them is zero. That might or might not be what you want (probably not).
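    The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name `regex_distance` is made up here, and it assumes that "match" means the pattern matches the whole input string (`fullmatch`). Swap in `search` if a substring match is what you care about.

```python
import re

def regex_distance(pattern_a, pattern_b, samples):
    """Count the sample strings on which the two patterns disagree.

    Assumption: "match" means the whole string matches (re.fullmatch).
    """
    rx_a = re.compile(pattern_a)
    rx_b = re.compile(pattern_b)
    distance = 0
    for s in samples:
        a_matches = rx_a.fullmatch(s) is not None
        b_matches = rx_b.fullmatch(s) is not None
        # Score 1 when exactly one pattern matches, 0 otherwise.
        if a_matches != b_matches:
            distance += 1
    return distance
```

    With the samples `["", "a", "aa", "b"]`, the patterns `a+` and `a*` disagree only on the empty string, so the distance is 1. Two patterns that match none of the samples get a distance of 0, which is the weakness described above.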

    You might be able to analyze the structure of the regexes and use biased random sampling to deliberately hit matching strings more often than completely random input would. For example, if both regexes require that the string starts with 'foo', you could make sure that your test inputs also always start with 'foo', to avoid wasting time testing strings that you know will fail for both.
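    As a rough sketch of the 'foo' example, the sampler below always emits the shared prefix and randomizes only the tail. The helper name `sample_with_prefix`, the lowercase alphabet, the tail length limit, and the two example patterns are all assumptions chosen for illustration. Working out the shared prefix from the regexes themselves is not shown.

```python
import random
import re
import string

def sample_with_prefix(prefix, count, max_tail=8,
                       alphabet=string.ascii_lowercase, seed=0):
    """Generate `count` strings that all start with `prefix`.

    Only the tail is random, so no sample is wasted on inputs that
    both patterns would reject because of the prefix alone.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    samples = []
    for _ in range(count):
        tail_len = rng.randint(0, max_tail)
        tail = "".join(rng.choice(alphabet) for _ in range(tail_len))
        samples.append(prefix + tail)
    return samples

# Hypothetical patterns that both require the 'foo' prefix.
rx_a = re.compile(r"foo[a-m]*")
rx_b = re.compile(r"foo[a-z]*")

samples = sample_with_prefix("foo", 1000)
# Count inputs where exactly one of the two patterns matches the whole string.
differing = sum(
    (rx_a.fullmatch(s) is None) != (rx_b.fullmatch(s) is None)
    for s in samples
)
print(differing, "of", len(samples), "samples disagree")
```

    Keep in mind that the resulting number depends on the sampling distribution you chose, so it is only comparable between regex pairs that were tested with the same sampler.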

    So in conclusion: unless you have a very specific situation with a restricted input set and/or a restricted regular expression language, I'd say it's not possible. If you do have some restrictions on your input and on the regular expressions, it might be possible. Please specify what these restrictions are and maybe I can come up with something better.
