Why can regular expressions have an exponential running time?

前端 未结 2 511
闹比i
闹比i 2020-11-28 09:50

It is possible to write a Regex which needs in some cases exponential running time. Such an example is (aa|aa)*. If there is an input of an odd number of

相关标签:
2条回答
  • 2020-11-28 10:22

    Regular expressions conforming to this formal definition are computable in linear time, because they have corresponding finite automatas. They are built only from parentheses, alternative | (sometimes called sum), Kleene star * and concatenation.

    Extending regular expressions by adding, for example, backward references can lead even to NP-complete regular expressions. Here you can find an example of regular expression recognizing non-prime numbers.

    I guess that, such an extended implementation can have non-linear matching time even in simple cases.

    I made a quick experiment in Perl and your regular expression computes equally fast for odd and even number of 'a's.

    0 讨论(0)
  • 2020-11-28 10:26

    Russ Cox has a very detailed article about why this is and the history of regexes (part 2, part 3).

    Regular expression matching can be simple and fast, using finite automata-based techniques that have been known for decades. In contrast, Perl, PCRE, Python, Ruby, Java, and many other languages have regular expression implementations based on recursive backtracking that are simple but can be excruciatingly slow. With the exception of backreferences, the features provided by the slow backtracking implementations can be provided by the automata-based implementations at dramatically faster, more consistent speeds.

    Largely, it comes down to proliferation of non-regular features in "regular" expressions such as backreferences, and the (continued) ignorance of most programmers that there are better alternatives for regexes that do not contain such features (which is many of them).

    While writing the text editor sam in the early 1980s, Rob Pike wrote a new regular expression implementation, which Dave Presotto extracted into a library that appeared in the Eighth Edition. Pike's implementation incorporated submatch tracking into an efficient NFA simulation but, like the rest of the Eighth Edition source, was not widely distributed. Pike himself did not realize that his technique was anything new. Henry Spencer reimplemented the Eighth Edition library interface from scratch, but using backtracking, and released his implementation into the public domain. It became very widely used, eventually serving as the basis for the slow regular expression implementations mentioned earlier: Perl, PCRE, Python, and so on. (In his defense, Spencer knew the routines could be slow, and he didn't know that a more efficient algorithm existed. He even warned in the documentation, “Many users have found the speed perfectly adequate, although replacing the insides of egrep with this code would be a mistake.”) Pike's regular expression implementation, extended to support Unicode, was made freely available with sam in late 1992, but the particularly efficient regular expression search algorithm went unnoticed.

    0 讨论(0)
提交回复
热议问题