String manipulation vs Regexps

轮回少年 2021-01-18 13:45

We are often told that Regexps are slow and should be avoided whenever possible.

However, taking into account the overhead of doing some string manipulation oneself (

6 Answers
  •  失恋的感觉
    2021-01-18 14:16

    Some regular expressions are extremely fast and the difference between the regex and a custom solution may be negligible (or not worth anyone's time to bother).

    Regular expressions become slow, however, when excessive backtracking occurs. A backtracking regex engine scans from left to right, and a pattern can often match the same text in more than one way. So when the engine reaches a point where the current attempt isn't going to match your test string, it goes back to an earlier choice point and tries to match in another way. This repeated backtracking adds up and slows down the match.
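    As a rough illustration of that effect, here is a small sketch using Python's re module. The two patterns and the test string are made up for this example and are not from any real task. Both patterns accept the same strings, but the nested one gives the engine many ways to split a run of 'a's, so a failing match forces it to try them all.

```python
import re
import time

# Hypothetical patterns chosen for illustration only.
NESTED = re.compile(r"(a+)+$")  # nested quantifiers: many ways to split a run of 'a's
FLAT = re.compile(r"a+$")       # accepts the same strings, but only one way to split the run


def time_search(pattern, text):
    """Return (match object or None, elapsed seconds) for a single search."""
    start = time.perf_counter()
    result = pattern.search(text)
    return result, time.perf_counter() - start


# A run of 'a's followed by a character that makes the overall match fail.
subject = "a" * 18 + "b"

nested_result, nested_seconds = time_search(NESTED, subject)
flat_result, flat_seconds = time_search(FLAT, subject)

print(f"nested: {nested_result} in {nested_seconds:.6f}s")
print(f"flat:   {flat_result} in {flat_seconds:.6f}s")
```

    Neither pattern matches, but the nested one should take noticeably longer, and its running time grows roughly exponentially as the run of 'a's gets longer. Exact timings depend on the machine and the regex engine.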

    Often the regular expression can be rewritten to perform better. But the ultimate in performance would be to write your own optimized parser for the specific task. By writing your own parser you can, for example, parse from left to right while maintaining a memory (or state). If you use this technique in procedural code, you can often achieve the effect you're looking for in one pass, without the slowness of backtracking.
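    A minimal sketch of the one-pass, state-keeping idea, assuming a made-up task: splitting a line on commas that are not inside double quotes. The function name split_outside_quotes is hypothetical, and escaped quotes are deliberately not handled.

```python
def split_outside_quotes(text, sep=","):
    """Split text on sep, ignoring any sep that appears inside double quotes.

    The input is read once, left to right. The only state carried along is
    whether we are currently inside a quoted section.
    """
    fields = []
    current = []
    in_quotes = False  # the "memory" that is updated as we scan

    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            # A separator outside quotes ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)

    fields.append("".join(current))  # the last field has no trailing separator
    return fields


print(split_outside_quotes('a,"b,c",d'))  # ['a', '"b,c"', 'd']
```

    Each character is looked at exactly once, so the running time is linear in the input length and there is nothing to backtrack over.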

    I was faced with this decision earlier this year. In fact the task at hand was on the outer fringe of what was even possible with regular expressions. In the end I decided to write my own parser, an embedded pushdown automaton, which is incredibly efficient for what I was trying to do. The task, by the way, was to build something that can parse regular expressions and provide Intellisense-like code hinting for them.

    It's somewhat ironic that I didn't use regular expressions to parse regular expressions, but you can read about the thought behind it all here... http://blog.regexhero.net/2010/03/code-hinting-for-regular-expressions.html
