Regex PHP. Reduce steps: limited by fixed width Lookbehind

Posted by 天涯浪子 on 2019-12-06 04:44:24

A rule of thumb:

Do not let the engine attempt a match at every single character if there are boundaries it can use instead.

The quote originally comes from this answer. The following regular expression reduces the number of steps significantly, from ~20000 to ~900, thanks to the left side of the outermost alternation:

(?:[^@^]++|[@^]{2,}+)(*SKIP)(*F)
|
(?<=([HUGE-CHARACTER-CLASS])|\^[cjleqrd])
    (\^[34biu78])*+@([a-z\d][\w-.]{0,25}[a-z\d])(\^[34biu78])*+(?=(?1))

Actually, I don't care much about the number of steps reported by regex101: it won't hold in your own environment, and it isn't obvious which steps are real and which are missed. In this case, though, the logic of the regex is clear and the difference is large, so the comparison makes sense.
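Since step counts from an online tester may not carry over, one option is to time the patterns in your own environment. The following is only a rough sketch: both patterns are hypothetical, simplified stand-ins (no lookbehind, no `(\^[34biu78])` groups), the subject is made-up filler text, and `timeMatches` is a helper name invented for this example. The numbers depend on the PCRE version, JIT settings and the input, so no particular result is implied.

```php
<?php
// Hypothetical simplified patterns; neither is the full pattern from the answer.
$plain   = '~@([a-z\d][\w.-]{0,25}[a-z\d])~i';
$skipped = '~(?:[^@^]++|[@^]{2,}+)(*SKIP)(*F)|@([a-z\d][\w.-]{0,25}[a-z\d])~i';

// Made-up subject: long runs of filler text with an occasional "@name" token.
$subject = str_repeat(str_repeat('lorem ipsum ', 50) . '@user_1 ', 200);

// Runs preg_match_all() $runs times and returns [match count, elapsed milliseconds].
function timeMatches(string $pattern, string $subject, int $runs = 50): array
{
    $count = 0;
    $start = hrtime(true);                 // monotonic clock in nanoseconds (PHP 7.3+)
    for ($i = 0; $i < $runs; $i++) {
        $count = preg_match_all($pattern, $subject, $m);
    }
    $elapsedMs = (hrtime(true) - $start) / 1e6;
    return [$count, $elapsedMs];
}

[$countPlain, $msPlain]     = timeMatches($plain, $subject);
[$countSkipped, $msSkipped] = timeMatches($skipped, $subject);

printf("plain:   %d matches, %.2f ms\n", $countPlain, $msPlain);
printf("skipped: %d matches, %.2f ms\n", $countSkipped, $msSkipped);
```

Comparing the match counts first is a cheap sanity check that both variants find the same tokens on your data before you read anything into the timings.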

What is the logic?

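We first try to match what is most likely not wanted at all, throw it away, and only then look at the parts that may match our pattern. [^@^]++ possessively matches everything up to the next @ or ^ symbol (the characters a real match can start with), and [@^]{2,}+ consumes runs of two or more of those symbols, which keeps the engine from taking extra steps before finding out it is going nowhere. (*SKIP)(*F) then rejects whatever was consumed and makes the engine resume right after it instead of retrying at every character in between. So we make it fail as soon as possible.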
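The skip-and-fail idea can be tried in isolation with a small PHP sketch. The pattern below is a hypothetical, simplified stand-in for the one above: it keeps the left side of the alternation unchanged but only looks for plain `@name` tokens, leaving out the lookbehind, the `(\^[34biu78])` groups and the lookahead. The sample subject is made up.

```php
<?php
// Hypothetical, simplified stand-in for the pattern in the answer.
$pattern = '~
    (?:[^@^]++|[@^]{2,}+)(*SKIP)(*F)   # consume text that cannot start a match, then discard it
    |
    @([a-z\d][\w.-]{0,25}[a-z\d])      # a candidate that starts at a lone "@"
~xi';

// Made-up sample text.
$subject = 'hello @alice, mail @@ignored and @bob_42!';

preg_match_all($pattern, $subject, $matches);

// $matches[1] holds the captured names.
print_r($matches[1]);
```

Here the engine jumps over `hello ` in one go, matches `@alice`, and discards the `@@` run together with the text after it, so only `alice` and `bob_42` are captured. Note that in this simplified version the `@@` run is thrown away entirely; in the full pattern the lookbehind and lookahead decide what counts as a valid boundary.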

You can use the i flag instead of spelling out the uppercase forms of the letters (this may have a small impact, however).
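As a minimal illustration of that trade, the two hypothetical patterns below (simplified to a bare `@name` token, not the full pattern from the answer) capture the same text; one lists the uppercase letters explicitly, the other relies on the `i` flag.

```php
<?php
// Hypothetical simplified patterns, for illustration only.
$withoutFlag = '~@([a-zA-Z\d][\w.-]{0,25}[a-zA-Z\d])~';  // uppercase spelled out
$withFlag    = '~@([a-z\d][\w.-]{0,25}[a-z\d])~i';       // case-insensitive flag

preg_match($withoutFlag, 'ping @Alice', $a);
preg_match($withFlag, 'ping @Alice', $b);

// Both variants capture the same name.
var_dump($a[1] === $b[1]);
```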

See live demo here
