Say I'm running a service where users can submit a regex to search through lots of data. If the user submits a regex that is very slow (i.e., takes minutes for Matcher.find()
What about checking the user-submitted regex for "evil" patterns before running it, using one or more screening regexes of your own? This could take the form of a method that is called first, so the user's regex is only executed if it passes the screen:
This regex:
\(.+\+\)[\+\*]
Will match:
(a+)+
(ab+)+
([a-zA-Z]+)*
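A minimal Java sketch of this first check, assuming the screen runs on the regex source string before it is compiled. The class and method names (`NestedQuantifierCheck`, `hasNestedQuantifier`) are hypothetical. It uses `find()` rather than `matches()` so the fragment is caught even when it is embedded in a longer expression.

```java
import java.util.List;
import java.util.regex.Pattern;

class NestedQuantifierCheck {
    // The first screening regex from the post, escaped for a Java string literal.
    // It looks for a group whose content ends in "+" and that is itself followed by "+" or "*".
    static final Pattern NESTED_QUANTIFIER = Pattern.compile("\\(.+\\+\\)[\\+\\*]");

    static boolean hasNestedQuantifier(String userRegex) {
        // find(): the evil fragment may sit anywhere inside the user's regex.
        return NESTED_QUANTIFIER.matcher(userRegex).find();
    }

    public static void main(String[] args) {
        for (String r : List.of("(a+)+", "(ab+)+", "([a-zA-Z]+)*", "(ab)+", "(a*)*")) {
            System.out.println(r + " -> " + hasNestedQuantifier(r));
        }
    }
}
```

As written, the check only fires when the inner quantifier is a literal `+` and the outer one is `+` or `*`, so closely related shapes such as `(a*)*` or `(a+){2,}` are not flagged.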
This regex:
\((.+)\|(\1\?|\1{2,})\)\+
Will match:
(a|aa)+
(a|a?)+
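The second check relies on a backreference: group 1 captures the first alternative, and `\1` then requires the second alternative to be that same text made optional (`x?`) or repeated (`xx`, `xxx`, ...). A small Java sketch, with the hypothetical names `OverlappingAlternationCheck` and `hasOverlappingAlternation`:

```java
import java.util.List;
import java.util.regex.Pattern;

class OverlappingAlternationCheck {
    // The second screening regex from the post, escaped for a Java string literal.
    // (.+) captures the first alternative; \1\? or \1{2,} must follow the "|".
    static final Pattern OVERLAPPING_ALTERNATION =
            Pattern.compile("\\((.+)\\|(\\1\\?|\\1{2,})\\)\\+");

    static boolean hasOverlappingAlternation(String userRegex) {
        return OVERLAPPING_ALTERNATION.matcher(userRegex).find();
    }

    public static void main(String[] args) {
        for (String r : List.of("(a|aa)+", "(a|a?)+", "(a|b)+", "(aa|a)+")) {
            System.out.println(r + " -> " + hasOverlappingAlternation(r));
        }
    }
}
```

Because the backreference points from the second alternative to the first, the order matters: `(aa|a)+` is not flagged, and neither is `(a|aa)*`, since the screen requires a trailing `+`.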
This regex:
\(\.\*.\)\{\d{2,}\}
Will match:
(.*a){x} for x ≥ 10 (any count written with two or more digits)
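Putting the three checks together, the screening method could look roughly like the following. This is a sketch, assuming the screen runs on the raw regex source before compilation; `EvilRegexScreen`, `looksEvil` and `findIfSafe` are hypothetical names, and rejecting with an `IllegalArgumentException` is just one possible policy.

```java
import java.util.List;
import java.util.regex.Pattern;

class EvilRegexScreen {
    // The three screening regexes from the post. Heuristic only: they catch
    // the listed shapes, not every pattern that can backtrack catastrophically.
    private static final List<Pattern> EVIL_PATTERNS = List.of(
            Pattern.compile("\\(.+\\+\\)[\\+\\*]"),              // (a+)+, ([a-zA-Z]+)*
            Pattern.compile("\\((.+)\\|(\\1\\?|\\1{2,})\\)\\+"), // (a|aa)+, (a|a?)+
            Pattern.compile("\\(\\.\\*.\\)\\{\\d{2,}\\}"));      // (.*a){x}, x >= 10

    /** Returns true if the user-supplied regex contains one of the known "evil" shapes. */
    static boolean looksEvil(String userRegex) {
        for (Pattern p : EVIL_PATTERNS) {
            if (p.matcher(userRegex).find()) {
                return true;
            }
        }
        return false;
    }

    /** Runs the user's regex only if it passes the pre-screen. */
    static boolean findIfSafe(String userRegex, CharSequence data) {
        if (looksEvil(userRegex)) {
            throw new IllegalArgumentException("Rejected potentially slow regex: " + userRegex);
        }
        return Pattern.compile(userRegex).matcher(data).find();
    }

    public static void main(String[] args) {
        for (String r : List.of("(a+)+", "(a|aa)+", "(.*a){12}", "(.*a){5}", "[a-z]+@example")) {
            System.out.println(r + " -> " + (looksEvil(r) ? "rejected" : "allowed"));
        }
    }
}
```

The screening regexes are only run against the user's (usually short) regex source, not against the data being searched.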
I may be a bit naive about regexes and regex denial of service (ReDoS), but I can't help thinking that a little pre-screening for known "evil" patterns would go a long way toward preventing problems at execution time, especially when the regex is supplied by an end user. The patterns above are probably not refined enough, since I am far from a regex expert; they are just food for thought. Everything else I have found seems to say this can't be done, and instead focuses on either putting a timeout on the regex engine or limiting the number of iterations it is allowed to execute.
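For comparison, the timeout approach mentioned above can be done in plain Java without a separate thread by wrapping the input in a `CharSequence` whose `charAt` checks a deadline, since `java.util.regex.Matcher` reads its input through that interface while it backtracks. This is a common workaround, not a built-in feature of `java.util.regex`; `TimeLimitedMatch`, `DeadlineCharSequence`, `RegexTimeoutException` and `findWithTimeout` are hypothetical names. The demo pattern `(.*a){12}b` is the `(.*a){x}` shape from above with a trailing `b` added, so the search has to fail after trying a very large number of ways to split the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimeLimitedMatch {
    /** Thrown when a match exceeds its time budget (hypothetical exception type). */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) { super(message); }
    }

    /** Wraps the input and checks the deadline every time the engine reads a character. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override public char charAt(int index) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadlineNanos >= 0) {
                throw new RegexTimeoutException("regex exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override public int length() { return inner.length(); }

        @Override public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override public String toString() { return inner.toString(); }
    }

    /** Like matcher.find(), but throws RegexTimeoutException after roughly timeoutMillis. */
    static boolean findWithTimeout(Pattern pattern, CharSequence data, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        Matcher m = pattern.matcher(new DeadlineCharSequence(data, deadline));
        return m.find();
    }

    public static void main(String[] args) {
        Pattern slow = Pattern.compile("(.*a){12}b"); // no 'b' in the input, so this must fail
        String input = "a".repeat(40);
        try {
            System.out.println("matched: " + findWithTimeout(slow, input, 200));
        } catch (RegexTimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

Checking the clock on every `charAt` call adds some overhead; checking only every few thousand calls is a possible trade-off. Unlike the pre-screen, this bounds the cost of any regex, including shapes the screening patterns miss.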