How can I safely validate an untrusted regex in Perl?

不问归期 submitted on 2019-12-31 00:36:11

Question


This answer explains that to validate an arbitrary regular expression, one simply uses eval:

while (<>) {
    eval "qr/$_/;";
    print $@ ? "Not a valid regex: $@\n" : "That regex looks valid\n";
}

However, this strikes me as very unsafe, for what I hope are obvious reasons. Someone could input, say:

foo/; system('rm -rf /'); qr/

or whatever devious scheme they can devise.

The natural way to prevent such things is to escape special characters, but if I escape too many characters, I severely limit the usefulness of the regex in the first place. A strong argument can be made, I believe, that at least []{}()/-,.*?^$! and white space characters ought to be permitted (and probably others), un-escaped, in a user regex interface, for the regexes to have minimal usefulness.

Is it possible to secure myself from regex injection, without limiting the usefulness of the regex language?


Answer 1:


The solution is simply to change

eval("qr/$_/")

to

eval("qr/\$_/")

This can be written more clearly as follows:

eval('qr/$_/')

But that's still not optimal. The following would be far better as it doesn't involve generating and compiling Perl code at run-time:

eval { qr/$_/ }
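As a minimal sketch of how the block-eval form might be wrapped for reuse: the helper name compile_user_regex and its return convention are assumptions for illustration, not part of the answer. It relies on the fact that Perl refuses to compile (?{ ... }) code blocks in an interpolated pattern unless use re 'eval' is in effect, so the input is only ever treated as regex text.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): compile an untrusted pattern
# without ever passing it through string eval.
# Returns (compiled_regex, undef) on success, (undef, error_message) on failure.
sub compile_user_regex {
    my ($source) = @_;

    # Block eval only traps the exception qr// throws on a bad pattern;
    # $source is interpolated as regex text, never parsed as Perl code.
    my $re = eval { qr/$source/ };
    return ($re, undef) if defined $re;

    my $error = $@ || 'unknown error';
    chomp $error;
    return (undef, $error);
}

# The third sample is the injection attempt from the question; here it is
# just an odd-looking pattern, and nothing in it is executed.
for my $source ('fo+', '(', q{foo/; system('echo pwned'); qr/}) {
    my ($re, $error) = compile_user_regex($source);
    my $message = defined $re
        ? "That regex looks valid: $source"
        : "Not a valid regex: $error";
    print "$message\n";
}
```

When reading patterns line by line with while (<>), it is probably worth calling chomp first; otherwise the trailing newline becomes part of the pattern.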

Note that neither solution protects you from denial-of-service attacks. It's quite easy to write a pattern that will take longer than the life of the universe to complete. To handle that situation, you could execute the regex match in a child process for which a CPU ulimit has been set.
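The child-process idea can be sketched roughly as follows. This is not the CPU ulimit approach itself: core Perl has, as far as I know, no built-in setrlimit, so this sketch swaps in a wall-clock alarm inside a forked child (the CPAN module BSD::Resource would be the usual route to a real CPU limit). The helper name match_with_timeout and its return convention are assumptions for illustration.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run the match in a forked child that the OS kills
# after $seconds of wall-clock time.
# Returns 1 on match, 0 on no match, undef on timeout or invalid pattern.
sub match_with_timeout {
    my ($pattern, $text, $seconds) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: with the default SIGALRM disposition the kernel terminates
        # the process, even if the regex engine is stuck backtracking.
        $SIG{ALRM} = 'DEFAULT';
        alarm $seconds;

        my $re = eval { qr/$pattern/ };
        POSIX::_exit(2) unless defined $re;       # pattern did not compile
        POSIX::_exit($text =~ $re ? 0 : 1);       # 0 = match, 1 = no match
    }

    # Parent: wait for the child and decode its exit status.
    waitpid($pid, 0);
    return if $? & 127;                           # child was killed by a signal
    my $code = $? >> 8;
    return $code == 0 ? 1 : $code == 1 ? 0 : undef;
}

my $result = match_with_timeout('fo+', 'foo', 5);
print defined $result ? "result: $result\n" : "timed out or invalid\n";
```

Because the match runs in a separate process, capture groups are not available to the parent; a fuller version would have to pass them back, for example over a pipe.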




Answer 2:


There is some discussion about this over at The Monastery.

TLDR: use re::engine::RE2 (-strict => 1);

Make sure to add (-strict => 1) to your use statement, or re::engine::RE2 will fall back to Perl's own regex engine for patterns it cannot handle.

The following is a quote from junyer, owner of the project on GitHub.

RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.



Source: https://stackoverflow.com/questions/20357755/how-can-i-safely-validate-an-untrusted-regex-in-perl
