I want to create a website where users can test regular expressions (there are many out there already...such as this one: http://www.pagecolumn.com/tool/pregtest.htm). Basic
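If you allow user-submitted values for preg_replace, make sure you disallow the e flag! That modifier makes PHP evaluate the replacement string as code, so not doing so could allow a malicious user to delete your entire site, or worse. (The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, but older installations still honor it.)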
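One way to enforce this, if your tester lets users pick modifiers at all, is to accept them only from an allow-list. The following is a minimal sketch; the function name `isSafeModifierString` and the exact set of allowed letters are assumptions made for illustration, not part of any library.

```php
<?php
// Hypothetical helper: accept only modifiers from an explicit allow-list.
// The list is an assumption; adjust it to what your tool should support.
function isSafeModifierString(string $modifiers): bool
{
    // i, m, s, x, u, U, A and D only change how matching behaves.
    // Anything else, including "e", is rejected.
    return preg_match('/\A[imsxuUAD]*\z/', $modifiers) === 1;
}

var_dump(isSafeModifierString('im')); // bool(true)
var_dump(isSafeModifierString('ie')); // bool(false)
```

An allow-list is used here rather than a check for "e" alone, so that any modifier you have not reviewed is refused by default.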
Otherwise, the worst thing that can happen is what the other answers already point out. Set a low script timeout, and maybe you should even make sure that the page can only be called X times per minute.
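The "X times per minute" idea could be sketched as a small sliding-window counter. This is only an illustration: `allowRequest` is a hypothetical function, and where you keep the timestamp list (for example `$_SESSION`, or a store keyed by IP address) is left to you.

```php
<?php
// Hypothetical sliding-window limiter.
// $hits holds the Unix timestamps of earlier accepted requests.
function allowRequest(array &$hits, int $now, int $maxPerMinute = 10): bool
{
    // Keep only the requests made during the last 60 seconds.
    $hits = array_values(array_filter($hits, function ($t) use ($now) {
        return $t > $now - 60;
    }));
    if (count($hits) >= $maxPerMinute) {
        return false; // over the limit, reject this request
    }
    $hits[] = $now;
    return true;
}

$hits = [];
var_dump(allowRequest($hits, 1000, 2)); // bool(true)
var_dump(allowRequest($hits, 1001, 2)); // bool(true)
var_dump(allowRequest($hits, 1002, 2)); // bool(false)
var_dump(allowRequest($hits, 1061, 2)); // bool(true)
```

Note that a session-based list can be bypassed by discarding the session cookie, so a server-side store is probably the safer choice.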
I think PHP itself will check the regex. Here's a sample script I made:
// check for input, and cap the size of both fields
if (!empty($_POST['regex'])
    && !empty($_POST['text'])
    && strlen($_POST['regex']) < 1000
    && strlen($_POST['text']) < 2000
) {
    // set a short script timeout in case something goes wrong
    // (SAFE MODE must be OFF; safe mode only exists before PHP 5.4)
    $old_time = (int) ini_get('max_execution_time');
    if (!set_time_limit(1)) die('SAFE MODE MUST BE OFF'); // 1 sec is more than enough
    // trim the pattern; it's up to you to do more checks
    $regex = trim($_POST['regex']);
    // don't trim the text, its whitespace may be needed
    $input = $_POST['text'];
    // escape every "/" that is not already escaped, because "/" is the delimiter
    $regex = preg_replace('~(?<!\\\\)((?:\\\\\\\\)*)/~', '$1\\\\/', $regex);
    // go for the regex; preg_match() returns false when the pattern is invalid
    if (false !== ($matched = @preg_match('/' . $regex . '/', $input, $matches))) {
        // regex was tested, show results (escape the output to avoid XSS)
        echo 'Matches: ' . $matched . '<br />';
        if ($matched > 0) {
            echo 'matches: <br />';
            foreach ($matches as $i => $match) {
                echo $i . ' = ' . htmlspecialchars($match) . '<br />';
            }
        }
    } else {
        echo 'Invalid regex<br />';
    }
    // set back the original execution time
    set_time_limit($old_time);
}
Anyways, NEVER EVER use eval() with user-submitted strings.
Additionally, you can do some simple minimalistic sanitizing, but that's up to you. ;)
Afaik there are no "vulnerabilities" when trying to evaluate user-supplied regexps. The worst thing that could possibly happen is, as erik points out, a DoS attack or a fatal error within your script.
I'm afraid to tell you that you won't be (even theoretically) able to "sanitize" every possible regexp out there. The best you can do is to check for lexical and/or syntactic errors.
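A syntax check of that kind can be done by letting PCRE compile the pattern and seeing whether it complains. The sketch below assumes the pattern already includes its delimiters; `regexSyntaxError` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns null when the pattern compiles,
// or the warning text PHP produced when it does not.
function regexSyntaxError(string $pattern): ?string
{
    error_clear_last();
    // Matching against an empty subject makes PCRE compile the pattern
    // without doing any real matching work.
    if (@preg_match($pattern, '') !== false) {
        return null;
    }
    $error = error_get_last();
    return $error !== null ? $error['message'] : 'unknown error';
}

var_dump(regexSyntaxError('/a+/'));  // NULL
var_dump(regexSyntaxError('/(a+/')); // a string describing the compilation failure
```

This only tells you that the pattern is well-formed; it says nothing about how expensive the pattern is to run.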
The only problem I can think of is that someone can DoS you by entering a bad regex (one that is O(2^n) or O(n!) or whatever), and the easiest way to prevent this might be to set a short page timeout.
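Besides a timeout, PHP's PCRE extension has its own `pcre.backtrack_limit` setting, and `preg_last_error()` reports when it was hit. The following is a rough sketch under assumed values: the limit of 10000 is arbitrary, `matchWithLimit` is a hypothetical wrapper, and JIT is switched off here on the assumption that the limit then applies more predictably.

```php
<?php
// Assumed limits; tune them for your own server.
ini_set('pcre.backtrack_limit', '10000');
ini_set('pcre.jit', '0'); // assumption: without JIT the backtrack limit behaves predictably

// Hypothetical wrapper that reports why a match could not be completed.
function matchWithLimit(string $pattern, string $subject): array
{
    $count = @preg_match($pattern, $subject, $matches);
    if ($count === false) {
        // preg_last_error() distinguishes a runaway pattern from other failures.
        $tooComplex = preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR;
        return [
            'ok'     => false,
            'reason' => $tooComplex ? 'pattern too complex' : 'invalid pattern or other PCRE error',
        ];
    }
    return ['ok' => true, 'count' => $count, 'matches' => $matches];
}

// A classic pathological case: nested quantifiers on a subject that cannot match.
$result = matchWithLimit('/(a+)+$/', str_repeat('a', 30) . '!');
var_dump($result['ok']);
```

Whether a given pattern actually trips the limit depends on the PCRE version and its optimizations, so treat this as a safety net alongside the script timeout rather than a replacement for it.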
If the regex is being stored in a database, you should use whatever method you would normally use to escape the data, such as prepared statements.
Otherwise, my only concern is that the user could supply a maliciously complex regex, and I'm not sure there is a way to check for that.
One thought is that you could make your regex evaluator entirely client-side by doing it in JS, but there are inconsistencies between PHP's preg functions and JavaScript's regex functions.