From an external source I'm getting strings like
array(1,2,3)
but also larger arrays like
array("a", "b", "c", ar
Whilst writing a parser using the Tokenizer, which turned out to be harder than I expected, I came up with another idea: why not parse the array using eval, but first validate that it contains nothing harmful?
So, what the code does: it checks the tokens of the array against a whitelist of allowed tokens and characters, and only then executes eval. I do hope I included all possible harmless tokens; if not, simply add them. (I intentionally didn't include HEREDOC and NOWDOC, because I think they are unlikely to be used.)
function parseArray($code) {
    $allowedTokens = array(
        T_ARRAY => true,
        T_CONSTANT_ENCAPSED_STRING => true,
        T_LNUMBER => true,
        T_DNUMBER => true,
        T_DOUBLE_ARROW => true,
        T_WHITESPACE => true,
    );
    $allowedChars = array(
        '(' => true,
        ')' => true,
        ',' => true,
    );

    $tokens = token_get_all('<?php '.$code);
    array_shift($tokens); // remove opening php tag

    foreach ($tokens as $token) {
        // single-character token, e.g. '(' or ','
        if (is_string($token)) {
            if (!isset($allowedChars[$token])) {
                throw new Exception('Disallowed token \''.$token.'\' encountered.');
            }
            continue;
        }

        // every other token is an array: array(token id, text, line)
        // true, false and null are okay, too (in any case, e.g. NULL as printed by var_export())
        if ($token[0] == T_STRING && in_array(strtolower($token[1]), array('true', 'false', 'null'), true)) {
            continue;
        }
        if (!isset($allowedTokens[$token[0]])) {
            throw new Exception('Disallowed token \''.token_name($token[0]).'\' encountered.');
        }
    }

    // Since PHP 7, eval() throws a ParseError for malformed input
    // such as 'array(1,,2)' instead of returning false
    try {
        eval('$returnArray = '.$code.';');
    } catch (ParseError $e) {
        throw new Exception('Array couldn\'t be eval()\'d: '.$e->getMessage());
    }
    return $returnArray;
}
var_dump(parseArray('array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")'));
I think this is a good compromise between security and convenience: there is no need to write the parser yourself.
For example,
parseArray('exec("haha -i -thought -i -was -smart")');
would throw the exception:
Disallowed token 'T_STRING' encountered.
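The whitelist works because token_get_all() turns anything executable into tokens outside the allowed set. Looking at the raw token stream for a few inputs makes this visible. tokenNames() below is a hypothetical helper written only for illustration (it is not part of parseArray()); it uses the same '<?php ' prefix trick. Note that a double-quoted string containing a variable is not a single T_CONSTANT_ENCAPSED_STRING token, so string interpolation is rejected as well, and so is a negative number, since '-' is not in $allowedChars.

```php
<?php
// Hypothetical helper for illustration: returns the token names that
// token_get_all() produces for $code, prefixed with '<?php ' as in parseArray().
function tokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all('<?php ' . $code) as $token) {
        // Single-character tokens are plain strings, all others are [id, text, line].
        $names[] = is_string($token) ? $token : token_name($token[0]);
    }
    array_shift($names); // drop T_OPEN_TAG
    return $names;
}

echo implode(' ', tokenNames('exec("ls")')), "\n";
// T_STRING ( T_CONSTANT_ENCAPSED_STRING )
echo implode(' ', tokenNames('array("a$x")')), "\n";
// T_ARRAY ( " T_ENCAPSED_AND_WHITESPACE T_VARIABLE " )
echo implode(' ', tokenNames('array(-1)')), "\n";
// T_ARRAY ( - T_LNUMBER )
```

In each case at least one token (T_STRING, the '"' character, '-') is missing from the whitelist, so parseArray() throws before eval is ever reached.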
I think you should use the Tokenizer for this. Maybe I will write a script later on that actually does it.
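A Tokenizer-based approach can avoid eval entirely by walking the tokens with a small recursive-descent parser. The following is a minimal sketch under stated assumptions, not a complete PHP parser: parseArrayLiteral() and parseValue() are hypothetical names, and only array(...), optional key => value pairs, trailing commas, integers, floats, a leading '-', true/false/null (any case) and non-interpolated strings are accepted. Double-quoted escapes are decoded with stripcslashes(), which only approximates PHP's own rules.

```php
<?php
// Sketch only: a Tokenizer-based parser for PHP array literals that never
// calls eval(). parseArrayLiteral() and parseValue() are hypothetical names.
function parseArrayLiteral(string $code): array
{
    $tokens = [];
    foreach (token_get_all('<?php ' . $code) as $token) {
        // Skip the prepended open tag and all whitespace between tokens.
        if (is_array($token) && ($token[0] === T_OPEN_TAG || $token[0] === T_WHITESPACE)) {
            continue;
        }
        $tokens[] = $token;
    }
    $pos = 0;
    $value = parseValue($tokens, $pos);
    if (!is_array($value) || $pos !== count($tokens)) {
        throw new InvalidArgumentException('Input is not a single array literal.');
    }
    return $value;
}

// Parses one value starting at $tokens[$pos] and advances $pos past it.
function parseValue(array $tokens, int &$pos)
{
    $token = $tokens[$pos++] ?? null;
    if ($token === null) {
        throw new InvalidArgumentException('Unexpected end of input.');
    }
    if ($token === '-') { // unary minus is only accepted in front of a number
        $number = parseValue($tokens, $pos);
        if (!is_int($number) && !is_float($number)) {
            throw new InvalidArgumentException("'-' must be followed by a number.");
        }
        return -$number;
    }
    if (is_string($token)) { // any other single-character token, e.g. '[' or '"'
        throw new InvalidArgumentException("Unexpected '$token'.");
    }
    [$id, $text] = $token;
    switch ($id) {
        case T_LNUMBER: // base 0 lets intval() honour 0x (hex) and leading-0 (octal)
            return intval($text, 0);
        case T_DNUMBER:
            return (float) $text;
        case T_CONSTANT_ENCAPSED_STRING:
            $inner = substr($text, 1, -1);
            if ($text[0] === "'") { // single quotes: only \\ and \' are escapes
                return strtr($inner, ['\\\\' => '\\', "\\'" => "'"]);
            }
            // Double quotes: stripcslashes() only approximates PHP's rules
            // (for example, it does not understand \u{...}).
            return stripcslashes($inner);
        case T_STRING:
            switch (strtolower($text)) {
                case 'true':  return true;
                case 'false': return false;
                case 'null':  return null;
            }
            throw new InvalidArgumentException("Disallowed identifier '$text'.");
        case T_ARRAY:
            if (($tokens[$pos++] ?? null) !== '(') {
                throw new InvalidArgumentException("Expected '(' after 'array'.");
            }
            $result = [];
            while (($tokens[$pos] ?? null) !== ')') {
                $first = parseValue($tokens, $pos);
                if (is_array($tokens[$pos] ?? null) && $tokens[$pos][0] === T_DOUBLE_ARROW) {
                    $pos++; // consume '=>'
                    if (!is_int($first) && !is_string($first)) {
                        throw new InvalidArgumentException('Keys must be integers or strings.');
                    }
                    $result[$first] = parseValue($tokens, $pos);
                } else {
                    $result[] = $first;
                }
                $next = $tokens[$pos] ?? null;
                if ($next === ',') {
                    $pos++;
                } elseif ($next !== ')') {
                    throw new InvalidArgumentException("Expected ',' or ')'.");
                }
            }
            $pos++; // consume ')'
            return $result;
        default:
            throw new InvalidArgumentException('Disallowed token ' . token_name($id) . '.');
    }
}

var_dump(parseArrayLiteral('array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")'));
```

Because every value is built explicitly, anything the grammar does not recognise (function calls, variables, operators other than a leading '-') simply raises an exception instead of being executed.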
You could do:
json_decode(str_replace(array('array(', ')'), array('[', ']'), $string));
Replace array( and ) with square brackets, then json_decode the result. If the string is just a (multidimensional) array of scalar values, the str_replace will not break anything and you can json_decode it. If it contains any other code, the closing parenthesis of a function call is replaced as well, so the result is not valid JSON and json_decode returns NULL.
Granted, that's a rather, umm, creative approach, but it might work for you.
Edit: Also see the comments for some further limitations pointed out by other users.
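Wrapping the one-liner in a small function makes it easy to try against a few inputs, including some cases where it breaks. arrayStringToArray() is a hypothetical name; the body is the same str_replace/json_decode call as above, plus a check that the decoded value really is an array.

```php
<?php
// Hypothetical wrapper around the str_replace() + json_decode() trick.
// Returns null when the rewritten string is not a JSON array.
function arrayStringToArray(string $string): ?array
{
    $json = str_replace(array('array(', ')'), array('[', ']'), $string);
    $result = json_decode($json); // JSON arrays decode to PHP arrays
    return is_array($result) ? $result : null;
}

var_dump(arrayStringToArray('array(1, array("a", "b"), true, null)')); // works
var_dump(arrayStringToArray('exec("ls")'));      // NULL: becomes exec("ls"], not JSON
var_dump(arrayStringToArray("array('a')"));      // NULL: JSON requires double quotes
var_dump(arrayStringToArray('array("a" => 1)')); // NULL: => is not valid JSON
var_dump(arrayStringToArray('array("x)")'));     // ["x]"]: the ')' inside the string was replaced too
```

The last case is the most dangerous one: the input is accepted, but the data is silently altered, because str_replace does not know whether a parenthesis sits inside a string literal.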