Simulate php array language construct or parse with regexp?

[愿得一人] 2020-12-01 20:03

From an external source I'm getting strings like

array(1,2,3)

but also larger arrays like

array("a", "b", "c", ar


        
3 Answers
  • 2020-12-01 20:41

    While writing a parser with the Tokenizer, which turned out to be harder than I expected, I came up with another idea: why not parse the array using eval, but first validate that it contains nothing harmful?

    So, what the code does: it checks the tokens of the array against a list of allowed tokens and characters, and only then calls eval. I hope I have included every harmless token; if one is missing, simply add it. (I intentionally left out HEREDOC and NOWDOC because I think they are unlikely to be used.)

    function parseArray($code) {
        $allowedTokens = array(
            T_ARRAY                    => true,
            T_CONSTANT_ENCAPSED_STRING => true,
            T_LNUMBER                  => true,
            T_DNUMBER                  => true,
            T_DOUBLE_ARROW             => true,
            T_WHITESPACE               => true,
        );
        $allowedChars = array(
            '('                        => true,
            ')'                        => true,
            ','                        => true,
        );
    
        $tokens = token_get_all('<?php '.$code);
        array_shift($tokens); // remove opening php tag
    
        foreach ($tokens as $token) {
            // char token
            if (is_string($token)) {
                if (!isset($allowedChars[$token])) {
                    throw new Exception('Disallowed token \''.$token.'\' encountered.');
                }
                continue;
            }
    
            // array token
    
            // true, false and null are okay, too (PHP treats them case-insensitively)
            if ($token[0] === T_STRING && in_array(strtolower($token[1]), array('true', 'false', 'null'), true)) {
                continue;
            }
    
            if (!isset($allowedTokens[$token[0]])) {
                throw new Exception('Disallowed token \''.token_name($token[0]).'\' encountered.');
            }
        }
    
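        // fetch error messages: eval() returns false on a parse error in PHP 5,
        // while PHP 7+ throws a ParseError instead
        ob_start();
        try {
            $ok = eval('$returnArray = '.$code.';');
        } catch (ParseError $e) {
            ob_end_clean();
            throw new Exception('Array couldn\'t be eval()\'d: '.$e->getMessage());
        }
        if (false === $ok) {
            throw new Exception('Array couldn\'t be eval()\'d: '.ob_get_clean());
        }
        ob_end_clean();
        return $returnArray;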
    }
    
    var_dump(parseArray('array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")'));
    

    I think this is a good compromise between security and convenience: there is no need to write a parser yourself.

    For example

    parseArray('exec("haha -i -thought -i -was -smart")');
    

    would throw an exception:

    Disallowed token 'T_STRING' encountered.
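
When deciding which tokens to add to the allow-list, it can help to see what the Tokenizer actually produces for a sample input. The following is a minimal sketch; `listTokenKinds` is a hypothetical helper name, not part of the answer above.

```php
<?php
// Hypothetical helper: list the distinct token names (or literal characters)
// that token_get_all() produces for a snippet.
function listTokenKinds($code)
{
    $kinds = array();
    // array_slice(..., 1) skips the leading T_OPEN_TAG token.
    foreach (array_slice(token_get_all('<?php ' . $code), 1) as $token) {
        // Array tokens carry a numeric id; single-character tokens are plain strings.
        $kinds[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return array_values(array_unique($kinds));
}

print_r(listTokenKinds('array(-1, "a" => 2.5)'));
```

For this input the list contains the character `-`, which is not in `$allowedChars`, so `parseArray()` as written above would reject negative numbers until that character is added.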
    
  • 2020-12-01 20:48

    I think you should use the Tokenizer for this. Maybe I will write a script later on that actually does it.
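
For illustration, here is a rough sketch of what a Tokenizer-based parser without eval might look like. It is not the script mentioned above; `ArrayLiteralParser` and its methods are hypothetical names. It assumes the input uses only `array(...)`, `=>` keys, quoted strings without interpolation, plain decimal numbers, and `true`/`false`/`null`. Negative numbers and the short `[...]` syntax are not handled, and double-quoted escapes are only approximated with `stripcslashes`.

```php
<?php
// Eval-free sketch: a small recursive-descent parser over token_get_all() output.
// ArrayLiteralParser and its methods are hypothetical names, not an existing API.
final class ArrayLiteralParser
{
    private $tokens = array();
    private $pos = 0;

    public static function parse($code)
    {
        $parser = new self();
        // Skip the leading T_OPEN_TAG and drop whitespace tokens.
        foreach (array_slice(token_get_all('<?php ' . $code), 1) as $token) {
            if (!is_array($token) || $token[0] !== T_WHITESPACE) {
                $parser->tokens[] = $token;
            }
        }
        $value = $parser->parseValue();
        if ($parser->pos !== count($parser->tokens)) {
            throw new UnexpectedValueException('Unexpected trailing input.');
        }
        return $value;
    }

    private function peek()
    {
        return isset($this->tokens[$this->pos]) ? $this->tokens[$this->pos] : null;
    }

    private function parseValue()
    {
        $token = $this->peek();
        if (!is_array($token)) {
            throw new UnexpectedValueException('Unexpected token.');
        }
        $this->pos++;
        list($id, $text) = $token;
        switch ($id) {
            case T_ARRAY:
                return $this->parseArrayBody();
            case T_LNUMBER:
                // Assumption: reject octal, hex, binary and underscore-separated literals.
                if (!preg_match('/^(0|[1-9][0-9]*)$/', $text)) {
                    throw new UnexpectedValueException('Only plain decimal integers are supported.');
                }
                return (int) $text;
            case T_DNUMBER:
                if (!is_numeric($text)) {
                    throw new UnexpectedValueException('Unsupported float literal.');
                }
                return (float) $text;
            case T_CONSTANT_ENCAPSED_STRING:
                $body = (string) substr($text, 1, -1); // strip the surrounding quotes
                return $text[0] === "'"
                    ? strtr($body, array('\\\\' => '\\', "\\'" => "'"))
                    : stripcslashes($body); // approximation of PHP's double-quote escapes
            case T_STRING:
                $constants = array('true' => true, 'false' => false, 'null' => null);
                if (array_key_exists(strtolower($text), $constants)) {
                    return $constants[strtolower($text)];
                }
                break;
        }
        throw new UnexpectedValueException('Disallowed token ' . token_name($id) . '.');
    }

    private function parseArrayBody()
    {
        if ($this->peek() !== '(') {
            throw new UnexpectedValueException("Expected '('.");
        }
        $this->pos++;
        $result = array();
        while ($this->peek() !== ')') {
            $value = $this->parseValue();
            $next = $this->peek();
            if (is_array($next) && $next[0] === T_DOUBLE_ARROW) {
                $this->pos++;
                if (!is_int($value) && !is_string($value)) {
                    throw new UnexpectedValueException('Only int and string keys are supported.');
                }
                $result[$value] = $this->parseValue();
            } else {
                $result[] = $value;
            }
            if ($this->peek() === ',') {
                $this->pos++;
            } elseif ($this->peek() !== ')') {
                throw new UnexpectedValueException("Expected ',' or ')'.");
            }
        }
        $this->pos++; // consume ')'
        return $result;
    }
}

var_dump(ArrayLiteralParser::parse('array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")'));
```

Because nothing is ever executed, an input such as `exec("haha")` fails with an exception at the `T_STRING` token rather than relying on an allow-list in front of eval.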

  • 2020-12-01 20:57

    You could do:

    json_decode(str_replace(array('array(', ')'), array('[', ']'), $string));
    

    Replace each array( ... ) with square brackets, then json_decode the result. If the string is just a (possibly multidimensional) array with scalar values in it, the str_replace will not break anything and you can json_decode it. If it contains any code, the parentheses of the function call are replaced as well, so the JSON is no longer valid and NULL is returned.

    Granted, that's a rather, umm, creative approach, but might work for you.

    Edit: Also, see the comments for some further limitations pointed out by other users.
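
The replace-then-decode idea can be sketched as a small function with explicit error checking. `decodeArrayString` is a hypothetical name, and the sample inputs are assumptions chosen to show both the working case and some of the limits of the approach.

```php
<?php
// Hypothetical wrapper around the str_replace + json_decode idea.
function decodeArrayString($string)
{
    $json = str_replace(array('array(', ')'), array('[', ']'), $string);
    $result = json_decode($json, true);
    // json_decode() also returns null for the valid JSON literal "null",
    // so check json_last_error() to tell a failure apart from a real null.
    if ($result === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new InvalidArgumentException('Not a plain array literal: ' . json_last_error_msg());
    }
    return $result;
}

var_dump(decodeArrayString('array(1,2,3)'));                     // list of three ints
var_dump(decodeArrayString('array("a", "b", array("1", "2"))')); // nested list

// A ")" inside a string is rewritten too, so the data is silently altered.
var_dump(decodeArrayString('array("a)b")'));

try {
    decodeArrayString('exec("haha")'); // becomes exec("haha"] which is not valid JSON
} catch (InvalidArgumentException $e) {
    echo "rejected\n";
}
```

Single-quoted strings and `=>` keys are valid PHP but not valid JSON, so inputs that use them are rejected by this sketch as well.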
