Regex / code to fix corrupt serialized PHP data

夕颜 2020-11-30 07:04

I have a massive multidimensional array that has been serialised by PHP. It was stored in MySQL, but the data field wasn't large enough, so the end has been cut off... I

12 answers
  • 2020-11-30 07:13

    The following snippet recursively reads and parses a damaged serialized string (blob data), for example one that was cut off because it was too long for its database column. Numeric primitives and booleans that are parsed are guaranteed to be valid; strings may be cut off and array keys may be missing. The routine can be useful if recovering a significant part of the data, rather than all of it, is good enough for you.

    class Unserializer
    {
        /**
        * Parse blob string tolerating corrupted strings & arrays
        * @param string $str Corrupted blob string
        */
        public static function parseCorruptedBlob(&$str)
        {
            // array pattern:    a:236:{...;}
            // integer pattern:  i:123;
            // double pattern:   d:329.0001122;
            // boolean pattern:  b:1; or b:0;
            // string pattern:   s:14:"date_departure";
            // null pattern:     N;
            // not supported: object O:{...}, reference R:{...}
    
            // NOTES:
            // - primitive types (bool, int, float) except for string are guaranteed uncorrupted
            // - arrays are tolerant to corrupted keys/values
            // - references & objects are not supported
            // - we use single byte string length calculation (strlen rather than mb_strlen) since source string is ISO-8859-2, not utf-8
    
            if(preg_match('/^a:(\d+):{/', $str, $match)){
                list($pattern, $cntItems) = $match;
                $str = substr($str, strlen($pattern));
                $array = [];
                for($i=0; $i<$cntItems; ++$i){
                    $key = self::parseCorruptedBlob($str);
                    if(is_scalar($key) && trim((string) $key)!==''){ // null (unparseable), arrays and "" are not usable as keys
                        $array[$key] = self::parseCorruptedBlob($str);
                    }
                }
                $str = ltrim($str, '}'); // closing array bracket
                return $array;
            }elseif(preg_match('/^s:(\d+):/', $str, $match)){
                list($pattern, $length) = $match;
                $str = substr($str, strlen($pattern));
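                $raw = substr($str, 0, $length + 2); // value plus its surrounding double quotes
                $str = (string) substr($str, strlen($raw) + 1); // also skip the semicolon
                $val = (string) substr($raw, 1); // drop the opening double quote
                if(strlen($raw) === $length + 2 && substr($val, -1) === '"'){
                    $val = substr($val, 0, -1); // drop the closing quote only if the string is complete; trim() would also strip quotes that belong to the value
                }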
                if(preg_match('/^a:(\d+):{/', $val)){
                    // parse instantly another serialized array
                    return (array) self::parseCorruptedBlob($val);
                }else{
                    return (string) $val;
                }
            }elseif(preg_match('/^i:(-?\d+);/', $str, $match)){ // integers may be negative
                list($pattern, $val) = $match;
                $str = substr($str, strlen($pattern));
                return (int) $val;
            }elseif(preg_match('/^d:(-?[\d.]+(?:E[+-]?\d+)?);/i', $str, $match)){ // allow sign and exponent, e.g. d:-1.5; or d:1.0E+25;
                list($pattern, $val) = $match;
                $str = substr($str, strlen($pattern));
                return (float) $val;
            }elseif(preg_match('/^b:(0|1);/', $str, $match)){
                list($pattern, $val) = $match;
                $str = substr($str, strlen($pattern));
                return (bool) $val;
            }elseif(preg_match('/^N;/', $str, $match)){
                $str = substr($str, strlen('N;'));
                return null;
            }
        }
    }
    
    // usage:
    $unserialized = Unserializer::parseCorruptedBlob($serializedString);
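
    Before reaching for a tolerant parser like the one above, it may be worth confirming that the blob is really damaged. The following is a minimal sketch, not part of the answer: the helper name isDamagedBlob() is hypothetical, and the truncation is simulated with substr().

```php
<?php
// Hypothetical helper (not from the answer above): decide whether a blob
// needs a tolerant parser at all.
function isDamagedBlob(string $blob): bool
{
    // serialize(false) is exactly "b:0;", the only valid payload for which
    // unserialize() legitimately returns false.
    if ($blob === 'b:0;') {
        return false;
    }
    // allowed_classes => false (PHP 7+) keeps objects from being instantiated
    // while we are only probing the data.
    return @unserialize($blob, ['allowed_classes' => false]) === false;
}

$intact    = serialize(['one' => 1, 'two' => 'nice']);
$truncated = substr($intact, 0, -10); // simulate a column that was too short

var_dump(isDamagedBlob($intact));    // bool(false)
var_dump(isDamagedBlob($truncated)); // bool(true)
```

    Only blobs for which this returns true would then be handed to the recovery routine, so intact rows keep going through PHP's own unserialize().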
    
  • 2020-11-30 07:14

    You can recover invalid serialized data by rebuilding it as an array :)

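    $str = "a:1:{i:0;a:4:{s:4:\"name\";s:26:\"20141023_544909d85b868.rar\";s:5:\"dname\";s:20:\"HTxRcEBC0JFRWhtk.rar\";s:4:\"size\";i:19935;s:4:\"dead\";i:0;}}";

    // NOTE: the pattern $re is not defined in the post. The code below expects it to have
    // two capture groups, whose results are read from $matches[1] and $matches[2].
    preg_match_all($re, $str, $matches);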
    
    if(is_array($matches) && !empty($matches[1]) && !empty($matches[2]))
    {
        foreach($matches[1] as $ksel => $serv)
        {
            if(!empty($serv))
            {
                $retva[] = $serv;
            }else{
                $retva[] = $matches[2][$ksel];
            }
        }
    
        $count = 0;
        $arrk = array();
        $arrv = array();
        if(is_array($retva))
        {
            foreach($retva as $k => $va)
            {
                ++$count;
                if($count/2 == 1)
                {
                    $arrv[] = $va;
                    $count = 0;
                }else{
                    $arrk[] = $va;
                }
            }
            $returnse = array_combine($arrk,$arrv);
        }
    
    }
    
    print_r($returnse);
    
  • 2020-11-30 07:18

    Solution:

    1) try online:

    Serialized String Fixer (online tool)

    2) Use function:

    unserialize( serialize_corrector($serialized_string ) ) ;

    code:

    function serialize_corrector($serialized_string){
        // First check whether "fixing" is needed at all: unserialize() returns false on failure.
        // The preg_match() is a basic sanity check that the input looks like a serialized array, object or string.
        if ( @unserialize($serialized_string) === false && preg_match('/^[aOs]:/', $serialized_string) ) {
            $serialized_string = preg_replace_callback(
                '/s\:(\d+)\:\"(.*?)\";/s',
                function($matches){ return 's:'.strlen($matches[2]).':"'.$matches[2].'";'; },
                $serialized_string
            );
        }
        return $serialized_string;
    } 
    

    There is also this script, which I haven't tested.
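
    To make clear what this kind of length recalculation can and cannot repair, here is a small sketch. The function name fixStringLengths() and the example URLs are made up for illustration; the regex is the same idea as in serialize_corrector().

```php
<?php
// Hypothetical name; same regex idea as serialize_corrector() above.
function fixStringLengths(string $blob): string
{
    return preg_replace_callback(
        '/s:(\d+):"(.*?)";/s',
        function (array $m): string {
            // strlen() counts bytes, which is what unserialize() expects
            return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
        },
        $blob
    ) ?? $blob; // preg_replace_callback() returns null on a regex error
}

// A naive search-and-replace changed the URL but left the old length (18).
$broken = 'a:1:{s:4:"home";s:18:"http://new-domain.example";}';
var_dump(@unserialize($broken)); // bool(false)

$fixed = fixStringLengths($broken);
echo $fixed, "\n"; // a:1:{s:4:"home";s:25:"http://new-domain.example";}
print_r(unserialize($fixed));

// A truncated blob has no closing '";' for its last string, so the regex
// never touches it and the result is still not unserializable.
$truncated = 'a:1:{s:4:"home";s:18:"http://old.exa';
var_dump(@unserialize(fixStringLengths($truncated))); // bool(false)
```

    In other words, this approach helps when the declared lengths are wrong but the data is complete. It does nothing for data whose end is missing, and it can misfire when a string value itself contains the sequence `";`.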

  • 2020-11-30 07:20

    I think this is almost impossible. Before you can repair your array, you need to know how it was damaged. How many children are missing? What was their content?

    Sorry, IMHO you can't do it.

    Proof:

    <?php
    
    $serialized = serialize(
        [
            'one'   => 1,
            'two'   => 'nice',
            'three' => 'will be damaged'
        ]
    );
    
    var_dump($serialized); // a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:15:"will be damaged";}
    
    var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"tee";s:15:"will be damaged";}')); // please note 'tee'
    
    var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:')); // serialized string is truncated
    

    Link: https://ideone.com/uvISQu

    Even if you can recalculate the lengths of your keys and values, you cannot trust the data retrieved from this source, because you cannot recover the values themselves. E.g. if the serialized data is an object, its properties won't be accessible anymore.

  • 2020-11-30 07:22

    [UPD] Colleagues, I'm not sure whether this is allowed here, but I've created my own tool specifically for cases like this and put it on my website. Please try it: https://saysimsim.ru/tools/SerializedDataEditor

    [Old text] Conclusion :-) After 3 days (instead of the estimated 2 hours) of migrating a blessed WordPress website to a new domain name, I finally found this page!!! Colleagues, please consider this my "Thank_You_Very_Much_Indeed" for all your answers. The code below combines all of your solutions with almost no additions. JFYI: for me personally, SOLUTION 3 is the one that works most often. Kamal Saleh - you are the best!!!

    function hlpSuperUnSerialize($str) {
        #region Simple Security
        if (
            empty($str)
            || !is_string($str)
            || !preg_match('/^[aOs]:/', $str)
        ) {
            return FALSE;
        }
        #endregion Simple Security
    
        #region SOLUTION 0
        // PHP default :-)
        $repSolNum = 0;
        $strFixed  = $str;
        $arr       = @unserialize($strFixed);
        if (FALSE !== $arr) {
            error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
    
            return $arr;
        }
        #endregion SOLUTION 0
    
        #region SOLUTION 1
        // @link https://stackoverflow.com/a/5581004/3142281
        $repSolNum = 1;
        $strFixed  = preg_replace_callback(
            '/s:([0-9]+):\"(.*?)\";/',
            function ($matches) { return "s:" . strlen($matches[2]) . ':"' . $matches[2] . '";'; },
            $str
        );
        $arr       = @unserialize($strFixed);
        if (FALSE !== $arr) {
            error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
    
            return $arr;
        }
        #endregion SOLUTION 1
    
        #region SOLUTION 2
        // @link https://stackoverflow.com/a/24995701/3142281
        $repSolNum = 2;
        $strFixed  = preg_replace_callback(
            '/s:([0-9]+):\"(.*?)\";/',
            function ($match) {
                return "s:" . strlen($match[2]) . ':"' . $match[2] . '";';
            },
            $str);
        $arr       = @unserialize($strFixed);
        if (FALSE !== $arr) {
            error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
    
            return $arr;
        }
        #endregion SOLUTION 2
    
        #region SOLUTION 3
        // @link https://stackoverflow.com/a/34224433/3142281
        $repSolNum = 3;
        // securities
        $strFixed = preg_replace("%\n%", "", $str);
        // doublequote exploding
        $data     = preg_replace('%";%', "µµµ", $strFixed);
        $tab      = explode("µµµ", $data);
        $new_data = '';
        foreach ($tab as $line) {
            $new_data .= preg_replace_callback(
                '%\bs:(\d+):"(.*)%',
                function ($matches) {
                    $string       = $matches[2];
                    $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
    
                    return 's:' . $right_length . ':"' . $string . '";';
                },
                $line);
        }
        $strFixed = $new_data;
        $arr      = @unserialize($strFixed);
        if (FALSE !== $arr) {
            error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
    
            return $arr;
        }
        #endregion SOLUTION 3
    
        #region SOLUTION 4
        // @link https://stackoverflow.com/a/36454402/3142281
        $repSolNum = 4;
        $strFixed  = preg_replace_callback(
            '/s:([0-9]+):"(.*?)";/',
            function ($match) {
                return "s:" . strlen($match[2]) . ":\"" . $match[2] . "\";";
            },
            $str
        );
        $arr       = @unserialize($strFixed);
        if (FALSE !== $arr) {
            error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
    
            return $arr;
        }
        #endregion SOLUTION 4
    
        #region SOLUTION 5
        // @link https://stackoverflow.com/a/38890855/3142281
        $repSolNum = 5;
        $strFixed  = preg_replace_callback('/s\:(\d+)\:\"(.*?)\";/s', function ($matches) { return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";'; }, $str);
        $arr       = @unserialize($strFixed);
        if (FALSE !== $arr) {
            error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
    
            return $arr;
        }
        #endregion SOLUTION 5
    
        #region SOLUTION 6
        // @link https://stackoverflow.com/a/38891026/3142281
        $repSolNum = 6;
        $strFixed  = preg_replace_callback(
            '/s\:(\d+)\:\"(.*?)\";/s',
            function ($matches) { return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";'; },
            $str);
        $arr = @unserialize($strFixed);
        if (FALSE !== $arr) {
            error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
    
            return $arr;
        }
        #endregion SOLUTION 6
        error_log('Completely unable to deserialize.');
    
        return FALSE;
    }
    
  • 2020-11-30 07:24

    Use preg_replace_callback() instead of preg_replace() with the /e modifier, because /e is deprecated (and was removed in PHP 7).

    $fixed_serialized_String = preg_replace_callback('/s:([0-9]+):\"(.*?)\";/',function($match) {
        return "s:".strlen($match[2]).':"'.$match[2].'";';
    }, $serializedString);
    
    $correct_array= unserialize($fixed_serialized_String);
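
    The callback uses strlen() rather than a character count because the number after `s:` is the byte length of the string. A short sketch to illustrate, assuming UTF-8 data (the example word is arbitrary):

```php
<?php
$word = "na\u{00EF}ve"; // "naïve": 5 characters, but 6 bytes in UTF-8

echo serialize($word), "\n"; // s:6:"naïve";
var_dump(strlen($word));     // int(6)

// Declaring the character count (5) instead of the byte count breaks the blob.
var_dump(@unserialize('s:5:"' . $word . '";')); // bool(false)
var_dump(unserialize('s:6:"' . $word . '";'));  // string(6) "naïve"
```

    This is also why a blob can become invalid after being converted from one character encoding to another: the bytes change while the declared lengths do not.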
    