How to repair a serialized string which has been corrupted by an incorrect byte count length?

后悔当初 2020-11-22 11:46

I am using Hotaru CMS with the Image Upload plugin. I get this error if I try to attach an image to a post; otherwise there is no error:

unserialize()

15 answers
  • 2020-11-22 12:26

    The corruption in this question is isolated to a single substring at the end of the serialized string, which was probably replaced by hand by someone who wanted a quick way to update the image filename. This will be apparent in my demonstration link below using the OP's posted data. In short, C:fakepath100.jpg does not have a length of 19; it should be 17.
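
    As a quick check of that count, here is a minimal sketch. The one-element array wrapper is a made-up stand-in for the OP's data, which is not reproduced here; only the filename and the wrong count of 19 come from the question. strlen() reports 17 bytes, and unserialize() only accepts the fragment once the declared length matches:

    ```php
    <?php
    // Hypothetical fragment: only the filename and the count 19 are taken from the question.
    $filename  = 'C:fakepath100.jpg';
    $corrupted = 'a:1:{i:0;s:19:"' . $filename . '";}';
    $repaired  = 'a:1:{i:0;s:' . strlen($filename) . ':"' . $filename . '";}';

    var_dump(strlen($filename));         // int(17)
    var_dump(@unserialize($corrupted));  // bool(false): the declared length does not match
    var_dump(unserialize($repaired));    // array with the filename at index 0
    ```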

    Since the serialized string corruption is limited to an incorrect byte/character count number, the following will do a fine job of updating the corrupted string with the correct byte count value.

    The following regex-based replacement will only be effective in remedying byte counts, nothing more.

    It looks like many of the earlier posts are just copy-pasting a regex pattern from someone else. There is no reason to capture the potentially corrupted byte count number if it isn't going to be used in the replacement. Also, adding the s pattern modifier is a reasonable inclusion in case a string value contains newlines/line returns.

    *For those who are not aware of how multibyte characters are treated when serializing: you must not use mb_strlen() in the custom callback, because it is the byte count that is stored, not the character count. See my output...
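
    To illustrate the byte-versus-character point, here is a small sketch. The word is written with explicit UTF-8 byte escapes so the result does not depend on the encoding of the source file, and the mb_strlen() call is guarded because the mbstring extension may not be loaded:

    ```php
    <?php
    $word = "gar\xC3\xA7on"; // "garçon"; the ç takes two bytes in UTF-8

    echo strlen($word), "\n";     // 7: bytes, which is what serialize() stores
    echo serialize($word), "\n";  // s:7:"garçon";

    if (function_exists('mb_strlen')) {
        echo mb_strlen($word, 'UTF-8'), "\n"; // 6: characters, the wrong number for a repair
    }
    ```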

    Code: (Demo with OP's data) (Demo with arbitrary sample data) (Demo with condition replacing)

    $corrupted = <<<STRING
    a:4:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
    newline2";i:3;s:6:"garçon";}
    STRING;
    
    $repaired = preg_replace_callback(
            '/s:\d+:"(.*?)";/s',
            //  ^^^- matched/consumed but not captured because not used in replacement
            function ($m) {
                return "s:" . strlen($m[1]) . ":\"{$m[1]}\";";
            },
            $corrupted
        );
    
    echo $corrupted , "\n" , $repaired;
    echo "\n---\n";
    var_export(unserialize($repaired));
    

    Output:

    a:4:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
    newline2";i:3;s:6:"garçon";}
    a:4:{i:0;s:5:"three";i:1;s:4:"five";i:2;s:17:"newline1
    newline2";i:3;s:7:"garçon";}
    ---
    array (
      0 => 'three',
      1 => 'five',
      2 => 'newline1
    newline2',
      3 => 'garçon',
    )
    

    One leg down the rabbit hole... The above works fine even if double quotes occur in a string value, but if a string value contains "; or some other monkeywrenching substring, you'll need to go a little further and implement "lookarounds". My new pattern

    checks that the leading s is:

    • the start of the entire input string or
    • preceded by ;

    and checks that the "; is:

    • at the end of the entire input string or
    • followed by } or
    • followed by a string or integer declaration s: or i:
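
    A minimal sketch of why those checks matter, using a made-up one-element array whose value contains "; (the input is hypothetical; the two patterns are the ones from this answer):

    ```php
    <?php
    // Hypothetical input: the value a";b is 4 bytes long, but the count says 1.
    $corrupted = 'a:1:{i:0;s:1:"a";b";}';

    $fix = function ($m) {
        return 's:' . strlen($m[1]) . ':"' . $m[1] . '";';
    };

    // The simple pattern stops at the first "; and leaves the string broken.
    $simple = preg_replace_callback('/s:\d+:"(.*?)";/s', $fix, $corrupted);
    var_dump(@unserialize($simple));  // bool(false)

    // With lookarounds, "; only ends a value when followed by }, s:, i:, or the end of input.
    $strict = preg_replace_callback('/(?<=^|;)s:\d+:"(.*?)";(?=$|}|[si]:)/s', $fix, $corrupted);
    echo $strict, "\n";               // a:1:{i:0;s:4:"a";b";}
    var_dump(unserialize($strict));   // array with a";b at index 0
    ```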

    I haven't tested each and every possibility; in fact, I am relatively unfamiliar with all of the possibilities in a serialized string because I never elect to work with serialized data -- always JSON in modern applications. If there are additional possible leading or trailing characters, leave a comment and I'll extend the lookarounds.

    Extended snippet: (Demo)

    $corrupted_byte_counts = <<<STRING
    a:12:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
    newline2";i:3;s:6:"garçon";i:4;s:111:"double " quote \"escaped";i:5;s:1:"a,comma";i:6;s:9:"a:colon";i:7;s:0:"single 'quote";i:8;s:999:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:1:"monkey";wrenching doublequote-semicolon";s:3:"s:";s:9:"val s: val";}
    STRING;
    
    $repaired = preg_replace_callback(
            '/(?<=^|;)s:\d+:"(.*?)";(?=$|}|[si]:)/s',
            //^^^^^^^^--------------^^^^^^^^^^^^^-- some additional validation
            function ($m) {
                return 's:' . strlen($m[1]) . ":\"{$m[1]}\";";
            },
            $corrupted_byte_counts
        );
    
    echo "corrupted serialized array:\n$corrupted_byte_counts";
    echo "\n---\n";
    echo "repaired serialized array:\n$repaired";
    echo "\n---\n";
    print_r(unserialize($repaired));
    

    Output:

    corrupted serialized array:
    a:12:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
    newline2";i:3;s:6:"garçon";i:4;s:111:"double " quote \"escaped";i:5;s:1:"a,comma";i:6;s:9:"a:colon";i:7;s:0:"single 'quote";i:8;s:999:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:1:"monkey";wrenching doublequote-semicolon";s:3:"s:";s:9:"val s: val";}
    ---
    repaired serialized array:
    a:12:{i:0;s:5:"three";i:1;s:4:"five";i:2;s:17:"newline1
    newline2";i:3;s:7:"garçon";i:4;s:24:"double " quote \"escaped";i:5;s:7:"a,comma";i:6;s:7:"a:colon";i:7;s:13:"single 'quote";i:8;s:10:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:39:"monkey";wrenching doublequote-semicolon";s:2:"s:";s:10:"val s: val";}
    ---
    Array
    (
        [0] => three
        [1] => five
        [2] => newline1
    newline2
        [3] => garçon
        [4] => double " quote \"escaped
        [5] => a,comma
        [6] => a:colon
        [7] => single 'quote
        [8] => semi;colon
        [assoc] => yes
        [9] => monkey";wrenching doublequote-semicolon
        [s:] => val s: val
    )
    
  • 2020-11-22 12:28

    The official docs say that unserialize() should return false and raise an E_NOTICE.

    Since you are seeing an error, your error reporting must be set to trigger on E_NOTICE.

    Here is a fix that lets you detect the false returned by unserialize():

    $old_err=error_reporting(); 
    error_reporting($old_err & ~E_NOTICE);
    $object = unserialize($serialized_data);
    error_reporting($old_err);
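
    One caveat when checking for false: a legitimately serialized false (b:0;) also unserializes to false. Below is a small sketch of a helper that tells the two cases apart. The name safe_unserialize is made up for this illustration, and the @ operator is used in place of toggling error_reporting():

    ```php
    <?php
    // Hypothetical helper: returns [ok, value] so that failure and a stored false are distinguishable.
    function safe_unserialize($serialized_data)
    {
        $value = @unserialize($serialized_data); // @ silences the notice/warning for this call only
        if ($value === false && $serialized_data !== serialize(false)) {
            return [false, null]; // corrupted or otherwise not unserializable
        }
        return [true, $value];
    }

    var_dump(safe_unserialize('s:3:"abc";')); // ok, value "abc"
    var_dump(safe_unserialize('b:0;'));       // ok, value false
    var_dump(safe_unserialize('s:9:"abc";')); // not ok: the declared length is wrong
    ```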
    

    You might also want to consider using base64 encoding/decoding:

    $string=base64_encode(serialize($obj));
    unserialize(base64_decode($string));
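
    As a rough illustration of what base64 buys you: serialized objects with private properties contain NUL bytes, which some text columns or transports can mangle, while the base64 form is plain ASCII. The Attachment class below is invented for the example:

    ```php
    <?php
    // Hypothetical class, used only to produce serialized data that contains NUL bytes.
    class Attachment
    {
        private $path;

        public function __construct($path)
        {
            $this->path = $path;
        }
    }

    $raw     = serialize(new Attachment('uploads/100.jpg'));
    $encoded = base64_encode($raw);

    var_dump(strpos($raw, "\0") !== false);                       // bool(true): raw form has NUL bytes
    var_dump(preg_match('/^[A-Za-z0-9+\/=]+$/', $encoded) === 1); // bool(true): ASCII only
    var_dump(unserialize(base64_decode($encoded)) == new Attachment('uploads/100.jpg')); // bool(true)
    ```

    The trade-off is size: base64 output is roughly one third larger than the raw serialized string.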
    
  • 2020-11-22 12:28

    Here is an online tool for fixing a corrupted serialized string.

    I'd like to add that this mostly happens because of a search and replace done on the DB: the serialized data (especially the stored string lengths) is not updated to match the replacement, and that causes the "corruption".

    The above tool uses the following logic to fix the serialized data (copied from here).

    function error_correction_serialise($string) {
        // First check whether fixing is needed at all (unserialize() returns false on failure),
        // then only touch strings that look like a serialized array, object, or string.
        if (@unserialize($string) === false && preg_match('/^[aOs]:/', $string)) {
            $string = preg_replace_callback(
                '/s:(\d+):"(.*?)";/s',
                function ($matches) {
                    return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";';
                },
                $string
            );
        }
        return $string;
    }
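
    To make the search-and-replace scenario concrete, here is a small sketch with made-up data. A plain str_replace() on the serialized text changes the value but not its stored length, whereas unserializing first, replacing inside the data, and serializing again keeps the counts in sync:

    ```php
    <?php
    // Hypothetical data: a site URL stored in a serialized options array.
    $stored = serialize(['url' => 'http://old.example']);

    // Replacing in the serialized text leaves the old byte count (18) in place.
    $broken = str_replace('old.example', 'new-site.example', $stored);
    var_dump(@unserialize($broken));  // bool(false)

    // Replacing in the unserialized data and serializing again recomputes the count.
    $data = unserialize($stored);
    $data['url'] = str_replace('old.example', 'new-site.example', $data['url']);
    $fixed = serialize($data);
    echo $fixed, "\n";                // a:1:{s:3:"url";s:23:"http://new-site.example";}
    ```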
    
  • 2020-11-22 12:31

    Another reason unserialize() can fail is that the serialized data was stored improperly in the database; see the official explanation here. serialize() returns a binary string and PHP variables do not track encodings, so putting it into a TEXT or VARCHAR() column can cause this error.

    Solution: store serialized data in a BLOB column in your table.

  • 2020-11-22 12:31

    Another cause of this problem can be the column type of the "payload" column in the sessions table. If you keep a large amount of data in the session, a TEXT column will not be enough. You will need MEDIUMTEXT or even LONGTEXT.
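
    A small sketch of the effect, assuming the column silently truncates at 65,535 bytes (the limit of a MySQL TEXT column). substr() stands in for the database here, and the payload is made up:

    ```php
    <?php
    // Hypothetical oversized session payload (about 100 KB once serialized).
    $payload    = serialize(['cart' => str_repeat('x', 100000)]);
    $text_limit = 65535; // maximum byte length of a MySQL TEXT column

    // substr() simulates what a truncating INSERT into a TEXT column would keep.
    $stored = substr($payload, 0, $text_limit);

    var_dump(strlen($payload) > $text_limit);    // bool(true)
    var_dump(@unserialize($stored));             // bool(false): the tail of the data is gone
    var_dump(@unserialize($payload) !== false);  // bool(true): the untruncated string is fine
    ```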

  • 2020-11-22 12:33

    This error occurs because your charset is wrong.

    Set the charset right after the opening PHP tag:

    header('Content-Type: text/html; charset=utf-8');
    

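    And set the charset to utf8 for your database connection (mysql_query() belongs to the old mysql extension, which was removed in PHP 7; with mysqli, use mysqli_set_charset() instead):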

    mysql_query("SET NAMES 'utf8'");
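
    To show how a charset mismatch turns into a length error, here is a sketch in which str_replace() stands in for a UTF-8 to ISO-8859-1 conversion happening somewhere between PHP and the database. The word and the conversion step are illustrative assumptions:

    ```php
    <?php
    // "garçon" in UTF-8: the ç is the two bytes C3 A7, so the string is 7 bytes long.
    $utf8 = serialize("gar\xC3\xA7on");                // s:7:"garçon";

    // Simulated re-encoding to ISO-8859-1, where ç is the single byte E7.
    $latin1 = str_replace("\xC3\xA7", "\xE7", $utf8);

    var_dump(unserialize($utf8) === "gar\xC3\xA7on");  // bool(true)
    var_dump(@unserialize($latin1));                   // bool(false): count says 7, value has 6 bytes
    ```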
    