Is a BLOB converted using the current/default charset in MySQL?

孤独总比滥情好 2020-12-06 19:41
  1. I have a table with a BLOB field.
  2. The charset of the table is Latin1.
  3. I connect to the DB and run "SET CHARACTER SET utf8".
  4. Then I save binary
3 Answers
  • 2020-12-06 19:48

    Short Answer:

    Simply delete or comment out the line below, and it will always work, no matter which database encoding is actually in use (utf8, latin1, etc.):

    $pdo->exec('SET CHARACTER SET utf8');
    

    Long Answer:

    This is not a PDO bug; it is a MySQL bug.

    When the actual database encoding is latin1, but you use:

    SET CHARACTER SET utf8
    

    (or vice versa: the database is utf8 but you set latin1; the important part is that the two differ), then, as far as I can tell, MySQL performs charset conversion on all traffic between client and server, even for BLOB values.

    If you do NOT issue a SET CHARACTER SET statement, then from what I see for scripts (PHP/PDO or Perl/DBI) the connection charset defaults to the database charset, and in that case no implicit conversion takes place.

    Obviously, this automatic conversion is what kills BLOBs, which must not be converted at all.

    I have tested this with both PHP/PDO and Perl/DBI, and the issue is easily reproducible: both fail when the database uses latin1 encoding and the script runs SET CHARACTER SET utf8 (or vice versa).
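
    To see why any text conversion is destructive for binary data, here is a minimal sketch using iconv rather than MySQL itself. It assumes the byte is treated as latin1 text and re-encoded as UTF-8, which is only an analogy for what the server may do; the exact direction and result inside MySQL can differ.

```shell
# Feed the single byte 0xFF (octal 377) through a latin1 -> UTF-8 conversion,
# as a text pipeline would, and dump the result as hex.
converted=$(printf '\377' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

# The one input byte has become two bytes: c3 bf (UTF-8 for U+00FF).
echo "$converted"   # prints: c3bf
```

    The payload changes in both length and content, which is exactly what a BLOB cannot tolerate.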

    If you want to be fully UTF8 compatible, you should change the encoding of your database using:

    ALTER DATABASE mydb CHARSET utf8;
    

    With this, everything will be using UTF8, and BLOBs will also work fine.

    The minimal file that triggers this corruption is blob.bin, containing the single byte 0xFF. On Linux, you can create this test file with the printf command (\377 is the octal escape for 0xFF):

    printf '\377' > blob.bin
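
    Before running the test scripts, it is worth checking that blob.bin really holds one 0xFF byte. The sketch below assumes a POSIX shell with od and wc available. Note that passing the literal text 0xFF to printf writes four ASCII characters, not one byte, so the octal escape is used instead.

```shell
# Write exactly one byte with value 0xFF (octal 377).
printf '\377' > blob.bin

# Measure the file size and dump its content as hex.
size=$(wc -c < blob.bin | tr -d ' ')
hex=$(od -An -tx1 blob.bin | tr -d ' \n')

# For contrast: the literal string "0xFF" is four bytes long.
literal_size=$(printf '0xFF' | wc -c | tr -d ' ')

echo "size=$size hex=$hex literal_size=$literal_size"
# prints: size=1 hex=ff literal_size=4
```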
    

    Now, test scripts that reproduce the problem:

    PHP test code:

    <?php
    $dbh = new PDO("mysql:host=127.0.0.1;dbname=test");
    # If database encoding is NOT utf8, uncomment to break it:
    # $dbh->exec("SET CHARACTER SET utf8");
    
    $blob1 = file_get_contents("blob.bin");
    $sth = $dbh->prepare(
        "INSERT INTO pdo_blob (the_blob) VALUES(:the_blob)"
    );
    $sth->bindParam(":the_blob", $blob1, PDO::PARAM_LOB);
    $sth->execute();
    
    $sth = $dbh->prepare(
        "SELECT the_blob FROM pdo_blob ORDER BY id DESC LIMIT 1"
    );
    $sth->execute();
    
    $blob2 = null;
    $sth->bindColumn(1, $blob2, PDO::PARAM_LOB);
    $sth->fetch();
    
    // Strict comparison: == can treat numeric-looking strings as equal numbers.
    if ($blob1 === $blob2) {
        echo "Equal\n";
    } else {
        echo "Not equal\n";
        $arr1 = str_split($blob1);
        $arr2 = str_split($blob2);
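        // Compare only the common prefix so a shorter result cannot cause an out-of-range read.
        $n = min(count($arr1), count($arr2));
        for ($i = 0; $i < $n; $i++) {
            if ($arr1[$i] !== $arr2[$i]) {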
                echo "First diff: " . dechex(ord($arr1[$i])) . " != "
                                    . dechex(ord($arr2[$i])) . "\n";
                break;
            }
        }
    }
    ?>
    

    Perl test code:

    #!/usr/bin/perl -w
    
    use strict;
    use DBI qw(:sql_types);
    
    my $dbh = DBI->connect("dbi:mysql:host=127.0.0.1;dbname=test");
    # If database encoding is NOT utf8, uncomment to break it:
    # $dbh->do("SET CHARACTER SET utf8");
    open FILE, "blob.bin";
    binmode FILE;
    read(FILE, my $blob1, 100000000);
    close FILE;
    my $sth = $dbh->prepare(
        "INSERT INTO pdo_blob (the_blob) VALUES(?)"
    );
    $sth->bind_param(1, $blob1, SQL_BLOB);
    $sth->execute();
    my ($blob2) = $dbh->selectrow_array(
        "SELECT the_blob FROM pdo_blob ORDER BY id DESC LIMIT 1"
    );
    # Extra parentheses so that "\n" is part of the print argument list.
    print(($blob1 eq $blob2 ? "Equal" : "Not equal"), "\n");
    
  • 2020-12-06 19:51

    Edit: on WAMP-Server

    It didn't work with the PDO API. You can use base64_encode() before insert and base64_decode() after retrieval. This bloats the data by about 33%, and the conversion adds overhead.

    If the MySQLi API is an option, here is some code:

    <?php
    $mysqli = new mysqli('localhost', 'spark', 'spark123', 'test');
    
    $sql = "INSERT INTO blob_tb (bdata) VALUES(?)";
    $insertStm = $mysqli->prepare($sql);
    
    $blob = NULL; //necessary
    $insertStm->bind_param('b', $blob);
    
    $blob = (binary) (file_get_contents('favicon.ico'));
    $insertStm->send_long_data(0, $blob);
    
    $insertStm->execute();
    $insertStm->close();
    
    $selectStm = $mysqli->prepare("SELECT bdata FROM blob_tb LIMIT 1");
    $selectStm->execute();
    
    $selectStm->bind_result($savedBlob);
    $selectStm->fetch();
    $selectStm->close();
    
    $mysqli->close();
    
    // Strict comparison avoids loose numeric-string matches on binary data.
    echo 'equal: ' . ((int) ($blob === $savedBlob));
    // var_dump(($blob), strlen($blob));
    // var_dump(($savedBlob), strlen($savedBlob));
    // var_dump(get_defined_vars());
    
    ?>
    
  • 2020-12-06 20:08

    Good answer @mvp!

    But when my web app is UTF-8 and the database encoding is latin1, I have to set character_set_client and character_set_results.

    When I use SET CHARACTER SET utf8, I get the described problem with BLOBs.

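    But when I use SET NAMES utf8 instead, it works! (SET NAMES also sets character_set_connection to utf8, whereas SET CHARACTER SET sets it to the database charset, which is presumably why only the latter triggers a conversion of the statement data.)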
