Properly getting blobs from mysql database with mysql connector in python

Submitted by 三世轮回 on 2020-05-13 14:48:06

Question


When executing the following code:

import mysql.connector
import numpy as np

connection = mysql.connector.connect(...) # connection params here
cursor = connection.cursor()
cursor.execute('create table test_table(value blob)')
cursor.execute('insert into test_table values (_binary %s)', (np.random.sample(10000).astype('float').tobytes(),))
cursor.execute('select * from test_table')
cursor.fetchall()

I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 1: invalid start byte

(...and then a stack trace which I don't think is useful here)

It seems that mysql connector converts my blob to string (and fails to do so). How can I fetch this data as bytes without any conversion?


Answer 1:


Apparently, this is a known issue with MySQL Connector/Python (the 'mysql.connector' module). Try 'pymysql' instead.




Answer 2:


We ran into the same issue: BLOBs were mistakenly read back as UTF-8 strings with MySQL 8.0.13, mysql-connector-python 8.0.13 and sqlalchemy 1.2.14.

What did the trick for us was enabling the use_pure option of MySQL Connector. The default of use_pure changed in 8.0.11, and the new default is to use the C Extension. So we set the option back explicitly:

create_engine(uri, connect_args={'use_pure': True}, ...)
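For code that calls the connector directly rather than through SQLAlchemy, the same option can be passed to mysql.connector.connect(). This is a minimal sketch, assuming placeholder credentials. build_connect_args and fetch_blobs are hypothetical helper names, and the query passed in is only an illustration:

```python
def build_connect_args(user, password, host, database):
    """Hypothetical helper: collect connection arguments with use_pure enabled."""
    return {
        "user": user,
        "password": password,
        "host": host,
        "database": database,
        # Select the pure Python implementation instead of the C Extension.
        "use_pure": True,
    }


def fetch_blobs(connect_args, query):
    """Hypothetical helper: run a query and return the first column of every row."""
    # Imported lazily so build_connect_args works without the driver installed.
    import mysql.connector

    conn = mysql.connector.connect(**connect_args)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()
```

With a reachable server, this could be called as fetch_blobs(build_connect_args('asdf', 'asdf', '1.2.3.4', 'the_db'), 'select value from test_table').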

Details of our error and stack trace:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 1: invalid start byte
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
    ....
    File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 272, in execute
        self._handle_result(result)
    File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 163, in _handle_result
        self._handle_resultset()
    File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 651, in _handle_resultset
        self._rows = self._cnx.get_rows()[0]
    File "/usr/local/lib/python3.6/site-packages/mysql/connector/connection_cext.py", line 273, in get_rows
        row = self._cmysql.fetch_row()
    SystemError: <built-in method fetch_row of _mysql_connector.MySQL object at 0x5627dcfdf9f0> returned a result with an error set



Answer 3:


I reproduced the above error:

Traceback (most recent call last):
File "demo.py", line 16, in <module>
    cursor.execute(query, ())
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte '0xff ... ' 
in position 0: invalid start byte

Using versions:

$  python --version
Python 2.7.10

>>> mysql.connector.__version__
'8.0.15'

With this Python code:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import mysql.connector
conn = mysql.connector.connect(
      user='asdf', 
      password='asdf',
      host='1.2.3.4',
      database='the_db',
      connect_timeout=10)

cursor = conn.cursor(buffered=True)
try:
    query = ("SELECT data_blob FROM blog.cmd_table")
    cursor.execute(query, ())                           # error is raised here
except mysql.connector.Error as err:
    # error is caught here and printed
    print(err)

The blob was populated from raw binary bytes read with Python's open(), like this:

def read_file_as_blob(filename):
    #r stands for read
    #b stands for binary
    with open(filename, 'rb') as f:
        data = f.read()
    return data

So the problem is somewhere between the encoding transform of data in the file -> the encoding of data for mysql blob -> and how mysql lifts that blob and converts it back to utf-8.
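To see why the decode step fails, note that packed floating-point data is generally not valid UTF-8. The sketch below uses only the standard library. Here struct stands in for the numpy tobytes() call from the question, which is an assumption made for illustration, and is_valid_utf8 is a hypothetical helper name:

```python
import struct

# Pack three doubles into 24 raw little-endian bytes, comparable to what
# ndarray.tobytes() produces for a float64 array.
payload = struct.pack("<3d", 0.1, 0.5, 0.9)


def is_valid_utf8(raw):
    """Return True if raw decodes as UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The packed doubles are not text, so decoding them as UTF-8 fails.
print(is_valid_utf8(payload))
# Unpacking them as doubles recovers the original values.
print(struct.unpack("<3d", payload))
```

A driver that hands such a column back as bytes leaves the payload intact. Any layer that tries to turn it into a str hits the same UnicodeDecodeError.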

Two solutions:

Solution 1 is exactly as AHalvar said: pass the use_pure=True parameter to mysql.connector.connect( ... ). Then, mysteriously, it just works. But good programmers will note that deferring to a mysterious incantation is a bad code smell. Fixes by Brownian motion incur technical debt.

Solution 2 is to encode your data early and often, and to prevent the double encoding and double decoding that are the source of these problems. Lock it down to a common encoding format as soon as possible.

The gratifying solution for me was forcing UTF-8 encoding earlier in the process, enforcing UTF-8 everywhere:

data.encode('UTF-8')

The unicode pile of poo represents my opinion on such babysitting of character encoding between various devices on different operating systems and encoding schemes.



Source: https://stackoverflow.com/questions/52759667/properly-getting-blobs-from-mysql-database-with-mysql-connector-in-python
