Python and Hebrew encoding/decoding error

别等时光非礼了梦想. Submitted on 2019-12-10 19:56:46

Question


I have an SQLite database into which I would like to insert values in Hebrew.

I keep getting the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0: ordinal
not in range(128)

My code is as follows:

runsql(u'INSERT into personal values(%(ID)d,%(name)s)' % {'ID':1,'name':fabricate_hebrew_name()})

    def fabricate_hebrew_name():
        hebrew_names = [u'ירדן',u'יפה',u'תמי',u'ענת',u'רבקה',u'טלי',u'גינה',u'דנה',u'ימית',u'אלונה',u'אילן',u'אדם',u'חווה']
        return random.sample(hebrew_names,1)[0].encode('utf-8')

Note: runsql executes the query on the SQLite database, and fabricate_hebrew_name() should return a string that can be used in my SQL query. Any help is much appreciated.


Answer 1:


You are passing the fabricated names into the string formatting parameter for a Unicode string. Ideally, the strings passed this way should also be Unicode.

But fabricate_hebrew_name isn't returning Unicode: it is returning a UTF-8 encoded byte string, which isn't the same thing.
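To make the difference concrete, here is a minimal sketch of what happens to one of the names when it is encoded. It assumes Python 3 (the question itself uses Python 2), and the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
name = u'ירדן'                       # text: a Unicode string of Hebrew letters
encoded = name.encode('utf-8')       # bytes: the UTF-8 encoding of that text

# Each Hebrew letter takes two bytes in UTF-8, and the first byte is 0xd7,
# the byte named in the error message from the question.
first_byte = bytearray(encoded)[0]

# Decoding those bytes as ASCII fails, because 0xd7 is outside range(128).
try:
    encoded.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(len(name), len(encoded), hex(first_byte))
print(error)
```

In Python 2, mixing a byte string like this into a Unicode format string triggers an implicit ASCII decode of the bytes, which is where the error in the question comes from.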

So, get rid of the call to encode('utf-8') and see whether that helps.

The next question is what type runsql is expecting. If it is expecting Unicode, no problem. If it is expecting an ASCII-encoded string, then you will have problems, because Hebrew cannot be represented in ASCII. In the unlikely case that it is expecting a UTF-8 encoded string, then that is the time to convert it: after the substitution is done.
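The ordering described above can be sketched as follows: keep everything as Unicode text during the substitution, and encode the finished query once at the end, only if the consumer really wants bytes. This assumes Python 3; the quotes around the name placeholder are an addition for valid SQL and are not in the question's code. It only illustrates the order of operations and still uses string interpolation, which is not the recommended way to build a query.

```python
# -*- coding: utf-8 -*-
name = u'רבקה'  # stays Unicode text, no .encode() here

# Substitute Unicode into a Unicode template: both sides are text.
query = u"INSERT INTO personal VALUES(%(ID)d, '%(name)s')" % {'ID': 1, 'name': name}

# Encode the whole query once, after substitution, and only if the
# (hypothetical) consumer requires UTF-8 bytes rather than text.
query_bytes = query.encode('utf-8')

print(query)
```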

In another answer, Ignacio Vazquez-Abrams warns against string interpolation in queries. The idea is that instead of doing the string substitution with the % operator, you should generally use a parameterised query and pass the Hebrew strings as parameters to it. This can help with query optimisation and protects against SQL injection.

Example

# -*- coding: utf-8 -*-
import sqlite3

# create db in memory
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE personal ("
            "id INTEGER PRIMARY KEY,"
            "name VARCHAR(42) NOT NULL)")

# insert random name
import random
fabricate_hebrew_name = lambda: random.choice([
    u'ירדן',u'יפה',u'תמי',u'ענת', u'רבקה',u'טלי',u'גינה',u'דנה',u'ימית',
    u'אלונה',u'אילן',u'אדם',u'חווה'])

cur.execute("INSERT INTO personal VALUES("
            "NULL, :name)", dict(name=fabricate_hebrew_name()))
conn.commit()

id, name = cur.execute("SELECT * FROM personal").fetchone()
print id, name
# -> 1 אלונה



Answer 2:


You should not encode manually, and you should not use string interpolation for queries.
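As a rough sketch of both points applied to the question's code, assuming Python 3: the runsql below is a hypothetical stand-in (the question does not show the real one), and the table layout is a guess based on the INSERT statement.

```python
# -*- coding: utf-8 -*-
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personal (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

HEBREW_NAMES = [u'ירדן', u'יפה', u'תמי', u'ענת', u'רבקה']

def fabricate_hebrew_name():
    # Return the text as-is; no manual .encode('utf-8').
    return random.choice(HEBREW_NAMES)

def runsql(query, params=()):
    # Hypothetical replacement for the question's runsql: the values travel
    # separately from the SQL text, so nothing is interpolated into the query.
    with conn:  # commits on success, rolls back on an exception
        return conn.execute(query, params).fetchall()

runsql(u"INSERT INTO personal VALUES (?, ?)", (1, fabricate_hebrew_name()))
rows = runsql(u"SELECT id, name FROM personal")
print(rows[0][0], rows[0][1])
```

The sqlite3 module handles the conversion between Python text and the database's storage encoding, so the calling code never needs to encode or decode the names itself.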



Source: https://stackoverflow.com/questions/2828537/python-and-hebrew-encoding-decoding-error
