How to correctly insert UTF-8 characters into a MySQL table using Python

误落风尘 2021-02-05 16:21

I am extremely confused by how to store strings that contain characters unusual to someone who is used to dealing with a UK English character set.

Here is m

4 Answers
  • 2021-02-05 16:59
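    <?php
    // Tell the browser that the page is UTF-8.
    header("Content-Type: text/html; charset=UTF-8");
    
    // Create the connection first; the charset queries need an open link.
    // Note: the mysql_* extension was removed in PHP 7 (use mysqli or PDO there).
    $CNN = mysql_connect("localhost", "usr_urdu", "123") or die('Unable to Connect');
    $DB = mysql_select_db('db_urdu', $CNN) or die('Unable to select DB');
    
    // Then switch the connection to UTF-8.
    mysql_query("SET NAMES 'utf8'", $CNN);
    mysql_query('SET CHARACTER SET utf8', $CNN);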
    
  • 2021-02-05 17:05

    Set the default client character set:

    <?php
    $con=mysqli_connect("localhost","my_user","my_password","my_db");
    // Check connection
    if (mysqli_connect_errno())
      {
      echo "Failed to connect to MySQL: " . mysqli_connect_error();
      exit(); // stop here; the calls below need a valid connection
      }
    
    // Change character set to utf8
    mysqli_set_charset($con,"utf8");
    mysqli_close($con);
    ?>
    
  • 2021-02-05 17:11

    Did you try running this query: set names utf8;

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    
    import MySQLdb
    
    mystring = "Bientôt l'été"
    
    myinsert = [{ "name": mystring.encode("utf-8").strip()[:65535], "id": 1 }]
    
    con = MySQLdb.connect('localhost', 'abc', 'def', 'ghi')
    cur = con.cursor()
    
    cur.execute("set names utf8")     # <--- add this line
    
    # Quote identifiers with backticks; single quotes would make them string literals.
    sql = "INSERT INTO `MyTable` ( `my_id`, `my_name` ) VALUES ( %(id)s, %(name)s )"
    cur.executemany( sql, myinsert )
    con.commit()
    con.close()
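
    An alternative to issuing set names by hand is to ask the driver for a UTF-8 connection up front. The sketch below is for Python 3 with the mysqlclient build of MySQLdb. The host, credentials, and table come from the code above, while the helper names and the choice of utf8mb4 are assumptions (the table and its columns must also use a UTF-8 character set):

```python
def connect_utf8():
    """Open a connection that exchanges text as UTF-8 (sketch)."""
    import MySQLdb  # imported here so the helpers below work without the driver

    return MySQLdb.connect(
        host="localhost", user="abc", passwd="def", db="ghi",
        charset="utf8mb4",   # assumption: the server supports utf8mb4
        use_unicode=True,    # return text columns as str
    )


def build_rows(names, start_id=1):
    """Pair each name with an id; names stay str, no manual .encode()."""
    return [{"id": i, "name": n.strip()} for i, n in enumerate(names, start_id)]


def insert_names(con, rows):
    """Insert rows with a parameterized statement and commit."""
    sql = "INSERT INTO `MyTable` (`my_id`, `my_name`) VALUES (%(id)s, %(name)s)"
    cur = con.cursor()
    cur.executemany(sql, rows)
    con.commit()
```

    Typical use would be con = connect_utf8(), then insert_names(con, build_rows(["Bientôt l'été"])), then con.close().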
    
  • 2021-02-05 17:20

    Your problem is with how you display the data when you read it from the database. You are looking at UTF-8 data misinterpreted as Latin-1.

    >>> "Bient\xf4t l'\xe9t\xe9"
    "Bientôt l'été"
    >>> "Bient\xf4t l'\xe9t\xe9".encode('utf8').decode('latin1')
    "BientÃ´t l'Ã©tÃ©"
    

    The above encodes a unicode string to UTF-8, then misinterprets the result as Latin-1 (ISO 8859-1). The ô and é code points, which were encoded to two UTF-8 bytes each, are each re-interpreted as two Latin-1 code points.
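
    To see the byte-level effect described above, here is a small Python 3 sketch (the variable names are illustrative, not from the question). It encodes each accented character to UTF-8 and then decodes those bytes as Latin-1:

```python
# Python 3 sketch: each accented character becomes two UTF-8 bytes,
# and each byte read as Latin-1 becomes one (wrong) character.
text = "Bient\xf4t l'\xe9t\xe9"  # "Bientôt l'été"

for ch in ("\xf4", "\xe9"):  # ô and é
    utf8_bytes = ch.encode("utf-8")
    as_latin1 = utf8_bytes.decode("latin-1")
    print(repr(ch), "->", utf8_bytes, "->", repr(as_latin1))

mojibake = text.encode("utf-8").decode("latin-1")
print(len(text), len(mojibake))  # the misread string is three characters longer
```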

    Since you are running Python 2, you shouldn't need to .encode() data that is already encoded. It would be better to insert unicode objects instead, so you want to decode:

    myinsert = [ { "name" : mystring.decode("utf-8").strip()[:65535], "id" : 1 } ]
    

    By calling .encode() on already encoded data, you are asking Python to first decode the data (using the default encoding) so that it can then encode it for you. If the default encoding on your Python has been changed to Latin-1, you would see the same effect: UTF-8 data interpreted as Latin-1 before being re-encoded to UTF-8.
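
    If text has already been mangled this way, the misreading can often be reversed by undoing the two steps in the opposite order. A minimal Python 3 sketch, assuming the text went through exactly one round of UTF-8 being read as Latin-1 (the function name repair_mojibake is hypothetical):

```python
def repair_mojibake(text):
    """Undo one round of UTF-8 bytes being misread as Latin-1.

    Returns the input unchanged when it does not look like that kind
    of mojibake (it cannot be encoded as Latin-1, or the resulting
    bytes are not valid UTF-8).
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


broken = "Bient\xf4t l'\xe9t\xe9".encode("utf-8").decode("latin-1")
fixed = repair_mojibake(broken)
```

    This only repairs strings in memory. Fixing the connection character set is still what prevents the problem for new rows.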

    You may want to read up on Python and Unicode:

    • The Python Unicode HOWTO

    • Pragmatic Unicode by Ned Batchelder

    • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
