I have a hex string and I want to convert it to UTF-8 so I can insert it into MySQL (my database is UTF-8).
hex_string = 'kitap ara\xfet\xfdrmas\xfd'
...
result = 'kitap
Assuming Python 2.6,
>>> print('kitap ara\xfet\xfdrmas\xfd'.decode('iso-8859-9'))
kitap araştırması
>>> 'kitap ara\xfet\xfdrmas\xfd'.decode('iso-8859-9').encode('utf-8')
'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'
String literals explains how to use UTF8 strings in Python source.
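The session above is Python 2, where a plain string literal is already a byte string. Here is a minimal Python 3 sketch of the same decode-then-encode step. It assumes the input arrives as a `bytes` object, and the helper name `to_utf8` is hypothetical:

```python
# Python 3 sketch: legacy data arrives as bytes, not str.
raw = b'kitap ara\xfet\xfdrmas\xfd'  # bytes in ISO-8859-9 (Latin-5)

def to_utf8(data, source_encoding='iso-8859-9'):
    """Decode legacy-encoded bytes to text, then re-encode that text as UTF-8."""
    text = data.decode(source_encoding)  # bytes -> str (Unicode text)
    return text.encode('utf-8')          # str -> UTF-8 bytes

text = raw.decode('iso-8859-9')
print(ascii(text))    # 'kitap ara\u015ft\u0131rmas\u0131'
print(to_utf8(raw))   # b'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'
```

Using `ascii()` keeps the printed output readable even when the console cannot display Turkish characters.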
Try
hex_string.decode("cp1254").encode("utf-8")
(cp1254 and iso-8859-9 are the Turkish code pages, the former being the usual name on Windows platforms; in Python both decode these particular bytes the same way.)
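This small Python 3 sketch checks the claim that both code page names work for this data by decoding the question's bytes with each one. It also shows that the two encodings are not identical in general: cp1254 puts printable characters in the 0x80-0x9F range, where ISO-8859-9 has control codes.

```python
raw = b'kitap ara\xfet\xfdrmas\xfd'

# For these bytes the two Turkish code pages agree: 0xFD -> 'ı', 0xFE -> 'ş'.
same = raw.decode('cp1254') == raw.decode('iso-8859-9')
print(same)  # True

# They differ in the 0x80-0x9F range: cp1254 maps 0x80 to the euro sign,
# while ISO-8859-9 maps it to the C1 control character U+0080.
print(ascii(b'\x80'.decode('cp1254')))      # '\u20ac'
print(ascii(b'\x80'.decode('iso-8859-9')))  # '\x80'
```

A few bytes, such as 0x81, are undefined in cp1254, and Python raises `UnicodeDecodeError` for them. So the choice between the two names only matters when the data contains bytes in that range.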
Try (Python 3.x):
import codecs
codecs.decode("707974686f6e2d666f72756d2e696f", "hex").decode('utf-8')
From here.
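The call above expects a string of hex digits. The following Python 3 sketch shows that the built-in `bytes.fromhex` produces the same bytes. It also shows that when the digits encode Turkish cp1254 bytes rather than UTF-8, the decode step has to name that code page. The hex form of the question's data (`turkish_hex`) was computed for this illustration and is not part of the original answer:

```python
import codecs

hex_digits = "707974686f6e2d666f72756d2e696f"

# Both calls turn the hex digits into the raw bytes b'python-forum.io'.
assert codecs.decode(hex_digits, "hex") == bytes.fromhex(hex_digits)
print(bytes.fromhex(hex_digits).decode('utf-8'))  # python-forum.io

# The question's bytes written as hex digits (illustrative).
turkish_hex = "6b6974617020617261fe74fd726d6173fd"
raw = bytes.fromhex(turkish_hex)

# These bytes are cp1254, not UTF-8, so decode with the Turkish code page
# first and only then encode the text as UTF-8.
utf8_bytes = raw.decode('cp1254').encode('utf-8')
print(utf8_bytes)  # b'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'
```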
First you need to decode it from the encoded bytes you have. That appears to be ISO-8859-9 (latin-5), or, if you are using Windows, probably code page 1254, which is based on latin-5.
>>> 'kitap ara\xfet\xfdrmas\xfd'.decode('cp1254')
u'kitap ara\u015ft\u0131rmas\u0131' # u'kitap araştırması'
If you are using Windows, then depending on where you are getting those bytes, it might be more appropriate to decode them as mbcs, which translates to ‘whichever code page the local system is using’. If the string is just sitting in a .py file, you would be better off just writing u'kitap araştırması' in the source and setting a -*- coding declaration to direct Python to decode it. See PEP 263.
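Here is a minimal sketch of the PEP 263 approach. It assumes Python 3, where UTF-8 is already the default source encoding, so the declaration is optional there but harmless. The variable name `title` is illustrative:

```python
# -*- coding: utf-8 -*-
# The PEP 263 declaration above tells Python how this file's bytes are encoded.

title = u'kitap araştırması'  # already Unicode text; no .decode() step needed

# The literal matches the escaped form produced by decoding the raw bytes.
assert title == u'kitap ara\u015ft\u0131rmas\u0131'
print(title.encode('utf-8'))  # b'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'
```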
As to how to encode Unicode strings to UTF-8 for the database, you can do it manually if you want to:
>>> u'kitap ara\u015ft\u0131rmas\u0131'.encode('utf-8')
'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'
but a good data access layer is likely to do that automatically for you, provided the character set and COLLATION of the tables the data is going into are set correctly.
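The idea of letting the data access layer handle the encoding can be sketched with Python 3's built-in `sqlite3` module. It stands in for a MySQL driver here only because it needs no server. The code decodes once, passes Unicode text through a parameterized query, and never calls `.encode()` itself. The `books` table and `title` column are hypothetical. With a MySQL driver the placeholder style and connection options differ (PyMySQL, for example, uses `%s` placeholders and accepts a `charset` argument such as `'utf8mb4'`), so treat this as an illustration of the pattern rather than MySQL code.

```python
import sqlite3

raw = b'kitap ara\xfet\xfdrmas\xfd'  # bytes from the legacy source
title = raw.decode('cp1254')         # decode once, at the boundary

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE books (title TEXT)')  # hypothetical table

# Pass text (str), not pre-encoded bytes; the driver handles storage encoding.
conn.execute('INSERT INTO books (title) VALUES (?)', (title,))
conn.commit()

stored = conn.execute('SELECT title FROM books').fetchone()[0]
print(stored == title)  # True
```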