Question
When I parse my email messages with Python's email.parser.Parser, I get a lot of strings like this:
=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?=
How can I decode this to UTF-8 using Python?
Answer 1:
Your input is quoted-printable encoded text. You can use the quopri module
to handle that:
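import quopri
incode = '=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?='
inencoding = incode[2:12]  # 'ISO-8859-5'
intext = incode[15:-2]     # payload between '?Q?' and '?='
# header=True decodes '_' as a space, as the email-header variant requires;
# decode() (not encode()) then turns the raw bytes into text
result = quopri.decodestring(intext, header=True).decode(inencoding)
The result will then be
Реестр Платежей
followed by a trailing space, which comes from the final underscore.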
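The fixed slice positions above only fit a charset name of exactly this length. As a minimal sketch, assuming the input is a single well-formed encoded-word of the form =?charset?encoding?payload?=, the parts can be split on '?' instead. The helper name decode_encoded_word is made up for illustration, and the base64 branch ('B') is an addition that the question itself does not need.

```python
import base64
import quopri


def decode_encoded_word(word):
    """Decode one encoded-word (hypothetical helper, single word only)."""
    # '=?ISO-8859-5?Q?...?=' splits into ['=', charset, encoding, payload, '=']
    prefix, charset, encoding, payload, suffix = word.split('?')
    if prefix != '=' or suffix != '=':
        raise ValueError('not an encoded-word: %r' % word)
    if encoding.upper() == 'Q':
        # header=True makes '_' decode to a space
        raw = quopri.decodestring(payload.encode('ascii'), header=True)
    elif encoding.upper() == 'B':
        raw = base64.b64decode(payload)
    else:
        raise ValueError('unknown encoding: %r' % encoding)
    return raw.decode(charset)


incode = '=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?='
result = decode_encoded_word(incode)
```

Anything that is not exactly one encoded-word raises ValueError here, so this is only meant to show the structure of the format, not to replace the standard library.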
The quoted-printable payload is additionally wrapped in email-header formatting (an RFC 2047 encoded-word), which names the character encoding to interpret the bytes in after the quoted-printable decoding has been applied. The example code above slices out the parts by hand, but you can also do all of that in one step:
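from email.header import decode_header
# the single-pair unpacking assumes the header holds exactly one encoded part
[(text, encoding)] = decode_header(incode)
result = text.decode(encoding)
result
is now the decoded text, Реестр Платежей, with the underscores of the header encoding turned into spaces (including a trailing one).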
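Real headers often mix plain text with several encoded parts, in which case decode_header returns more than one (text, charset) pair. A minimal sketch of handling that, assuming Python 3, where encoded parts come back as bytes and a fully unencoded header comes back as str; the helper name decode_mime_header and the fallback to ASCII with errors='replace' are illustrative choices, not part of the original answer:

```python
from email.header import decode_header


def decode_mime_header(raw):
    """Join all parts of a header value into one str (hypothetical helper)."""
    parts = []
    for text, charset in decode_header(raw):
        if isinstance(text, bytes):
            # charset is None for unencoded fragments; assume ASCII there
            parts.append(text.decode(charset or 'ascii', errors='replace'))
        else:
            parts.append(text)
    return ''.join(parts)


incode = '=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?='
subject = decode_mime_header(incode)
utf8_bytes = subject.encode('utf-8')  # the UTF-8 form asked for in the question
```

In Python 3 the decoded value is already a str, so encoding to UTF-8 is only needed when writing bytes to a file or socket.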
Source: https://stackoverflow.com/questions/24080233/python-decoding-from-iso-8859-5