How do I use the Python 3.2 email module to send Unicode messages encoded in UTF-8 with quoted-printable?

Posted by 血红的双手。 on 2019-12-30 04:40:09

Question


I want to send email messages that have arbitrary unicode bodies in a Python 3.2 program. But, in reality, these messages will consist largely of 7bit ASCII text. So I would like the messages encoded in utf-8 using quoted-printable. So far, I've found this works, but it seems wrong:

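import email.charset
import email.message

c = email.charset.Charset('utf-8')
c.body_encoding = email.charset.QP
m = email.message.Message()
# Hack: encode to UTF-8 bytes, then reinterpret each byte as a Latin-1 character.
m.set_payload("My message with an '\u05d0' in it.".encode('utf-8').decode('iso8859-1'), c)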

This results in an email message with exactly the right content:

To: someone@example.com
From: someone_else@example.com
Subject: This is a subjective subject.
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

My message with an '=D7=90' in it.

In particular, b'\xd7\x90'.decode('utf-8') results in the original Unicode character, so the quoted-printable encoding is properly rendering the utf-8. I'm well aware that this is an incredibly ugly hack. But it works.
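The round trip described here can be checked independently of the email package. The following is a minimal sketch using the standard library's quopri module, which is not what the question uses; the variable names are only illustrative. It also shows why the Latin-1 decode "works": every UTF-8 byte becomes exactly one character with a code point below 256.

```python
import quopri

text = "My message with an '\u05d0' in it."

# Step 1 of the hack: real UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# Step 2 of the hack: reinterpret each byte as one Latin-1 character,
# so every character in the result has a code point below 256.
pseudo_text = utf8_bytes.decode("iso8859-1")

# Quoted-printable encoding of the UTF-8 bytes, done with quopri
# instead of the email package (an illustrative substitute).
qp_bytes = quopri.encodestring(utf8_bytes)

# Decoding the quoted-printable data and then the UTF-8 restores the text.
round_tripped = quopri.decodestring(qp_bytes).decode("utf-8")

print(qp_bytes)
print(round_tripped == text)
```

If this behaves as expected, the Hebrew letter appears as =D7=90 in the encoded bytes, matching the message body shown above.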

This is Python 3. Text strings are expected to always be unicode. I shouldn't have to encode it to utf-8. And then turning it from bytes back into str by .decode('iso8859-1') is a horrible hack, and I shouldn't have to do that either.

Is the email module just broken with respect to encodings? Am I not getting something?

I've attempted to just plain old set it, with no character set. That leaves me with a unicode email message, and that's not right at all. I've also tried leaving off the encode and decode steps. If I leave them both off, it complains that the \u05d0 is out-of-range when trying to decide if that character needs to be quoted in the quoted-printable encoding. If I leave in just the encode step, it complains bitterly about how I'm passing in a bytes and it wants a str.


Answer 1:


The email package isn't confused about which is which (encoded unicode versus content-transfer-encoded binary data), but the documentation does not make it very clear, since much of the documentation dates from an era when "encoding" meant content-transfer-encoding. We're working on a better API that will make all this easier to grok (and better docs).

There actually is a way to get the email package to use QP for utf-8 bodies, but it isn't very well documented. You do it like this:

>>> from email import charset
>>> from email.mime.text import MIMEText
>>> charset.add_charset('utf-8', charset.QP, charset.QP)
>>> m = MIMEText("This is utf-8 text: á", _charset='utf-8')
>>> str(m)
'Content-Type: text/plain; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: quoted-printable\n\nThis is utf-8 text: =E1'
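Note that add_charset changes the package-wide registry, so every later utf-8 message in the process is affected. A possible per-message variant is sketched below. It assumes Python 3.5 or later, where MIMEText accepts a Charset instance for _charset, so it is not something the 3.2 session above could do. On such interpreters I would expect the á to come out as =C3=A1, its two UTF-8 bytes, rather than the =E1 in the transcript (E1 is the Latin-1 byte for á).

```python
from email import charset
from email.mime.text import MIMEText

text = "This is utf-8 text: \u00e1"

# A private Charset object: only messages built with it use QP bodies,
# and the global registry touched by add_charset stays unchanged.
qp_utf8 = charset.Charset("utf-8")
qp_utf8.body_encoding = charset.QP

# Assumes a Python version whose MIMEText accepts a Charset instance.
m = MIMEText(text, _charset=qp_utf8)

print(m.as_string())

# Undo the transfer encoding and the character encoding to verify.
decoded = m.get_payload(decode=True).decode("utf-8")
print(decoded == text)
```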



Answer 2:


Running

import email
import email.charset
import email.message

c = email.charset.Charset('utf-8')
c.body_encoding = email.charset.QP
m = email.message.Message()
m.set_payload("My message with an '\u05d0' in it.", c)
print(m.as_string())

Yields this traceback message:

  File "/usr/lib/python3.2/email/quoprimime.py", line 81, in body_check
    return chr(octet) != _QUOPRI_BODY_MAP[octet]
KeyError: 1488

Since

In [11]: int('5d0',16)
Out[11]: 1488

it is clear that the unicode '\u05d0' is the problem character. _QUOPRI_BODY_MAP is defined in quoprimime.py by

_QUOPRI_HEADER_MAP = dict((c, '=%02X' % c) for c in range(256))
_QUOPRI_BODY_MAP = _QUOPRI_HEADER_MAP.copy()

This dict only contains keys from range(256). So I think you are right; quoprimime.py cannot be used to encode arbitrary unicode.
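The lookup failure can be reproduced in isolation. This is a small sketch with a stand-in dict built the same way as the map quoted above; body_map and the other names are illustrative and not part of quoprimime.py. It also shows why encoding to UTF-8 first avoids the error: every byte value is a valid key.

```python
# Stand-in for the map: one '=XX' entry per byte value.
body_map = dict((c, '=%02X' % c) for c in range(256))

ch = '\u05d0'

# The code point itself is not a key, hence the KeyError: 1488.
print(ord(ch))
print(ord(ch) in body_map)

# The UTF-8 bytes of the same character are all valid keys.
utf8_bytes = ch.encode('utf-8')
encoded = ''.join(body_map[b] for b in utf8_bytes)
print(encoded)
```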

As a workaround, you could use (the default) base64 by omitting

c.body_encoding = email.charset.QP
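For completeness, here is what the base64 workaround might look like as a whole program. It is a sketch assuming the registry defaults for utf-8 have not been changed with add_charset; the decoded variable is only there to check the round trip.

```python
import email.charset
import email.message

text = "My message with an '\u05d0' in it."

# No body_encoding override: utf-8 keeps its default base64 body encoding.
c = email.charset.Charset('utf-8')

m = email.message.Message()
m.set_payload(text, c)

print(m.as_string())

# Decode the transfer encoding, then the character encoding.
decoded = m.get_payload(decode=True).decode('utf-8')
print(decoded == text)
```

The trade-off is that a mostly ASCII body is no longer readable in the raw message, which is what the question was trying to avoid.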

Note that the latest version of quoprimime.py does not use _QUOPRI_BODY_MAP at all, so using the latest Python might fix the problem.
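One way to check this on a newer interpreter is sketched below. It assumes Python 3.6 or later and tries two things: the code from the question without the encode/decode hack, and the newer EmailMessage API with an explicit cte argument. Whether EmailMessage is the "better API" the first answer alludes to is my assumption. The hack itself should probably be dropped on such versions, since set_payload appears to encode the string on its own there.

```python
import email.charset
import email.message
from email.message import EmailMessage

text = "My message with an '\u05d0' in it."

# Legacy API, as in the question but with a plain str payload.
c = email.charset.Charset("utf-8")
c.body_encoding = email.charset.QP
legacy = email.message.Message()
legacy.set_payload(text, c)

# Newer API: charset and transfer encoding are passed directly.
modern = EmailMessage()
modern.set_content(text, charset="utf-8", cte="quoted-printable")

for msg in (legacy, modern):
    print(msg["Content-Transfer-Encoding"])
    print(msg.get_payload())
```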



Source: https://stackoverflow.com/questions/9403265/how-do-i-use-python-3-2-email-module-to-send-unicode-messages-encoded-in-utf-8-w
