How to encode international characters in recipient names (NOT addresses) with smtplib.sendmail() in Python 3?

Submitted by 孤街浪徒 on 2019-12-13 03:16:15

Question


I'm using a standard smtplib.sendmail() call in my Python 3 program to send emails, as follows:

smtp_session.sendmail('The Sender <sender@domain.com>', ['The ÅÄÖ Recipient <recipient@domain.com>'], 'Simple test body here')

The SMTP session has already been successfully established prior to this code line being executed, and it also always works just fine as long as there are no "international characters" in the recipient name.

BUT, as soon as I include e.g. "ÅÄÖ" in the recipient name (these are just 8-bit Latin-1 characters, nothing exotic), as can be seen above, the email just disappears and never reaches the recipient, although no errors or exceptions are returned or raised by the sendmail() method, nor by anything inside it (I have single-stepped it in a debugger while doing this).

I know for a fact that I can send emails with such characters in the recipient names through this exact same SMTP server of mine, using a normal email client program like Thunderbird, so I can only assume that this problem has something to do with some encoding or similar?

Also, the solution shouldn't be related to that mail_options=['SMTPUTF8'] thingy either, because the server just replies that it doesn't support this if I try to use it (and again, emails using these exact recipient names can still be sent through the exact same SMTP server with a normal email client like Thunderbird).

So, is there some simple solution based on using some kind of "MIME related" encoding or similar on the recipient strings that will solve this, or how can I otherwise send an email from Python with such a recipient name?


Answer 1:


Characters in message headers are required to be ASCII: the printable characters in the numeric range 33-126 inclusive, plus space and tab as whitespace. If you need to represent characters outside that range in a header such as a recipient's display name, you must use the "encoded-word" method defined by RFC 2047. (RFC 2231 is a related extension that covers MIME parameter values, such as attachment filenames, rather than display names.)

Historically in Python you would have used the Header class from the email.header module to build suitably-encoded headers. That's still available in Python 3, but in Python 3 the newer recommendation is to use the EmailMessage class from the email.message module to construct the entire message, and let it take care of encoding any headers that need special treatment.
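As a minimal sketch of the EmailMessage approach described above (the Subject text is made up for illustration, and smtp_session is assumed to be the already-connected session from the question), the display name can be given as plain Unicode and the library encodes it when the message is serialized:

```python
from email.headerregistry import Address
from email.message import EmailMessage

msg = EmailMessage()
# Address(display_name, username, domain) keeps the name and the address apart.
msg["From"] = Address("The Sender", "sender", "domain.com")
msg["To"] = Address("The ÅÄÖ Recipient", "recipient", "domain.com")
msg["Subject"] = "Simple test"  # hypothetical subject, not from the question
msg.set_content("Simple test body here")

# Serializing with the default policy yields ASCII-only headers; the
# non-ASCII display name is emitted as an RFC 2047 encoded-word.
wire = msg.as_string()
print(wire)

# With a live session, send_message() takes the envelope addresses from the
# From/To headers, so no SMTPUTF8 support is needed on the server:
# smtp_session.send_message(msg)
```

The exact form of the encoded-word (Q or B encoding, and how much of the name it covers) may differ between Python versions, so it is safer not to rely on the precise bytes.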




Answer 2:


The address arguments to smtplib.sendmail() should not carry human-readable display names, just the bare email address (the addr-spec).

smtp_session.sendmail('sender@domain.com', ['recipient@domain.com'],
    'Simple test body here')

The email.headerregistry module in Python 3.6+ has a facility for extracting just the bare email address, by way of parsing structured headers into objects with attributes.

from email.headerregistry import AddressHeader

# parse() takes the header value only (no "To: " prefix) and fills the dict
hdr = dict()
AddressHeader.parse('The ÅÄÖ Recipient <recipient@domain.com>', hdr)
for grp in hdr['groups']:
    for addr in grp.addresses:
        # addr_spec is the bare address, i.e. username@domain
        print(addr.addr_spec)

(I really hope there is a less convoluted way to access this functionality but at the very least this produces the expected result.)
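One possibly less convoluted route, offered here as a sketch rather than as the answerer's method, is email.utils.parseaddr() from the standard library, which splits a single "Name <address>" string into its display name and bare address. The variable names below are illustrative only.

```python
from email.utils import parseaddr

full = 'The ÅÄÖ Recipient <recipient@domain.com>'

# parseaddr() returns a (display name, bare address) tuple.
name, envelope_addr = parseaddr(full)
print(name)
print(envelope_addr)

# Only the bare address belongs in the SMTP envelope, e.g.:
# smtp_session.sendmail('sender@domain.com', [envelope_addr], message_text)
```

For a header value holding several recipients, email.utils.getaddresses() does the same job over a list of strings.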

In the actual message, Python takes care of properly RFC 2047-encoding any headers with Unicode content (if you use the correct methods from the email library to construct a proper MIME message); but this is purely presentation (RFC 5322), not transport (RFC 5321). So in the message itself you might see

From: The Sender <sender@domain.com>
To: The =?utf-8?Q?=C3=85=C3=84=C3=96_Recipient?= <recipient@domain.com>

though keep in mind that there is no requirement for the message content to actually reveal the transport sender or recipient headers. (Maybe tangentially see Header "To:" for a Bulk Email Sender)
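To illustrate the split between presentation and transport, the following sketch parses the two header lines shown above with the standard email package (assuming the modern policy.default) and recovers both the decoded display name and the bare address. The raw string is assembled here purely for the example.

```python
import email
from email import policy

raw = (
    "From: The Sender <sender@domain.com>\n"
    "To: The =?utf-8?Q?=C3=85=C3=84=C3=96_Recipient?= <recipient@domain.com>\n"
    "\n"
    "Simple test body here\n"
)

# policy.default gives structured header objects that decode encoded-words.
msg = email.message_from_string(raw, policy=policy.default)
recipient = msg["To"].addresses[0]

print(recipient.display_name)  # presentation: the decoded, human-readable name
print(recipient.addr_spec)     # transport: what the SMTP envelope would carry
```

The encoded-word only affects how the name is written in the header; the address itself is plain ASCII throughout.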



Source: https://stackoverflow.com/questions/58253420/how-to-encode-international-characters-in-recipient-names-not-addresses-with-s
