Detecting if an email is a "Delivery Status Notification" and extracting information - Python

不思量自难忘° 2021-02-13 18:35

I'm using the Python email module to parse emails.

I need to be able to tell if an email is a "Delivery Status Notification" and find out what the status is.

3 answers
  • 2021-02-13 18:43

    The docs you cited say that a DSN is a multipart message:

    import email

    # emailstr is assumed to hold the raw message text
    msg = email.message_from_string(emailstr)

    if (msg.is_multipart() and len(msg.get_payload()) > 1 and
            msg.get_payload(1).get_content_type() == 'message/delivery-status'):
        # email is a DSN
        print(msg.get_payload(0).get_payload())  # human-readable section

        # The delivery-status part parses into a block of per-message fields
        # followed by one block per recipient; only the latter carry Action.
        for dsn in msg.get_payload(1).get_payload():
            if dsn['action'] is not None:
                print('action: %s' % dsn['action'])  # e.g., "failed", "delivered"

        if len(msg.get_payload()) > 2:
            print(msg.get_payload(2))  # original message
    

    Format of a Delivery Status Notification (from RFC 3464):

    A DSN is a MIME message with a top-level content-type of
    multipart/report (defined in [REPORT]).  When a multipart/report
    content is used to transmit a DSN:
    
    (a) The report-type parameter of the multipart/report content is
        "delivery-status".
    
    (b) The first component of the multipart/report contains a human-
        readable explanation of the DSN, as described in [REPORT].
    
    (c) The second component of the multipart/report is of content-type
        message/delivery-status, described in section 2.1 of this
        document.
    
    (d) If the original message or a portion of the message is to be
        returned to the sender, it appears as the third component of the
        multipart/report.
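
    Points (a) and (c) above can be turned into a stricter check than the snippet at the top of this answer. The following is only a minimal sketch: the helper name is_dsn and the SAMPLE message are made up for illustration, and real bounces may deviate from the RFC.

```python
import email

def is_dsn(msg):
    """Return True if msg looks like an RFC 3464 delivery status notification."""
    # (a) top-level type must be multipart/report with report-type=delivery-status
    if msg.get_content_type() != 'multipart/report':
        return False
    report_type = msg.get_param('report-type')  # None when the parameter is absent
    if not isinstance(report_type, str) or report_type.lower() != 'delivery-status':
        return False
    # (c) the second component must be message/delivery-status
    parts = msg.get_payload()
    return (isinstance(parts, list) and len(parts) > 1 and
            parts[1].get_content_type() == 'message/delivery-status')

# Hypothetical, heavily trimmed DSN used only to exercise the helper.
SAMPLE = """\
Return-path: <>
Subject: Undelivered Mail Returned to Sender
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status;
 boundary="BOUND"

--BOUND
Content-Type: text/plain

Delivery failed.

--BOUND
Content-Type: message/delivery-status

Reporting-MTA: dns; mail.example.org

Final-Recipient: rfc822; nobody@example.org
Status: 5.1.1
Action: failed

--BOUND--
"""

print(is_dsn(email.message_from_string(SAMPLE)))
print(is_dsn(email.message_from_string("Subject: hi\n\nhello\n")))
```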
    
  • 2021-02-13 18:44

    I don't use Python, but I suppose Gmail has improved its DSN support, because my tests are successful.

    The sample below is a multipart message with "Content-Type: multipart/report; report-type=delivery-status".

    How I reliably identify a DSN:

    • The first header line is "Return-path: <>"
    • Content-Type is "multipart/report" with "report-type=delivery-status"

    Then, I know that:

    • The report content is in the part with Content-Type = "message/delivery-status"
    • Status and Action fields are always present in the report content.
    • Note that the Status field can be less precise than the status that may appear in the optional Diagnostic-Code field. In the sample below, however, all fields carry the same status.
    • The original message is in the part with Content-Type = "message/rfc822". Sometimes the MTA returns only the original message's headers, without the body. In that case, the Content-Type is "text/rfc822-headers".
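
    Since the question is about Python, here is a rough sketch of how the points above could be applied with the standard email module. The function name extract_dsn_info and the SAMPLE message are hypothetical, and the code assumes the bounce follows the multipart/report layout described here.

```python
import email

def extract_dsn_info(msg):
    """Return (per-recipient field dicts, original message or None)."""
    recipients, original = [], None
    if not msg.is_multipart():
        return recipients, original
    for part in msg.get_payload():
        ctype = part.get_content_type()
        if ctype == 'message/delivery-status':
            # The payload is a list of header blocks; per-recipient blocks
            # are the ones that carry a Final-Recipient field.
            for block in part.get_payload():
                if block['Final-Recipient'] is None:
                    continue
                recipients.append({
                    'recipient': block['Final-Recipient'],
                    'status': block['Status'],
                    'action': block['Action'],
                    'diagnostic': block['Diagnostic-Code'],  # optional, may be None
                })
        elif ctype == 'message/rfc822':
            original = part.get_payload(0)  # the full original message
        elif ctype == 'text/rfc822-headers':
            # Headers only: parse the text payload as a header block.
            original = email.message_from_string(part.get_payload())
    return recipients, original

# Hypothetical, heavily trimmed DSN used only to exercise the helper.
SAMPLE = """\
Return-path: <>
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status;
 boundary="BOUND"

--BOUND
Content-Type: text/plain

Delivery failed.

--BOUND
Content-Type: message/delivery-status

Reporting-MTA: dns; mail.example.org

Final-Recipient: rfc822; nobody@example.org
Status: 5.1.1
Action: failed

--BOUND
Content-Type: message/rfc822

Subject: Hello
To: nobody@example.org

body text

--BOUND--
"""

recipients, original = extract_dsn_info(email.message_from_string(SAMPLE))
for r in recipients:
    print(r['recipient'], r['status'], r['action'])
if original is not None:
    print('original subject:', original['Subject'])
```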

    Sample DSN received after an e-mail sent to test-dsn-failure@gmail.com:

    Return-path: <>
    Received: from xxx ([xxx])
        by xxx with ESMTP; Fri, 04 May 2012 16:18:13 +0200
    From: <Mailer-Daemon@xxx> (Mail Delivery System)
    To: xxx
    Subject: Undelivered Mail Returned to Sender
    Date: Fri, 04 May 2012 15:25:09 +0200
    MIME-Version: 1.0
    Content-Type: multipart/report; report-type=delivery-status;
     boundary="HTB3nt3RR7vw/QMPR4kDPbKg+XWjXIKdC/rfHQ=="
    
    This is a MIME-encapsulated message.
    
    --HTB3nt3RR7vw/QMPR4kDPbKg+XWjXIKdC/rfHQ==
    Content-Description: Notification
    Content-Type: text/plain
    
    I'm sorry to have to inform you that your message could not
    be delivered to one or more recipients. It's attached below.
    
    For further assistance, please send mail to <postmaster@xxx>
    
    If you do so, please include this problem report. You can
    delete your own text from the attached returned message.
    
    <test-dsn-failure@gmail.com>: 550-5.1.1 The email account that you tried to reach does not exist. Please try
    550-5.1.1 double-checking the recipient's email address for typos or
    550-5.1.1 unnecessary spaces. Learn more at
    550 5.1.1 http://support.google.com/mail/bin/answer.py?answer=6596 t12si10077186weq.36
    
    
    --HTB3nt3RR7vw/QMPR4kDPbKg+XWjXIKdC/rfHQ==
    Content-Description: Delivery report
    Content-Type: message/delivery-status
    
    Reporting-MTA: dns; xxx
    Arrival-Date: Fri, 04 May 2012 15:25:09 +0200
    
    Final-Recipient: rfc822; test-dsn-failure@gmail.com
    Status: 5.1.1
    Action: failed
    Last-Attempt-Date: Fri, 04 May 2012 15:25:09 +0200
    Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does not exist. Please try
    550-5.1.1 double-checking the recipient's email address for typos or
    550-5.1.1 unnecessary spaces. Learn more at
    550 5.1.1 http://support.google.com/mail/bin/answer.py?answer=6596 t12si10077186weq.36
    
    --HTB3nt3RR7vw/QMPR4kDPbKg+XWjXIKdC/rfHQ==
    Content-Description: Undelivered Message
    Content-Type: message/rfc822
    
    [original message...]
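
    The Status value in the sample above (5.1.1) is an enhanced status code whose first digit gives the class defined in RFC 3463: 2 is success, 4 is a persistent transient failure, and 5 is a permanent failure. A small Python sketch of that mapping follows; classify_status is a made-up helper name, and tolerating trailing text after the code is my own assumption.

```python
def classify_status(status):
    """Map an enhanced status code such as '5.1.1' to its RFC 3463 class."""
    classes = {
        '2': 'success',
        '4': 'persistent transient failure',
        '5': 'permanent failure',
    }
    tokens = status.split()
    if not tokens:
        return 'unknown'
    # Only the first token is examined, so trailing text is ignored.
    parts = tokens[0].split('.')
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return 'unknown'
    return classes.get(parts[0], 'unknown')

print(classify_status('5.1.1'))
print(classify_status('4.2.2'))
```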
    
  • 2021-02-13 18:45

    The X-Failed-Recipients header seems to be the quickest way to identify a Gmail DSN. After that, it seems you must parse the text/plain content.
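
    A quick sketch of that header check in Python. X-Failed-Recipients is a non-standard header, so treating its value as a comma-separated address list is an assumption, and failed_recipients is a made-up helper name.

```python
import email
from email.utils import getaddresses

def failed_recipients(msg):
    """Return addresses found in X-Failed-Recipients headers (possibly empty)."""
    values = msg.get_all('X-Failed-Recipients', [])
    # getaddresses yields (display name, address) pairs; keep non-empty addresses.
    return [addr for _name, addr in getaddresses(values) if addr]

# Hypothetical bounce header used only to exercise the helper.
raw = "X-Failed-Recipients: a@example.org, b@example.org\nSubject: x\n\nbody\n"
print(failed_recipients(email.message_from_string(raw)))
```

    An empty list means the header is absent, in which case the multipart/report checks from the other answers still apply.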
