Parsing email with Python

花落未央 2020-12-03 11:25

I'm writing a Python script to process emails returned from Procmail. As suggested in this question, I'm using the following Procmail config:

:0:
|$HOME/pr         


        
3 Answers
  • 2020-12-03 12:03

    It looks like some header continuation lines start in the first column, with no leading whitespace. According to RFC 2822 §2.2.3 that is not a valid way to fold a header:

    Each header field is logically a single line of characters comprising
    the field name, the colon, and the field body. For convenience
    however, and to deal with the 998/78 character limitations per line,
    the field body portion of a header field can be split into a multiple
    line representation; this is called "folding". The general rule is
    that wherever this standard allows for folding white space (not
    simply WSP characters), a CRLF may be inserted before any WSP. For
    example, the header field:

        Subject: This is a test
    

    can be represented as:

        Subject: This
         is a test
    

    It should look something like this:

    From hostname Tue Jun 15 21:43:30 2010
    Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
    Received: from mail-fx0-f44.google.com (209.85.161.44)
        by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
    Received: by fxm19 with SMTP id 19so170709fxm.3
        for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
    MIME-Version: 1.0
    Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
        Jun 2010 18:47:33 -0700 (PDT)
    Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
    Date: Tue, 15 Jun 2010 20:47:33 -0500
    Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
    Subject: TEST 12
    From: Full Name <username@sender.com>
    To: username@domain.com
    Content-Type: text/plain; charset=ISO-8859-1
    
    ONE
    TWO
    THREE
    
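The effect of the folding rule quoted above can be sketched with Python's standard `email` module. This is a minimal illustration, not the asker's code: the two strings `folded` and `broken` are shortened, hypothetical variants of the sample message, differing only in whether the continuation line begins with whitespace. With the default parser, a line that is neither a `Name:` header nor a whitespace-led continuation ends the header block, so everything after it is treated as body.

```python
import email

# Correctly folded: the continuation line starts with whitespace.
folded = (
    "Received: by fxm19 with SMTP id 19so170709fxm.3\n"
    "    for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)\n"
    "Subject: TEST 12\n"
    "\n"
    "ONE\n"
)

# Broken: the continuation line starts in the first column.
broken = (
    "Received: by fxm19 with SMTP id 19so170709fxm.3\n"
    "for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)\n"
    "Subject: TEST 12\n"
    "\n"
    "ONE\n"
)

good = email.message_from_string(folded)
bad = email.message_from_string(broken)

print(good["Subject"])  # the Subject header is found
print(bad["Subject"])   # None: header parsing stopped at the broken line
print(bad.defects)      # the parser records why it stopped
```

In the broken case the `Subject:` line ends up inside `bad.get_payload()`, which matches the symptom of headers that cannot be looked up by name.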
  • 2020-12-03 12:08

    Answering my own question.

    I found a bug in the code that builds the messages. It was inserting line breaks between some lines, which prevented the parser from working properly.

  • 2020-12-03 12:12

    You must ensure that the header lines are not accidentally broken (as they are above, though it's hard to say whether that was a copy-paste problem). Given an intact message such as:

    Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
    Received: from mail-fx0-f44.google.com (209.85.161.44) by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
    Received: by fxm19 with SMTP id 19so170709fxm.3 for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
    MIME-Version: 1.0
    Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
    Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
    Date: Tue, 15 Jun 2010 20:47:33 -0500
    Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
    Subject: TEST 12
    From: Full Name <username@sender.com>
    To: username@domain.com
    Content-Type: text/plain; charset=ISO-8859-1
    
    ONE
    TWO
    THREE
    

    then

    import email
    # msgtxt is a str holding the raw message shown above
    msg = email.message_from_string(msgtxt)
    print(msg['Subject'])
    

    prints TEST 12 as desired.
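For completeness, here is a self-contained Python 3 sketch of the same lookup, extended to a few other fields. It assumes the message is already available as a string; `msgtxt` below is a shortened copy of the sample above, with most `Received:` lines left out for brevity.

```python
import email
from email.utils import parseaddr

# Shortened copy of the sample message (an assumption for illustration).
msgtxt = (
    "Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)\n"
    "Date: Tue, 15 Jun 2010 20:47:33 -0500\n"
    "Subject: TEST 12\n"
    "From: Full Name <username@sender.com>\n"
    "To: username@domain.com\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "ONE\n"
    "TWO\n"
    "THREE\n"
)

msg = email.message_from_string(msgtxt)

subject = msg["Subject"]                    # header lookup is case-insensitive
sender_name, sender_addr = parseaddr(msg["From"])
body_lines = msg.get_payload().splitlines() # body of a non-multipart message

print(subject)
print(sender_name, sender_addr)
print(body_lines)
```

When the script is fed by a Procmail pipe recipe, the message arrives on standard input, so `email.message_from_file(sys.stdin)` is one way to obtain `msg` instead of building a string first.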
