Method for parsing text Cc field of email header?

Question

I have the plain text of a Cc header field that looks like so:

friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>

Are there any battle-tested modules for parsing this properly?

(Bonus if it's in Python! The email module just returns the raw text without any methods for splitting it, AFAIK.) (Also a bonus if it splits the name and address into two fields.)

Answer 1:

There are a bunch of functions available in the standard Python email package, but I think you're looking for email.utils.parseaddr() or email.utils.getaddresses().

>>> import email.utils
>>> addresses = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'
>>> email.utils.getaddresses([addresses])
[('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'), ('Smith, Jane', 'jane.smith@uconn.edu')]
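In practice the Cc value usually comes from a parsed message rather than a string literal. A minimal sketch, following the pattern the Python documentation gives for getaddresses(): collect every Cc header with Message.get_all() (a message may carry more than one) and pass that list straight to getaddresses(). The helper name cc_recipients and the sample message text are made up for illustration.

```python
import email
from email.utils import getaddresses


def cc_recipients(raw_message):
    """Return (name, address) pairs from all Cc headers (hypothetical helper)."""
    msg = email.message_from_string(raw_message)
    # get_all() returns every raw value of the header, or the default if absent
    cc_values = msg.get_all('cc', [])
    # getaddresses() takes a list of header values and splits them into pairs
    return getaddresses(cc_values)


raw = (
    'From: me@example.com\n'
    'Cc: friend@email.com, John Smith <john.smith@email.com>,'
    '"Smith, Jane" <jane.smith@uconn.edu>\n'
    '\n'
    'Body text\n'
)

for name, addr in cc_recipients(raw):
    print(repr(name), addr)
```

Because get_all() returns a list, messages with no Cc header simply yield an empty result instead of raising an error.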

Answer 2:

I haven't used it myself, but it looks to me like you could use the csv package quite easily to parse the data.
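A minimal sketch of the csv idea, assuming the header value fits on a single line. skipinitialspace=True drops the space after each comma, and the comma inside the quoted "Smith, Jane" does not split the field. Note, however, that csv removes the double quotes, so the last field becomes Smith, Jane <jane.smith@uconn.edu>, which no longer has the quoting an RFC-style address parser needs to treat the comma as part of the name. That is one reason getaddresses() is the more robust choice.

```python
import csv

cc = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'

# csv.reader expects an iterable of lines; the single header value is one "line"
fields = next(csv.reader([cc], skipinitialspace=True))

for field in fields:
    print(field)
```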

Answer 3:

The code below is completely unnecessary. I wrote it before realising that you could pass getaddresses() a list containing a single string with multiple addresses.

I haven't had a chance to look at the specifications for addresses in email headers, but based on the string you provided, this code should do the job of splitting it into a list, ignoring any commas inside quotes (since those are part of a name).

from email.utils import getaddresses

addrstring = ',friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>,'

def addrparser(addrstring):
    addrlist = ['']
    quoted = False

    # ignore any commas at the beginning or end
    addrstring = addrstring.strip(',')

    for char in addrstring:
        if char == '"':
            # toggle quoted mode
            quoted = not quoted
            addrlist[-1] += char
        # a comma outside of quotes means a new address
        elif char == ',' and not quoted:
            addrlist.append('')
        # anything else is the next character of the current address
        else:
            addrlist[-1] += char

    return getaddresses(addrlist)

print(addrparser(addrstring))

Gives:

[('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'),
 ('Smith, Jane', 'jane.smith@uconn.edu')]

I'd be interested to see how other people would go about this problem!

Answer 4:

Convert a string containing multiple email addresses (with names) into a dictionary.

emailstring = 'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>'

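Split the string on commas (this works here only because none of the names contain a comma; a quoted name such as "Smith, Jane" would be split in two):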

email_list = emailstring.split(',')

Then build a dictionary with the name as the key and the email address as the value:

import email.utils
email_dict = dict(email.utils.parseaddr(x) for x in email_list)

The result looks like this:

{'John Smith': 'john.smith@email.com', 'Friends': 'friend@email.com', 'Smith': 'jane.smith@uconn.edu'}

Note:

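If the same name appears with different email addresses, only one record is kept: dict() keeps the last value for a repeated key, so the earlier addresses for that name are lost. For example: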

'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>, Friends <friend_co@email.com>'

Here "Friends" appears twice, so only friend_co@email.com is kept under that key.
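If you need to keep every address for a repeated name, one option is to collect the addresses in lists instead of overwriting them. A minimal sketch using collections.defaultdict; it also swaps the plain comma split for email.utils.getaddresses(), so quoted names containing commas are handled too. The function name group_by_name is only illustrative.

```python
from collections import defaultdict
from email.utils import getaddresses


def group_by_name(header_value):
    """Map each display name to a list of all its addresses (hypothetical helper)."""
    grouped = defaultdict(list)
    # getaddresses() splits on commas outside quoted names and yields (name, addr) pairs
    for name, addr in getaddresses([header_value]):
        grouped[name].append(addr)
    return dict(grouped)


emailstring = ('Friends <friend@email.com>, John Smith <john.smith@email.com>,'
               '"Smith" <jane.smith@uconn.edu>, Friends <friend_co@email.com>')

print(group_by_name(emailstring))
```

Addresses without a display name all end up under the empty-string key ''.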

Source: https://stackoverflow.com/questions/5426789/method-for-parsing-text-cc-field-of-email-header

Tags

python

parsing

email-headers