I have the plain text of a Cc header field that looks like so:
friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>
Are there any battle tested modules for parsing this properly?
(bonus if it's in python! the email module just returns the raw text without any methods for splitting it, AFAIK) (also bonus if it splits name and address into to fields)
There are a bunch of function available as a standard python module, but I think you're looking for email.utils.parseaddr() or email.utils.getaddresses()
>>> addresses = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'
>>> email.utils.getaddresses([addresses])
[('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'), ('Smith, Jane', 'jane.smith@uconn.edu')]
I haven't used it myself, but it looks to me like you could use the csv package quite easily to parse the data.
The bellow is completely unnecessary. I wrote it before realising that you could pass getaddresses()
a list containing a single string containing multiple addresses.
I haven't had a chance to look at the specifications for addresses in email headers, but based on the string you provided, this code should do the job splitting it into a list, making sure to ignore commas if they are within quotes (and therefore part of a name).
from email.utils import getaddresses
addrstring = ',friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>,'
def addrparser(addrstring):
addrlist = ['']
quoted = False
# ignore comma at beginning or end
addrstring = addrstring.strip(',')
for char in addrstring:
if char == '"':
# toggle quoted mode
quoted = not quoted
addrlist[-1] += char
# a comma outside of quotes means a new address
elif char == ',' and not quoted:
addrlist.append('')
# anything else is the next letter of the current address
else:
addrlist[-1] += char
return getaddresses(addrlist)
print addrparser(addrstring)
Gives:
[('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'),
('Smith, Jane', 'jane.smith@uconn.edu')]
I'd be interested to see how other people would go about this problem!
Convert multiple E-mail string in to dictionary (Multiple E-Mail with name in to one string).
emailstring = 'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>'
Split string by Comma
email_list = emailstring.split(',')
name is key and email is value and make dictionary.
email_dict = dict(map(lambda x: email.utils.parseaddr(x), email_list))
Result like this:
{'John Smith': 'john.smith@email.com', 'Friends': 'friend@email.com', 'Smith': 'jane.smith@uconn.edu'}
Note:
If there is same name with different email id then one record is skip.
'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>, Friends <friend_co@email.com>'
"Friends" is duplicate 2 time.
来源:https://stackoverflow.com/questions/5426789/method-for-parsing-text-cc-field-of-email-header