Parsing “From” addresses from email text


I'm trying to extract email addresses from plain text transcripts of emails. I've cobbled together a bit of code to find the addresses themselves, but I don't know how to mak


8 answers

滥情空心

2021-02-19 04:50

"[stuff]@[stuff][stuff1-4 letters]" is about right, but if you want to, you can decode the regular expression using a trick I just found out about: pass 128 (the value of re.DEBUG) as the flags argument to compile() in an interactive Python session, like this:

```
import re

# 128 is the numeric value of re.DEBUG
mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}', 128)
```

It will print out the following:

```
in
  category category_word
  literal 45
max_repeat 1 65535
  in
    category category_word
    literal 45
    literal 46
literal 64
in
  category category_word
  literal 45
max_repeat 1 65535
  in
    category category_word
    literal 45
    literal 46
max_repeat 1 4
  in
    range (97, 122)
    range (65, 90)
```

Once you get used to the format, this shows you exactly how the RE works. The literal values are character codes: 45 is "-", 46 is "." and 64 is "@".
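As a rough sketch of the same trick with the named constant instead of the magic number, assuming Python 3 (where the dump is formatted somewhat differently from the output shown above); the names `PATTERN`, `debug_flag_value` and `decoded` are only illustrative:

```python
import re

PATTERN = r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}'

# re.DEBUG is the named constant behind the bare 128.
debug_flag_value = int(re.DEBUG)

# Compiling with re.DEBUG prints the parsed pattern to stdout as a side effect;
# the returned object is still an ordinary compiled pattern.
mailsrch = re.compile(PATTERN, re.DEBUG)

# The "literal N" entries in the dump are character codes; chr() decodes them.
decoded = {code: chr(code) for code in (45, 46, 64)}
print(decoded)
```

The flag only affects what is printed at compile time, so the resulting `mailsrch` can be used for searching as usual.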


面向向阳花

2021-02-19 04:52
```
mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
```
Expression breakdown:

[\w\-]: any word character (alphanumeric, plus underscore) or a dash

[\w\-\.]+: any word character, a dash, or a period/dot, one or more times

@: literal @ symbol

[\w\-][\w\-\.]+: any word character or dash, followed by any word character, dash, or period, one or more times

[a-zA-Z]{1,4}: any alphabetic character, 1-4 times
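To see the breakdown in action, here is a small sketch assuming Python 3; the sample sentences and the names `sample`, `found` and `short_local` are made up for illustration. It also shows one consequence of the first two pieces: because a single character class is followed by a "one or more" class, the part before the @ must be at least two characters long.

```python
import re

mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')

# Hypothetical sample text with two ordinary addresses.
sample = "Contact alice.smith@example.com or bob-j@mail.example.org today."
found = mailsrch.findall(sample)
print(found)

# A one-character local part is not matched: [\w\-] consumes "a",
# and [\w\-\.]+ then needs at least one more character before the "@".
short_local = mailsrch.findall("a@example.com")
print(short_local)
```

So the pattern is a pragmatic approximation rather than a full address validator, which fits the "about right" description given earlier.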

To make this match only lines starting with From:, and capture only the address wrapped in < and > symbols:
```
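import re

# Capture the address inside <...> on lines that start with "From:"
# (re.I: case-insensitive, re.M: ^ matches at the start of every line).
mailsrch = re.compile(r'^From:\s+.*<([\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4})>', re.I | re.M)

# Read the transcript and close the file when done.
with open('text.txt') as f:
    foundemail = mailsrch.findall(f.read())

print(foundemail)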
```
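For trying this out without a text.txt on disk, the following sketch runs the same expression against an in-memory string, assuming Python 3. The transcript and the names `from_re`, `transcript` and `senders` are invented for the example.

```python
import re

from_re = re.compile(
    r'^From:\s+.*<([\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4})>',
    re.I | re.M,
)

# Hypothetical transcript: only the From: lines should contribute addresses.
transcript = """\
From: Alice Smith <alice@example.com>
To: Bob <bob@example.org>
Subject: hello

from: "Carol" <carol.j@mail.example.net>
Cc: dave@example.com
"""

# findall returns just the captured group, i.e. the text between < and >.
senders = from_re.findall(transcript)
print(senders)
```

The To: and Cc: addresses are skipped because the pattern is anchored to "From:" at the start of a line, and the lowercase "from:" line still matches because of re.I. A From: line that gives a bare address without angle brackets would not be captured by this expression.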