Question
I want to find valid email addresses in a text file, and this is my code:
email = re.findall(r'[a-zA-Z\.-]+@[\w\.-]+',line)
But my code obviously does not match email addresses that have numbers before the @ sign. It also cannot filter out email addresses that do not have a valid ending. Could anyone help me with these two problems? Thank you!
An example of my problem would be:
my code can find this email: xyz@gmail.com
but it cannot find this one: xyz123@gmail.com
And it cannot filter this email out either: xyz@gmail
Answer 1:
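According to the Python re docs, \w matches any alphanumeric character or underscore; for ASCII matching this is equivalent to the set [a-zA-Z0-9_] (in Python 3, str patterns also match Unicode word characters by default). So [\w\.-] matches digits as well as letters. The trailing group is written as non-capturing, (?:...), because re.findall would otherwise return only the text of the group instead of the whole address.
email = re.findall(r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+',line)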
This post discusses matching email addresses much more extensively, and there are a few more pitfalls in matching email addresses that your code fails to catch. For example, an email address cannot be made up entirely of punctuation (...@....). Additionally, there is often a maximum length on addresses, depending on the email server. Also, many email servers accept non-English characters. So depending on your needs, you may need a more comprehensive pattern.
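As a rough check against the three addresses from the question, here is a minimal sketch. The names EMAIL_RE, find_emails, and sample are made up for illustration, and the pattern is only a loose approximation of what counts as a valid address. It uses a non-capturing group so that findall returns whole addresses.

```python
import re

# Local part: word characters, dots, hyphens. Domain: the same, but it must
# end in at least one ".label" suffix. (?:...) keeps the group non-capturing,
# so findall returns the full match rather than the group's text.
EMAIL_RE = re.compile(r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+')

def find_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL_RE.findall(text)

# Hypothetical sample line built from the addresses in the question.
sample = "Contact xyz@gmail.com or xyz123@gmail.com, but not xyz@gmail today."
print(find_emails(sample))
```

With this sample, the address without a dotted ending (xyz@gmail) is skipped, while the other two are returned.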
Answer 2:
Try the validate_email package.
pip install validate_email
Then
from validate_email import validate_email
is_valid = validate_email('example@example.com')
Answer 3:
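^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$
Not mine, but I have used it in apps before. Because of the ^ and $ anchors it validates a whole string rather than searching within a line, and {2,4} rejects top-level domains longer than four characters.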
Source
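An anchored pattern like this one suits checking a single candidate string rather than scanning a line. The sketch below is one way to use it in Python. The names ANCHORED_EMAIL_RE and is_plausible_email are hypothetical, and the pattern is written with the hyphen last in the first character class, which is the form Python's re module accepts.

```python
import re

# Anchored pattern: the whole string must be an address.
# The final label is limited to 2-4 characters, so longer TLDs are rejected.
ANCHORED_EMAIL_RE = re.compile(r'^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$')

def is_plausible_email(candidate):
    """Return True if the entire candidate string matches the pattern."""
    return ANCHORED_EMAIL_RE.match(candidate) is not None

# Hypothetical candidates, including one with a long top-level domain.
for candidate in ("xyz123@gmail.com", "xyz@gmail", "someone@example.museum"):
    print(candidate, is_plausible_email(candidate))
```

The last candidate shows the cost of the {2,4} limit: an otherwise well-formed address with a six-letter top-level domain is rejected.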
Source: https://stackoverflow.com/questions/41798539/find-email-using-regular-expression-python