As you know, this is how we validate an email address:
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\
I think you should forget about this. If you're trying to get better at regular expressions, that is one thing, and there are probably better ways of learning. Otherwise, validating email addresses is an extremely complex and error-prone activity, and I'm not aware of any cut-and-paste solution that will completely cover all the corner cases. Down that path lies madness. If you have an Internet-connected application, it's better to validate an address by actually sending a confirmation email.
My policy about email validation is: forget about it. Unicode domains aside, a user can give you a syntactically valid fake address (e.g. test@mailinator.com).
Instead of validating the email address with that huge (and possibly computationally expensive) expression:
1) send a confirmation link to the user; that way you will know whether the email address exists
2) check against Mailinator & Co. domains
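A minimal sketch of those two steps, in Perl to match the rest of the thread. The blocklist contents, the secret and the sub names are made up for illustration, and actually sending the mail is left out; it only shows the domain check and one possible way to derive a token for the confirmation link.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);    # core module

# Hypothetical blocklist; a real one would come from a maintained list.
my %DISPOSABLE = map { $_ => 1 } qw(mailinator.com);

# True if the part after the last '@' is on the blocklist.
sub is_disposable {
    my ($address) = @_;
    my ($domain) = $address =~ /\@([^\@]+)\z/ or return 0;
    return $DISPOSABLE{ lc $domain } ? 1 : 0;
}

# Token to embed in the confirmation link. When the user clicks the link,
# recompute it from the stored address and compare.
sub confirmation_token {
    my ($address, $secret) = @_;
    return hmac_sha256_hex($address, $secret);
}
```

With this, is_disposable('test@mailinator.com') is true, and the token is a 64-character hex string that is not practical to guess without the secret.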
Can you please explain to me what is going on here?
Get hold of the first (or maybe the second, I know it isn't in the third) edition of "Mastering Regular Expressions". This works through the RFC defining valid email addresses, and builds the regex needed.
If you want to understand what is going on, you should look at a decent module such as Email::Address and note how the pattern is built from its constituent pieces:
my $CTL = q{\x00-\x1F\x7F};
my $special = q{()<>\\[\\]:;@\\\\,."};
my $text = qr/[^\x0A\x0D]/;
my $quoted_pair = qr/\\$text/;
my $ctext = qr/(?>[^()\\]+)/;
my ($ccontent, $comment) = (q{})x2;
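# $COMMENT_NEST_LEVEL is set elsewhere in the module; each pass allows one more level of nested comments
for (1 .. $COMMENT_NEST_LEVEL) {
    $ccontent = qr/$ctext|$quoted_pair|$comment/;
    $comment = qr/\s*\((?:\s*$ccontent)*\s*\)\s*/;
}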
my $cfws = qr/$comment|\s+/;
my $atext = qq/[^$CTL$special\\s]/;
my $atom = qr/$cfws*$atext+$cfws*/;
my $dot_atom_text = qr/$atext+(?:\.$atext+)*/;
my $dot_atom = qr/$cfws*$dot_atom_text$cfws*/;
my $qtext = qr/[^\\"]/;
my $qcontent = qr/$qtext|$quoted_pair/;
my $quoted_string = qr/$cfws*"$qcontent+"$cfws*/;
my $word = qr/$atom|$quoted_string/;
etc etc etc.
my $simple_word = qr/$atom|\.|\s*"$qcontent+"\s*/;
my $obs_phrase = qr/$simple_word+/;
my $phrase = qr/$obs_phrase|(?:$word+)/;
my $local_part = qr/$dot_atom|$quoted_string/;
my $dtext = qr/[^\[\]\\]/;
my $dcontent = qr/$dtext|$quoted_pair/;
my $domain_literal = qr/$cfws*\[(?:\s*$dcontent)*\s*\]$cfws*/;
my $domain = qr/$dot_atom|$domain_literal/;
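To see the same build-it-from-pieces idea end to end, here is a much reduced sketch. It is not Email::Address's actual code: it drops comments, folding whitespace and the obsolete forms, spells out $atext as an explicit allow-list, and the names $addr_spec and looks_like_addr_spec are my own.

```perl
use strict;
use warnings;

# Simplified building blocks (no comments, no CFWS, no obsolete syntax).
my $atext         = qr/[A-Za-z0-9!#\$%&'*+\/=?^_`{|}~-]/;
my $dot_atom_text = qr/$atext+(?:\.$atext+)*/;
my $quoted_pair   = qr/\\[^\x0A\x0D]/;
my $qtext         = qr/[^\\"\x0A\x0D]/;
my $quoted_string = qr/"(?:$qtext|$quoted_pair)*"/;
my $local_part    = qr/$dot_atom_text|$quoted_string/;
my $dtext         = qr/[^\[\]\\\x0A\x0D]/;
my $domain_lit    = qr/\[(?:$dtext|$quoted_pair)*\]/;
my $domain        = qr/$dot_atom_text|$domain_lit/;

# The whole string must be local-part, '@', domain.
my $addr_spec = qr/\A(?:$local_part)\@(?:$domain)\z/;

sub looks_like_addr_spec {
    my ($s) = @_;
    return $s =~ $addr_spec ? 1 : 0;
}
```

Each qr// piece is interpolated as its own group, so a quantifier like $atext+ applies to the whole piece. That is what lets the big pattern stay readable: you only ever look at one small rule at a time.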
I saw this expression yesterday:
/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i
from http://fightingforalostcause.net/misc/2006/compare-email-regex.php