Can someone please explain this Java regex to me?
^[a-z0-9!#$%&\'*+/=?^_`{|}~-]+(?:\\\\.[a-z0-9!#$%&\'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](
^                                        # Beginning of the line
[a-z0-9!#$%&'*+/=?^_`{|}~-]+             # One or more (+) characters from the
                                         # bracket expression, i.e., letters [a-z],
                                         # digits [0-9], !, #, $, %, et cetera
(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*     # Zero or more (*) of the above
                                         # expression, preceded by a dot \\.
@                                        # Literal @
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+  # One or more (+) labels: a digit or a letter,
                                         # optionally followed by digits, letters, or
                                         # dashes and ending in a digit or a letter,
                                         # followed by a dot
(?:[A-Z]{2}|com|org|net...)              # Country code ([A-Z]{2}), or a top-level
                                         # domain, such as com, org, net
$                                        # End of the line
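The commented breakdown above can be reproduced in Java by concatenating the pieces as string literals, one piece per line. This is only a sketch: the class, constant, and method names are made up for illustration, and the full TLD list is taken from the alternation quoted in this thread, on the assumption that it matches the truncated expression in the question.

```java
import java.util.regex.Pattern;

public class EmailRegexPieces {
    // Characters allowed in the local part (hypothetical constant name).
    static final String ATOM = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+";

    static final Pattern EMAIL = Pattern.compile(
          "^"                                        // beginning of the input
        + ATOM                                       // one or more allowed characters
        + "(?:\\." + ATOM + ")*"                     // zero or more ".atom" continuations
        + "@"                                        // literal @
        + "(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+"  // one or more "label." parts
        + "(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|in|aero|jobs|museum)"
        + "$");                                      // end of the input

    static boolean matches(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("john@foo.com"));          // true
        System.out.println(matches("john.doe@mail.foo.org")); // true
        System.out.println(matches("john@foo"));              // false: no "label." part
    }
}
```

Note that inside a Java string literal the regex \. has to be written as \\., which is why the dots appear double-escaped.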
Using a concrete example, john@foo.com: the first part of the e-mail, john, will be matched by ^[a-z0-9!#$%&'*+/=?^_`{|}~-]+. The @ will be matched by, well, @. The domain foo, as well as the dot, is matched by (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+. Finally, the TLD com is matched by the alternation (?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|in|aero|jobs|museum).
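To see that walkthrough in action, here is a rough sketch that wraps the three parts in named groups so Java can report which text each part consumed. The group names (local, domain, tld), the class name, and the helper method are my own additions for illustration; the original expression only uses non-capturing groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailWalkthrough {
    // Same expression as discussed, with named groups added around each part.
    static final Pattern EMAIL = Pattern.compile(
          "^(?<local>[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*)"
        + "@"
        + "(?<domain>(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+)"
        + "(?<tld>[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|in|aero|jobs|museum)$");

    // Returns {local, domain, tld}, or null when the input does not match.
    static String[] split(String email) {
        Matcher m = EMAIL.matcher(email);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("local"), m.group("domain"), m.group("tld") };
    }

    public static void main(String[] args) {
        String[] parts = split("john@foo.com");
        System.out.println(String.join(" | ", parts)); // john | foo. | com
    }
}
```

The domain group includes the trailing dot because the dot is part of the repeated group, exactly as described above.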
Validating email addresses with a regex is now widely considered bad practice (see "stop validating email addresses with regex"), especially with an expression like the one in your question. For comparison, far more complete expressions exist.
As for this expression, let's break it into parts:
Beginning of the matched string
^
Matches at least one character from the list
[a-z0-9!#$%&'*+/=?^_`{|}~-]+
Non-capturing (see backreference) group which can be repeated 0..n times, that matches a . and then at least one character from the list.
(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
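If the difference between a non-capturing group (?:...) and an ordinary capturing group (...) is unclear, the following sketch compares the two on this sub-expression. The class, constant, and method names are made up for the example; both patterns accept the same strings, and the only difference is whether the group's text is remembered.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    static final String ATOM = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+";

    // The same sub-expression twice: once with (?:...) and once with (...).
    static final Pattern NON_CAPTURING = Pattern.compile(ATOM + "(?:\\." + ATOM + ")*");
    static final Pattern CAPTURING     = Pattern.compile(ATOM + "(\\." + ATOM + ")*");

    // Returns the text held by group 1 after a full match, or null if there is no match.
    static String lastCaptured(String s) {
        Matcher m = CAPTURING.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "john.q.doe";
        System.out.println(NON_CAPTURING.matcher(input).matches()); // true
        System.out.println(NON_CAPTURING.matcher(input).groupCount()); // 0
        System.out.println(CAPTURING.matcher(input).groupCount());     // 1
        // A repeated capturing group keeps only its last iteration.
        System.out.println(lastCaptured(input)); // .doe
    }
}
```

Since nothing in the e-mail expression refers back to the group's contents, the non-capturing form is the natural choice here.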
Just this character
@
Non-capturing group matching one character from the list [a-z0-9] and then possibly more characters from the following lists. The matched label must start and end with [a-z0-9] and can have [a-z0-9-] inside it. Each label is followed by a . and the group must occur at least once.
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+
Non-capturing group that matches two uppercase letters or one of the listed words.
(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|in|aero|jobs|museum)
End of the string.
$
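One thing the parts above imply: the expression mixes lowercase classes such as [a-z0-9] with the uppercase [A-Z]{2}, so its behaviour depends on whether it is compiled case-insensitively. The question does not show how the pattern is compiled, so the flag in this sketch is an assumption, and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class CaseFlagDemo {
    static final String REGEX =
          "^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
        + "@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+"
        + "(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|in|aero|jobs|museum)$";

    static final Pattern STRICT  = Pattern.compile(REGEX);
    static final Pattern RELAXED = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);

    static boolean strict(String s)  { return STRICT.matcher(s).matches(); }
    static boolean relaxed(String s) { return RELAXED.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] samples = { "john@foo.com", "john@foo.US", "john@foo.us", "John@Foo.Com" };
        for (String s : samples) {
            System.out.println(s + " strict=" + strict(s) + " relaxed=" + relaxed(s));
        }
    }
}
```

Without the flag, only the first two samples match; with Pattern.CASE_INSENSITIVE all four do, which suggests the expression was written with case-insensitive matching in mind.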