I\'m not asking about full email validation.
I just want to know what are allowed characters in user-name
and server
parts of email address
Watch out! There is a bunch of knowledge rot in this thread (stuff that used to be true and now isn't).
To avoid false-positive rejections of actual email addresses in the current and future world, and from anywhere in the world, you need to know at least the high-level concept of RFC 3490, "Internationalizing Domain Names in Applications (IDNA)". I know folks in US and A often aren't up on this, but it's already in widespread and rapidly increasing use around the world (mainly the non-English dominated parts).
The gist is that you can now use addresses like mason@日本.com and wildwezyr@fahrvergnügen.net. No, this isn't yet compatible with everything out there (as many have lamented above, even simple qmail-style +ident addresses are often wrongly rejected). But there is an RFC, there's a spec, it's now backed by the IETF and ICANN, and--more importantly--there's a large and growing number of implementations supporting this improvement that are currently in service.
I didn't know much about this development myself until I moved back to Japan and started seeing email addresses like hei@やる.ca and Amazon URLs like this:
http://www.amazon.co.jp/エレクトロニクス-デジタルカメラ-ポータブルオーディオ/b/ref=topnav_storetab_e?ie=UTF8&node=3210981
I know you don't want links to specs, but if you rely solely on the outdated knowledge of hackers on Internet forums, your email validator will end up rejecting email addresses that non-English-speaking users increasingly expect to work. For those users, such validation will be just as annoying as the commonplace brain-dead form that we all hate, the one that can't handle a + or a three-part domain name or whatever.
So I'm not saying it's not a hassle, but the full list of characters "allowed under some/any/none conditions" is (nearly) all characters in all languages. If you want to "accept all valid email addresses (and many invalid too)" then you have to take IDN into account, which basically makes a character-based approach useless (sorry), unless you first convert the internationalized email addresses (dead since September 2015, used to be like this—a working alternative is here) to Punycode.
After doing that you can follow (most of) the advice above.
In my PHP I use this check
<?php
if (preg_match(
'/^(?:[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+\.)*[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+@(?:(?:(?:[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!\.)){0,61}[a-zA-Z0-9_-]?\.)+[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!$)){0,61}[a-zA-Z0-9_]?)|(?:\[(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\]))$/',
"tim'qqq@gmail.com"
)){
echo "legit email";
} else {
echo "NOT legit email";
}
?>
try it yourself http://phpfiddle.org/main/code/9av6-d10r
For simplicity's sake, I sanitize the submission by removing all text within double quotes and those associated surrounding double quotes before validation, putting the kibosh on email address submissions based on what is disallowed. Just because someone can have the John.."The*$hizzle*Bizzle"..Doe@whatever.com address doesn't mean I have to allow it in my system. We are living in the future where it maybe takes less time to get a free email address than to do a good job wiping your butt. And it isn't as if the email criteria are not plastered right next to the input saying what is and isn't allowed.
I also sanitize what is specifically not allowed by various RFCs after the quoted material is removed. The list of specifically disallowed characters and patterns seems to be a much shorter list to test for.
Disallowed:
local part starts with a period ( .account@host.com )
local part ends with a period ( account.@host.com )
two or more periods in series ( lots..of...dots@host.com )
&’`*|/ ( some&thing`bad@host.com )
more than one @ ( which@one@host.com )
:% ( mo:characters%mo:problems@host.com )
In the example given:
John.."The*$hizzle*Bizzle"..Doe@whatever.com --> John..Doe@whatever.com
John..Doe@whatever.com --> John.Doe@whatever.com
Sending a confirm email message to the leftover result upon an attempt to add or change the email address is a good way to see if your code can handle the email address submitted. If the email passes validation after as many rounds of sanitization as needed, then fire off that confirmation. If a request comes back from the confirmation link, then the new email can be moved from the holding||temporary||purgatory status or storage to become a real, bonafide first-class stored email.
A notification of email address change failure or success can be sent to the old email address if you want to be considerate. Unconfirmed account setups might fall out of the system as failed attempts entirely after a reasonable amount of time.
I don't allow stinkhole emails on my system, maybe that is just throwing away money. But, 99.9% of the time people just do the right thing and have an email that doesn't push conformity limits to the brink utilizing edge case compatibility scenarios. Be careful of regex DDoS, this is a place where you can get into trouble. And this is related to the third thing I do, I put a limit on how long I am willing to process any one email. If it needs to slow down my machine to get validated-- it isn't getting past the my incoming data API endpoint logic.
Edit: This answer kept on getting dinged for being "bad", and maybe it deserved it. Maybe it is still bad, maybe not.
Gmail will only allow + sign as special character and in some cases (.) but any other special characters are not allowed at Gmail. RFC's says that you can use special characters but you should avoid sending mail to Gmail with special characters.
Name:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-/=?^_`{|}~.
Server:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.
As can be found in this Wikipedia link
The local-part of the email address may use any of these ASCII characters:
uppercase and lowercase Latin letters
A
toZ
anda
toz
;digits
0
to9
;special characters
!#$%&'*+-/=?^_`{|}~
;dot
.
, provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g.John..Doe@example.com
is not allowed but"John..Doe"@example.com
is allowed);space and
"(),:;<>@[\]
characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);comments are allowed with parentheses at either end of the local-part; e.g.
john.smith(comment)@example.com
and(comment)john.smith@example.com
are both equivalent tojohn.smith@example.com
.In addition to the above ASCII characters, international characters above U+007F, encoded as UTF-8, are permitted by RFC 6531, though mail systems may restrict which characters to use when assigning local-parts.
A quoted string may exist as a dot separated entity within the local-part, or it may exist when the outermost quotes are the outermost characters of the local-part (e.g.,
abc."defghi".xyz@example.com
or"abcdefghixyz"@example.com
are allowed. Conversely,abc"defghi"xyz@example.com
is not; neither isabc\"def\"ghi@example.com
). Quoted strings and characters however, are not commonly used. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".The local-part
postmaster
is treated specially—it is case-insensitive, and should be forwarded to the domain email administrator. Technically all other local-parts are case-sensitive, thereforejsmith@example.com
andJSmith@example.com
specify different mailboxes; however, many organizations treat uppercase and lowercase letters as equivalent.Despite the wide range of special characters which are technically valid; organisations, mail services, mail servers and mail clients in practice often do not accept all of them. For example, Windows Live Hotmail only allows creation of email addresses using alphanumerics, dot (
.
), underscore (_
) and hyphen (-
). Common advice is to avoid using some special characters to avoid the risk of rejected emails.