Question
I'm not asking about full email validation.
I just want to know which characters are allowed in the user-name and server parts of an email address. This may be oversimplified, and maybe email addresses can take other forms, but I don't care. I'm asking only about this simple form: user-name@server (e.g. wild.wezyr@best-server-ever.com) and the allowed characters in both parts.
Answer 1:
See RFC 5322: Internet Message Format and, to a lesser extent, RFC 5321: Simple Mail Transfer Protocol.
RFC 822 also covers email addresses, but it deals mostly with their structure:
addr-spec = local-part "@" domain ; global address
local-part = word *("." word) ; uninterpreted
; case-preserved
domain = sub-domain *("." sub-domain)
sub-domain = domain-ref / domain-literal
domain-ref = atom ; symbolic reference
And as usual, Wikipedia has a decent article on email addresses:
The local-part of the email address may use any of these ASCII characters:
- uppercase and lowercase Latin letters A to Z and a to z;
- digits 0 to 9;
- special characters !#$%&'*+-/=?^_`{|}~;
- dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
- space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
- comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.
In addition to ASCII characters, as of 2012 you can use international characters above U+007F, encoded as UTF-8 as described in the RFC 6532 spec and explained on Wikipedia. Note that as of 2019, these standards are still marked as Proposed, but are being rolled out slowly. The changes in this spec essentially added international characters as valid alphanumeric characters (atext) without affecting the rules on allowed & restricted special characters like !# and @:.
For validation, see Using a regular expression to validate an email address.
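The unquoted local-part rules quoted above (allowed characters, no leading, trailing, or consecutive dots) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `is_dot_atom` is made up here, and quoted strings, comments, and international characters are deliberately left out.

```python
import string

# Characters allowed in an unquoted local-part, per the list above (dot handled separately).
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom(local: str) -> bool:
    """Return True if `local` is a valid unquoted (dot-atom) local-part."""
    # Splitting on "." turns a leading, trailing, or doubled dot into an empty piece.
    parts = local.split(".")
    return all(part and set(part) <= ATEXT for part in parts)
```

For example, `is_dot_atom("wild.wezyr")` accepts the address from the question, while `is_dot_atom("John..Doe")` rejects the double dot.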
The domain part is defined as follows:
The Internet standards (Request for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters a through z (in a case-insensitive manner), the digits 0 through 9, and the hyphen (-). The original specification of hostnames in RFC 952 mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or blank spaces are permitted.
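As a rough sketch of the hostname rule just quoted (letters, digits, and hyphens, with no hyphen at either end of a label), something like the following could be used. The function name `is_hostname` is hypothetical, and the 63-character label limit is an assumption brought in from RFC 1035 rather than from the quoted text.

```python
import re

# One label: starts and ends with a letter or digit, hyphens only in between.
# The {0,61} bound gives a maximum label length of 63 (assumption from RFC 1035).
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_hostname(domain: str) -> bool:
    """Return True if every dot-separated label follows the letters-digits-hyphen rule."""
    return all(LABEL.fullmatch(label) is not None for label in domain.split("."))
```

Note that this covers plain ASCII hostnames only; bracketed IP literals and internationalized names need separate handling.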
Answer 2:
Watch out! There is a bunch of knowledge rot in this thread (stuff that used to be true and now isn't).
To avoid false-positive rejections of actual email addresses in the current and future world, and from anywhere in the world, you need to know at least the high-level concept of RFC 3490, "Internationalizing Domain Names in Applications (IDNA)". I know folks in US and A often aren't up on this, but it's already in widespread and rapidly increasing use around the world (mainly the non-English dominated parts).
The gist is that you can now use addresses like mason@日本.com and wildwezyr@fahrvergnügen.net. No, this isn't yet compatible with everything out there (as many have lamented above, even simple qmail-style +ident addresses are often wrongly rejected). But there is an RFC, there's a spec, it's now backed by the IETF and ICANN, and--more importantly--there's a large and growing number of implementations supporting this improvement that are currently in service.
I didn't know much about this development myself until I moved back to Japan and started seeing email addresses like hei@やる.ca and Amazon URLs like this:
http://www.amazon.co.jp/エレクトロニクス-デジタルカメラ-ポータブルオーディオ/b/ref=topnav_storetab_e?ie=UTF8&node=3210981
I know you don't want links to specs, but if you rely solely on the outdated knowledge of hackers on Internet forums, your email validator will end up rejecting email addresses that non-English-speaking users increasingly expect to work. For those users, such validation will be just as annoying as the commonplace brain-dead form that we all hate, the one that can't handle a + or a three-part domain name or whatever.
So I'm not saying it's not a hassle, but the full list of characters "allowed under some/any/none conditions" is (nearly) all characters in all languages. If you want to "accept all valid email addresses (and many invalid too)" then you have to take IDN into account, which basically makes a character-based approach useless (sorry), unless you first convert the internationalized email addresses to Punycode.
After doing that you can follow (most of) the advice above.
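To make the Punycode step concrete, here is a minimal sketch using Python's built-in `idna` codec, which implements the RFC 3490 flavour of IDNA mentioned above. The helper name `to_ascii_address` is an assumption for illustration, and only the domain part is converted; the local part is passed through untouched.

```python
def to_ascii_address(address: str) -> str:
    """Convert the domain of an address to its ASCII (Punycode) form."""
    # Split on the last "@" so quoted local parts containing "@" stay intact.
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("address has no @ sign")
    # The stdlib "idna" codec encodes each label, adding the "xn--" prefix where needed.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"
```

After this conversion the domain contains only letters, digits, hyphens, and dots, so a character-based check on the domain becomes meaningful again.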
Answer 3:
The format of an e-mail address is local-part@domain-part (at most 64 characters for the local part and 255 for the domain part, and no more than 256 in total).
The local-part and domain-part have different sets of permitted characters, but that's not all, as there are more rules to it.
In general, the local part can have these ASCII characters:
- lowercase Latin letters: abcdefghijklmnopqrstuvwxyz,
- uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ,
- digits: 0123456789,
- special characters: !#$%&'*+-/=?^_`{|}~,
- dot: . (not first or last character or repeated unless quoted),
- space and punctuation such as: "(),:;<>@[\] (with some restrictions),
- comments: () (are allowed within parentheses, e.g. (comment)john.smith@example.com).
Domain part:
- lowercase Latin letters: abcdefghijklmnopqrstuvwxyz,
- uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ,
- digits: 0123456789,
- hyphen: - (not first or last character),
- can contain IP address surrounded by square brackets: jsmith@[192.168.2.1] or jsmith@[IPv6:2001:db8::1].
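The bracketed IP form in the last item can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the function name `is_domain_literal` is hypothetical, and it only handles the two shapes shown above (a bare IPv4 address, or an IPv6 address with the `IPv6:` tag).

```python
import ipaddress


def is_domain_literal(domain: str) -> bool:
    """Return True for domains such as [192.168.2.1] or [IPv6:2001:db8::1]."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return False
    body = domain[1:-1]
    try:
        if body[:5].lower() == "ipv6:":
            ipaddress.IPv6Address(body[5:])
        else:
            ipaddress.IPv4Address(body)
    except ValueError:
        # ipaddress raises AddressValueError, a ValueError subclass, on bad input.
        return False
    return True
```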
These e-mail addresses are valid:
prettyandsimple@example.com
very.common@example.com
disposable.style.email.with+symbol@example.com
other.email-with-dash@example.com
x@example.com (one-letter local part)
"much.more unusual"@example.com
"very.unusual.@.unusual.com"@example.com
"very.(),:;<>[]\".VERY.\"very@\ \"very\".unusual"@strange.example.com
example-indeed@strange-example.com
admin@mailserver1 (local domain name with no top-level domain)
#!$%&'*+-/=?^_`{}|~@example.org
"()<>[]:,;@\\"!#$%&'-/=?^_`{}| ~.a"@example.org
" "@example.org (space between the quotes)
example@localhost (sent from localhost)
example@s.solutions (see the List of Internet top-level domains)
user@com
user@localserver
user@[IPv6:2001:db8::1]
And these are examples of invalid addresses:
Abc.example.com (no @ character)
A@b@c@example.com (only one @ is allowed outside quotation marks)
a"b(c)d,e:f;gi[j\k]l@example.com (none of the special characters in this local part are allowed outside quotation marks)
just"not"right@example.com (quoted strings must be dot separated or the only element making up the local part)
this is"not\allowed@example.com (spaces, quotes, and backslashes may only exist when within quoted strings and preceded by a backslash)
this\ still\"not\allowed@example.com (even if escaped (preceded by a backslash), spaces, quotes, and backslashes must still be contained by quotes)
john..doe@example.com (double dot before @); (with caveat: Gmail lets this through)
john.doe@example..com (double dot after @)
a valid address with a leading space
a valid address with a trailing space
Source: Email address at Wikipedia
Perl's RFC2822 regex for validating emails:
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)
The full regexp for RFC2822 addresses was a mere 3.7k.
See also: RFC 822 Email Address Parser in PHP.
The formal definitions of e-mail addresses are in:
- RFC 5322 (sections 3.2.3 and 3.4.1, obsoletes RFC 2822), RFC 5321, RFC 3696,
- RFC 6531 (permitted characters).
Related:
- The true power of regular expressions
Answer 4:
Wikipedia has a good article on this, and the official spec is here. From Wikipedia:
The local-part of the e-mail address may use any of these ASCII characters:
- Uppercase and lowercase English letters (a-z, A-Z)
- Digits 0 to 9
- Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
- Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
Additionally, quoted-strings (e.g. "John Doe"@example.com) are permitted, thus allowing characters that would otherwise be prohibited; however, they do not appear in common practice. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".
Answer 5:
Google does an interesting thing with its gmail.com addresses. gmail.com addresses allow only letters (a-z), numbers, and periods (which are ignored).
e.g., pikachu@gmail.com is the same as pi.kachu@gmail.com, and both email addresses will be sent to the same mailbox. PIKACHU@gmail.com is also delivered to the same mailbox.
So to answer the question, sometimes it depends on the implementer on how much of the RFC standards they want to follow. Google's gmail.com address style is compatible with the standards. They do it that way to avoid confusion where different people would take similar email addresses e.g.
*** gmail.com accepting rules ***
d.oy.smith@gmail.com (accepted)
d_oy_smith@gmail.com (bounce and account can never be created)
doysmith@gmail.com (accepted)
D.Oy'Smith@gmail.com (bounce and account can never be created)
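The equivalence described in this answer (periods ignored, case ignored) could be modelled as a normalization step like the sketch below. The function name `gmail_mailbox_key` is made up for illustration, and it only reflects the behaviour as described here, not any published Google specification.

```python
def gmail_mailbox_key(address: str) -> str:
    """Map gmail.com addresses that share a mailbox to one canonical key."""
    local, _, domain = address.rpartition("@")
    if domain.lower() != "gmail.com":
        # Other providers may treat dots and case as significant; leave untouched.
        return address
    return local.replace(".", "").lower() + "@gmail.com"
```

With this, `pikachu@gmail.com`, `pi.kachu@gmail.com`, and `PIKACHU@gmail.com` all produce the same key.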
The Wikipedia article is a good reference on what email addresses generally allow: http://en.wikipedia.org/wiki/Email_address
Answer 6:
You can start from the Wikipedia article:
- Uppercase and lowercase English letters (a-z, A-Z)
- Digits 0 to 9
- Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
- Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
Answer 7:
Name:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-/=?^_`{|}~.
Server:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.
Answer 8:
Check for @ and . and then send an email for them to verify.
I still can't use my .name email address on 20% of the sites on the internet because someone screwed up their email validation, or because it predates the new addresses being valid.
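A deliberately permissive check along these lines might look like the following sketch. The function name `looks_deliverable` is hypothetical, and the real test of the address remains the verification email.

```python
def looks_deliverable(address: str) -> bool:
    """Permissive pre-check: something before the last @, and a dot after it."""
    local, at, domain = address.rpartition("@")
    return bool(at) and bool(local) and "." in domain
```

This accepts newer top-level domains such as `.name` without any special casing, at the cost of rejecting dotless hosts such as `user@localhost`.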
Answer 9:
The short answer is that there are two answers. There is one standard for what you should do, i.e. behaviour that is wise and will keep you out of trouble. There is another (much broader) standard for the behaviour you should accept without making trouble. This duality works for sending and accepting email but has broad application in life.
For a good guide to the addresses you create; see: http://www.remote.org/jochen/mail/info/chars.html
To filter valid emails, just pass on anything comprehensible enough to see a next step. Or start reading a bunch of RFCs; caution, here be dragons.
Answer 10:
A good read on the matter.
Excerpt:
These are all valid email addresses!
"Abc\@def"@example.com
"Fred Bloggs"@example.com
"Joe\\Blow"@example.com
"Abc@def"@example.com
customer/department=shipping@example.com
\$A12345@example.com
!def!xyz%abc@example.com
_somename@example.com
Answer 11:
The accepted answer refers to a Wikipedia article when discussing the valid local-part of an email address, but Wikipedia is not an authority on this.
IETF RFC 3696 is an authority on this matter, and should be consulted at section 3. Restrictions on email addresses on page 5:
Contemporary email addresses consist of a "local part" separated from a "domain part" (a fully-qualified domain name) by an at-sign ("@"). The syntax of the domain part corresponds to that in the previous section. The concerns identified in that section about filtering and lists of names apply to the domain names used in an email context as well. The domain name can also be replaced by an IP address in square brackets, but that form is strongly discouraged except for testing and troubleshooting purposes.
The local part may appear using the quoting conventions described below. The quoted forms are rarely used in practice, but are required for some legitimate purposes. Hence, they should not be rejected in filtering routines but, should instead be passed to the email system for evaluation by the destination host.
The exact rule is that any ASCII character, including control characters, may appear quoted, or in a quoted string. When quoting is needed, the backslash character is used to quote the following character. For example
Abc\@def@example.com
is a valid form of an email address. Blank spaces may also appear, as in
Fred\ Bloggs@example.com
The backslash character may also be used to quote itself, e.g.,
Joe.\\Blow@example.com
In addition to quoting using the backslash character, conventional double-quote characters may be used to surround strings. For example
"Abc@def"@example.com
"Fred Bloggs"@example.com
are alternate forms of the first two examples above. These quoted forms are rarely recommended, and are uncommon in practice, but, as discussed above, must be supported by applications that are processing email addresses. In particular, the quoted forms often appear in the context of addresses associated with transitions from other systems and contexts; those transitional requirements do still arise and, since a system that accepts a user-provided email address cannot "know" whether that address is associated with a legacy system, the address forms must be accepted and passed into the email environment.
Without quotes, local-parts may consist of any combination of alphabetic characters, digits, or any of the special characters
! # $ % & ' * + - / = ? ^ _ ` . { | } ~
period (".") may also appear, but may not be used to start or end the local part, nor may two or more consecutive periods appear. Stated differently, any ASCII graphic (printing) character other than the at-sign ("@"), backslash, double quote, comma, or square brackets may appear without quoting. If any of that list of excluded characters are to appear, they must be quoted. Forms such as
user+mailbox@example.com
customer/department=shipping@example.com
$A12345@example.com
!def!xyz%abc@example.com
_somename@example.com
are valid and are seen fairly regularly, but any of the characters listed above are permitted.
As others have done, I submit a regex that works for both PHP and JavaScript to validate email addresses:
/^[a-z0-9!'#$%&*+\/=?^_`{|}~-]+(?:\.[a-z0-9!'#$%&*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-zA-Z]{2,}$/i
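For readers who want to try the pattern outside PHP or JavaScript, here is a rough Python port. It is a sketch, not a drop-in equivalent: the names `ANSWER_REGEX` and `matches_answer_regex` are invented here, the `^`/`$` anchors are replaced by `fullmatch`, and the `/i` flag becomes `re.IGNORECASE`.

```python
import re

# Same character classes as the pattern above; "/" needs no escaping in Python.
ANSWER_REGEX = re.compile(
    r"[a-z0-9!'#$%&*+/=?^_`{|}~-]+(?:\.[a-z0-9!'#$%&*+/=?^_`{|}~-]+)*"
    r"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-zA-Z]{2,}",
    re.IGNORECASE,
)


def matches_answer_regex(address: str) -> bool:
    """Return True if the whole address matches the pattern."""
    return ANSWER_REGEX.fullmatch(address) is not None
```

Trying it on the RFC 3696 examples shows the trade-off: the unquoted forms such as `user+mailbox@example.com` pass, while quoted forms such as `"Fred Bloggs"@example.com` and dotless domains such as `admin@mailserver1` are rejected.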
Answer 12:
As can be found in this Wikipedia link:
The local-part of the email address may use any of these ASCII characters:
- uppercase and lowercase Latin letters A to Z and a to z;
- digits 0 to 9;
- special characters !#$%&'*+-/=?^_`{|}~;
- dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
- space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
- comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.
In addition to the above ASCII characters, international characters above U+007F, encoded as UTF-8, are permitted by RFC 6531, though mail systems may restrict which characters to use when assigning local-parts.
A quoted string may exist as a dot separated entity within the local-part, or it may exist when the outermost quotes are the outermost characters of the local-part (e.g., abc."defghi".xyz@example.com or "abcdefghixyz"@example.com are allowed. Conversely, abc"defghi"xyz@example.com is not; neither is abc\"def\"ghi@example.com). Quoted strings and characters however, are not commonly used. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".
The local-part postmaster is treated specially: it is case-insensitive, and should be forwarded to the domain email administrator. Technically all other local-parts are case-sensitive, therefore jsmith@example.com and JSmith@example.com specify different mailboxes; however, many organizations treat uppercase and lowercase letters as equivalent.
Despite the wide range of special characters which are technically valid; organisations, mail services, mail servers and mail clients in practice often do not accept all of them. For example, Windows Live Hotmail only allows creation of email addresses using alphanumerics, dot (.), underscore (_) and hyphen (-). Common advice is to avoid using some special characters to avoid the risk of rejected emails.
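The placement rule for quoted strings (each dot-separated element is either an unquoted atom or a complete quoted string) can be expressed as a small regular expression. The sketch below is illustrative only; the names `ATOM`, `QUOTED`, and `is_valid_local_part` are assumptions, and comments and non-ASCII characters are not handled.

```python
import re

# An unquoted element: one or more of the allowed ASCII characters (no dot).
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
# A quoted element: anything except bare quotes/backslashes, or backslash-escaped pairs.
QUOTED = r'"(?:[^"\\]|\\.)*"'
ELEMENT = f"(?:{ATOM}|{QUOTED})"
LOCAL_PART = re.compile(rf"{ELEMENT}(?:\.{ELEMENT})*")


def is_valid_local_part(local: str) -> bool:
    """Return True if every dot-separated element is an atom or a quoted string."""
    return LOCAL_PART.fullmatch(local) is not None
```

Against the four examples in the paragraph above, this accepts `abc."defghi".xyz` and `"abcdefghixyz"` and rejects `abc"defghi"xyz` and `abc\"def\"ghi`.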
Answer 13:
The answer is (almost) ALL (7-bit ASCII), if the inclusion rule is "...allowed under some/any/none conditions...".
Just by looking at one of several possible inclusion rules for allowed text in the "domain text" part in RFC 5322 at the top of page 17 we find:
dtext = %d33-90 / ; Printable US-ASCII
%d94-126 / ; characters not including
obs-dtext ; "[", "]", or "\"
The only three characters missing from this description are those used to delimit a domain-literal ([ and ]) and to form a quoted-pair (\); the white space character (%d32) is also excluded. With those, the whole range 32-126 (decimal) is used. A similar requirement appears as "qtext" and "ctext". Many control characters are also allowed/used. One list of such control chars appears on page 31, section 4.1 of RFC 5322, as obs-NO-WS-CTL.
obs-NO-WS-CTL = %d1-8 / ; US-ASCII control
%d11 / ; characters that do not
%d12 / ; include the carriage
%d14-31 / ; return, line feed, and
%d127 ; white space characters
All these control characters are allowed, as stated at the start of section 3.5:
.... MAY be used, the use of US-ASCII control characters (values
1 through 8, 11, 12, and 14 through 31) is discouraged ....
And such an inclusion rule is therefore "just too wide". Or, in other sense, the expected rule is "too simplistic".
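The dtext ranges quoted above are easy to check mechanically. The short sketch below builds the character set from the two decimal ranges and lists which printable ASCII characters fall outside it; the variable names are chosen here for illustration.

```python
# dtext = %d33-90 / %d94-126 (the obs-dtext alternative is ignored in this sketch).
DTEXT = {chr(code) for code in (*range(33, 91), *range(94, 127))}

# All printable, non-space US-ASCII characters.
PRINTABLE = {chr(code) for code in range(33, 127)}

# Characters that are printable but not allowed as dtext.
missing = sorted(PRINTABLE - DTEXT)
print(missing)  # ['[', '\\', ']']
```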
Answer 14:
For simplicity's sake, I sanitize the submission by removing all text within double quotes and those associated surrounding double quotes before validation, putting the kibosh on email address submissions based on what is disallowed. Just because someone can have the John.."The*$hizzle*Bizzle"..Doe@whatever.com address doesn't mean I have to allow it in my system. We are living in the future where it maybe takes less time to get a free email address than to do a good job wiping your butt. And it isn't as if the email criteria are not plastered right next to the input saying what is and isn't allowed.
I also sanitize what is specifically not allowed by various RFCs after the quoted material is removed. The list of specifically disallowed characters and patterns seems to be a much shorter list to test for.
Disallowed:
local part starts with a period ( .account@host.com )
local part ends with a period ( account.@host.com )
two or more periods in series ( lots..of...dots@host.com )
&’`*|/ ( some&thing`bad@host.com )
more than one @ ( which@one@host.com )
:% ( mo:characters%mo:problems@host.com )
In the example given:
John.."The*$hizzle*Bizzle"..Doe@whatever.com --> John..Doe@whatever.com
John..Doe@whatever.com --> John.Doe@whatever.com
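One possible reading of those two sanitization rounds, sketched in Python. The function name `sanitize` and the exact regular expressions are assumptions; the answer does not show its actual code, and this version does not handle escaped quotes inside quoted text.

```python
import re


def sanitize(address: str) -> str:
    """Strip quoted text, then collapse any run of periods into one."""
    # Round 1: drop everything between double quotes, including the quotes.
    without_quotes = re.sub(r'"[^"]*"', "", address)
    # Round 2: a run of two or more periods becomes a single period.
    return re.sub(r"\.{2,}", ".", without_quotes)
```

Both patterns are simple and linear in the input length, which fits the concern about slow validation raised later in this answer.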
Sending a confirm email message to the leftover result upon an attempt to add or change the email address is a good way to see if your code can handle the email address submitted. If the email passes validation after as many rounds of sanitization as needed, then fire off that confirmation. If a request comes back from the confirmation link, then the new email can be moved from the holding||temporary||purgatory status or storage to become a real, bonafide first-class stored email.
A notification of email address change failure or success can be sent to the old email address if you want to be considerate. Unconfirmed account setups might fall out of the system as failed attempts entirely after a reasonable amount of time.
I don't allow stinkhole emails on my system; maybe that is just throwing away money. But 99.9% of the time people just do the right thing and have an email that doesn't push conformity limits to the brink utilizing edge case compatibility scenarios. Be careful of regex DDoS; this is a place where you can get into trouble. And this is related to the third thing I do: I put a limit on how long I am willing to process any one email. If it needs to slow down my machine to get validated, it isn't getting past my incoming data API endpoint logic.
Edit: This answer kept on getting dinged for being "bad", and maybe it deserved it. Maybe it is still bad, maybe not.
Answer 15:
In my PHP I use this check
<?php
if (preg_match(
'/^(?:[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+\.)*[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+@(?:(?:(?:[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!\.)){0,61}[a-zA-Z0-9_-]?\.)+[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!$)){0,61}[a-zA-Z0-9_]?)|(?:\[(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\]))$/',
"tim'qqq@gmail.com"
)){
echo "legit email";
} else {
echo "NOT legit email";
}
?>
Try it yourself: http://phpfiddle.org/main/code/9av6-d10r
Answer 16:
I created this regex according to RFC guidelines:
^[\\w\\.\\!_\\%#\\$\\&\\'=\\?\\*\\+\\-\\/\\^\\`\\{\\|\\}\\~]+@(?:\\w+\\.(?:\\w+\\-?)*)+$
Answer 17:
Gmail allows only the + sign as a special character, and in some cases the dot (.); no other special characters are allowed at Gmail. The RFCs say that you can use special characters, but you should avoid sending mail to Gmail addresses that contain them.
Source: https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address