Part of a website I am currently working on contains a registration process where users have to provide their email address. Just recently I became aware that non-ASCII based email addresses are possible.
As suggested by Mario, after playing around a bit I came up with the following regex to validate non-standard email addresses:
^([\p{L}\_\.\-\d]+)@([\p{L}\-\.\d]+)((\.(\p{L}){2,63})+)$
It should accept email addresses containing any kind of Unicode letters, with the TLD limited to between 2 and 63 letters.
Please check it and let me know if there are any flaws.
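The pattern can also be tried outside PHP. Below is a minimal sketch in JavaScript (the language of the client-side snippet later in this thread), assuming an engine with Unicode property escapes (ES2018 or later). `looksLikeUnicodeEmail` is a hypothetical helper name. One change was needed: `\_` became `_`, because JavaScript's `u` flag rejects escapes of characters that have no special meaning, while PHP's PCRE accepts them.

```javascript
// JavaScript port of the regex above (requires ES2018+ for \p{L}).
// The `u` flag enables Unicode property escapes such as \p{L} (any letter).
// `\_` from the PHP version is written as `_` here, since `u` mode
// rejects escapes of characters that have no special meaning.
const UNICODE_EMAIL_RE =
  /^([\p{L}_.\-\d]+)@([\p{L}\-.\d]+)((\.(\p{L}){2,63})+)$/u;

function looksLikeUnicodeEmail(address) {
  return UNICODE_EMAIL_RE.test(address);
}

console.log(looksLikeUnicodeEmail("josé@exämple.com"));     // true
console.log(looksLikeUnicodeEmail("用户@例子.中国"));         // true
console.log(looksLikeUnicodeEmail("user+tag@example.com")); // false: "+" is not in the local-part class
console.log(looksLikeUnicodeEmail(".a@-.com"));             // true: leading dot, hyphen-only label
```

The last two lines point at the kind of flaw being asked about: the pattern rejects the valid `+` character but accepts a local part starting with a dot and a domain label consisting only of a hyphen.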
Since version 5.2, PHP has built-in validation for email addresses, but I'm not sure whether it works for UTF-8 encoded strings:
echo filter_var($email, FILTER_VALIDATE_EMAIL);
The regular expression PHP uses for this validation can be found in the PHP source code; you can use it for manual validation if you are on PHP < 5.2.
Update
idn_to_ascii() can be used to "Convert domain name to IDNA ASCII form." The result can then be validated with filter_var($email, FILTER_VALIDATE_EMAIL):
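// International domains: convert the domain part to its ASCII (Punycode) form
if (function_exists('idn_to_ascii') && strpos($email, '@') !== false) {
    // Split at the last "@"; the domain part cannot contain one
    $at = strrpos($email, '@');
    $local = substr($email, 0, $at);
    // Pass the UTS #46 variant explicitly (the old 2003 default is deprecated)
    $domain = idn_to_ascii(substr($email, $at + 1), IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($domain !== false) { // idn_to_ascii() returns false on failure
        $email = $local . '@' . $domain;
    }
}
$is_valid = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;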
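For comparison, here is the same split-and-convert idea sketched in JavaScript for Node.js. It swaps in Node's `url.domainToASCII()` for PHP's `idn_to_ascii()`; `emailWithAsciiDomain` is a hypothetical helper name. Splitting at the last `@` is a choice made in this sketch, because a quoted local part may itself contain an `@`.

```javascript
// Sketch: convert only the domain part of an address to its ASCII (Punycode)
// form, similar in spirit to the PHP snippet above. Uses Node's url module.
const { domainToASCII } = require("url");

function emailWithAsciiDomain(email) {
  const at = email.lastIndexOf("@"); // the domain itself cannot contain "@"
  if (at === -1) {
    return email; // nothing to convert
  }
  const local = email.slice(0, at);
  const domain = domainToASCII(email.slice(at + 1));
  // domainToASCII() returns "" for an invalid domain; keep the input unchanged
  return domain === "" ? email : local + "@" + domain;
}

console.log(emailWithAsciiDomain("info@español.com")); // info@xn--espaol-zwa.com
```

Note that only the domain changes; a non-ASCII local part (as in `josé@…`) is left as it is, so a later ASCII-only check would still reject it.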
A regular expression could be something like this:
[^ ]+@[^ ]+\.[^ ]{2,6}
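Keep in mind that this pattern has no `^`/`$` anchors, so with `preg_match()` it matches any string that merely contains something address-like; once anchored, the `{2,6}` limit rejects longer TLDs. A small JavaScript comparison, with made-up sample inputs:

```javascript
// The pattern from this answer, unanchored as written.
const LOOSE = /[^ ]+@[^ ]+\.[^ ]{2,6}/u;
// The same pattern anchored, so the whole input has to match.
const ANCHORED = /^[^ ]+@[^ ]+\.[^ ]{2,6}$/u;

const samples = [
  "josé@exämple.com",         // loose: true, anchored: true
  "write to a@b.cc today",    // loose: true, anchored: false (address inside a sentence)
  "user@example.photography", // loose: true, anchored: false (TLD longer than 6 characters)
];

for (const s of samples) {
  console.log(JSON.stringify(s), "loose:", LOOSE.test(s), "anchored:", ANCHORED.test(s));
}
```

The unanchored version accepts the long TLD only by accident, because `[^ ]{2,6}` is satisfied by the first six letters of `photography`.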
What about something like this:
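mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8"); // mb_ereg() patterns are now read as UTF-8
// mb_ereg()'s third argument receives the matches (by reference), not an encoding;
// ^ and $ make the whole string match, not just a part of it
$is_valid = mb_ereg('^[\w]+@[\w]+\.com$', $mail);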
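The point of the `mb_` functions here is that, with UTF-8 as the regex encoding, `\w` is meant to cover non-ASCII letters too. JavaScript makes the difference easy to see: its `\w` stays ASCII-only even with the `u` flag, so a Unicode-aware version has to spell out the character classes. A minimal sketch; the `\p{...}` set below is an assumed approximation of a Unicode "word character", not an exact copy of Oniguruma's definition.

```javascript
// ASCII-only: in JavaScript, \w means [A-Za-z0-9_] even with the u flag.
const ASCII_WORD = /^\w+@\w+\.com$/u;
// Unicode-aware approximation of \w: letters, combining marks,
// decimal digits and connector punctuation (e.g. "_").
const UNICODE_WORD = /^[\p{L}\p{M}\p{Nd}\p{Pc}]+@[\p{L}\p{M}\p{Nd}\p{Pc}]+\.com$/u;

for (const s of ["mueller@beispiel.com", "müller@beispiel.com"]) {
  console.log(s, "ascii:", ASCII_WORD.test(s), "unicode:", UNICODE_WORD.test(s));
}
```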
Attempting to validate email addresses may not be a good idea. The specifications (RFC 5321, RFC 5322) allow for so much flexibility that validating them with regular expressions is literally impossible, and validating with a function is a great deal of work. The result of this is that most email validation schemes end up rejecting a large number of valid email addresses, much to the inconvenience of the users. (By far the most common example of this is not allowing the + character.)
It is more likely that the user will (accidentally or deliberately) enter an incorrect email address, one that is well-formed but not theirs, than an invalid one, so actually validating is a great deal of work for very little benefit, with possible costs if you do it incorrectly.
I would recommend that you just check for the presence of an @ character on the client and then send a confirmation email to verify it; it's the most practical way to validate, and it confirms that the address is correct as well.
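The recommended approach boils down to two small pieces, sketched here in JavaScript for Node.js: a check that the input contains an `@`, and a random token for the confirmation link. Both function names are hypothetical, storing the token and sending the email are left out, and using a 32-byte token from `crypto.randomBytes()` is an assumption of this sketch rather than something the answer prescribes.

```javascript
const crypto = require("crypto");

// Only require an "@": everything else is verified by the confirmation email.
function hasAtSign(value) {
  return value.trim().includes("@");
}

// Hypothetical helper: an unguessable token to put in the confirmation link,
// e.g. https://example.com/confirm?token=<token>. Store it with the pending
// address and mark the address as confirmed when the link is opened.
function createConfirmationToken() {
  return crypto.randomBytes(32).toString("hex"); // 64 hex characters
}

console.log(hasAtSign("user+tag@example.com")); // true
console.log(hasAtSign("not an address"));       // false
console.log(createConfirmationToken().length);  // 64
```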
I got this idea from a JavaScript tutorial page. It is basic, but it works for me without having to worry about the complexity of regular expressions and Unicode standards.
Client-side validation
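// Client-side check (the function name is illustrative): the value must be
// non-empty and contain an "@" followed later by a ".", with at least one
// character between the "@" and the last "."
function isValidEmail(value) {
  value = value.trim(); // native replacement for jQuery's $.trim()
  if (!value.length) {
    return false;
  }
  // Declared with var so they don't leak into the global scope
  var atPos = value.indexOf("@");
  var stopPos = value.lastIndexOf(".");
  if (atPos === -1 || stopPos === -1) {
    return false;
  }
  if (stopPos < atPos) {
    return false;
  }
  if (stopPos - atPos === 1) {
    return false;
  }
  return true;
}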
Server-side validation
if (!isset($_POST['emailaddr']) || trim($_POST['emailaddr']) == "") {
    // Error: email required
}
else {
    $email = trim($_POST['emailaddr']);
    $atpos = strpos($email, '@');
    // Use the LAST dot, as the client-side lastIndexOf() does; strpos() would
    // find the first dot and reject addresses such as first.last@example.com
    $stoppos = strrpos($email, '.');
    if ($atpos === false || $stoppos === false) {
        // Error: invalid email
    }
    elseif ($stoppos < $atpos) {
        // Error: invalid email
    }
    elseif (($stoppos - $atpos) == 1) {
        // Error: invalid email
    }
}
Though it still has some loopholes, I don't expect users to fool around with it. For anything serious, real validation is still required, as suggested by Jeremy Banks.
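Whether the check looks for the first or the last dot matters. Here is a small JavaScript comparison of the three position checks used in this answer; `passesDotChecks` is a hypothetical helper, and the `useLastDot` switch exists only for the comparison.

```javascript
// Mirrors the position checks: "@" and "." present, the dot after the "@",
// and at least one character between them.
function passesDotChecks(email, useLastDot) {
  const atPos = email.indexOf("@");
  const stopPos = useLastDot ? email.lastIndexOf(".") : email.indexOf(".");
  if (atPos === -1 || stopPos === -1) return false;
  if (stopPos < atPos) return false;
  if (stopPos - atPos === 1) return false;
  return true;
}

const address = "first.last@example.com";
console.log(passesDotChecks(address, false)); // false: the first dot is in the local part
console.log(passesDotChecks(address, true));  // true
```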
Hope this will be helpful for somebody else too.
Thanks and regards to all