How can I check for duplicate email addresses in PHP, with the possibility of Gmail's automated labeler and punctuation in mind?
For example, I wan
$email_parts = explode('@', $email);
// check if there is a "+" and return the string before
$before_plus = strstr($email_parts[0], '+', TRUE);
$before_at = $before_plus ? $before_plus : $email_parts[0];
// remove "."
$before_at = str_replace('.', '', $before_at);
$email_clean = $before_at.'@'.$email_parts[1];
Strip the address to the basic form before comparing. Make a function normalise()
that will strip the label, then remove all dots. Then you can compare the addresses via:
normalise(address1) == normalise(address2)
If you have to do this very often, store the addresses in their normalised form as well, so you don't have to normalise them again for every comparison.
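A minimal sketch of that idea, not a definitive implementation. The body of normalise() here is an assumption based on the description above: it strips everything from the first "+" and removes dots from the local part for every domain. Lowercasing the address is an extra assumption, and the $registered array is a hypothetical stand-in for wherever the addresses are actually stored.

```php
<?php
// Hypothetical helper: reduce an address to a comparable base form.
function normalise(string $email): string
{
    // Assumption: addresses are compared case-insensitively.
    $email = strtolower(trim($email));

    $at = strrpos($email, '@');
    if ($at === false) {
        return $email; // nothing to split on; leave the input untouched
    }

    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);

    // Strip the label: everything from the first "+" onwards.
    $plus = strpos($local, '+');
    if ($plus !== false) {
        $local = substr($local, 0, $plus);
    }

    // Remove dots from the local part only, never from the domain.
    $local = str_replace('.', '', $local);

    return $local . '@' . $domain;
}

// Compare two addresses by their normalised forms.
var_dump(normalise('John.Doe+news@gmail.com') === normalise('johndoe@gmail.com'));

// Keep the normalised form as the array key, so stored addresses
// are not normalised again on every lookup.
$registered = [];
foreach (['john.doe@gmail.com', 'jane@example.org'] as $address) {
    $registered[normalise($address)] = $address;
}
var_dump(isset($registered[normalise('johndoe+shop@gmail.com')]));
```

Both var_dump() calls print bool(true). Keeping the original address as the array value means the user still sees the address they typed; only the lookup key is normalised.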
Email address parsing is really, really hard to do correctly without breaking things and annoying users.
First, I would question whether you really need to do this. Why do you have multiple email addresses with different sub-addresses?
If you are sure you need to do this, first read RFC 822, then modify this email address parsing regex to extract all parts of the email, and recombine them excluding the label.
Slightly more practically, the Email address Wikipedia page has a section on this part of the address format: Sub-addressing.
The code powtac posted looks like it should work - as long as you're not using it in an automated manner to delete accounts or anything, it should be fine.
Note that the "automated labeler" isn't a Gmail-specific feature; Gmail simply popularised it. Other mail servers support this feature, some using + as the separator, others using -. If you are going to special-case punctuation in Gmail addresses, remember to consider the googlemail.com domain as well.
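As a rough sketch of those two points, the separator can be passed in as a parameter and googlemail.com can be folded into gmail.com. The function name strip_subaddress() and its signature are hypothetical, chosen only for illustration; which separator a given mail server uses is something you would have to know in advance.

```php
<?php
// Hypothetical helper: strip a sub-address using a caller-supplied separator.
function strip_subaddress(string $email, string $separator = '+'): string
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return $email; // nothing to split on; leave the input untouched
    }

    $local  = substr($email, 0, $at);
    $domain = strtolower(substr($email, $at + 1));

    // Cut the local part at the first separator, unless it is the first character.
    $pos = strpos($local, $separator);
    if ($pos !== false && $pos > 0) {
        $local = substr($local, 0, $pos);
    }

    // Treat googlemail.com as an alias of gmail.com.
    if ($domain === 'googlemail.com') {
        $domain = 'gmail.com';
    }

    return $local . '@' . $domain;
}

echo strip_subaddress('user+label@googlemail.com'), "\n";   // user@gmail.com
echo strip_subaddress('user-label@example.org', '-'), "\n"; // user@example.org
```

Using "-" as the separator would also truncate ordinary addresses such as mary-jane@example.org, so it only makes sense for domains known to use that convention.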
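function normalize($input) {
    // Split at the last "@" so that dots in the domain are left intact.
    $at = strrpos($input, '@');
    if ($at === false) return $input;
    $local = substr($input, 0, $at);
    $domain = substr($input, $at);
    // Drop the "+label" suffix, then remove dots from the local part only.
    $local = preg_replace('/\+.*$/', '', $local);
    return str_replace('.', '', $local) . $domain;
}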
Perhaps this would be better titled "How to normalize gmail addresses in PHP, considering (user.name+label@gmail.com)"
You have two technical solutions above. I'll go a different route and ask why you're trying to do this. It doesn't feel right to me. Are you trying to prevent someone registering multiple times at your site using different e-mail addresses? This will only prevent a specialized case of that.
I have my own domain, example.com, and any e-mail that goes to any address at that domain goes to my single mailbox. Do you, now, want to put a check to normalize anything at my example.com to a single address on your end?
By the official e-mail address format, those addresses you are trying to match as the same are different.
This answer is an improvement on @powtac's answer. I needed this function to defeat multiple signups from the same person using Gmail.
if ( ! function_exists('normalize_email'))
{
    /**
     * to normalize emails to a base format, especially for gmail
     * @param $email
     * @return string
     */
    function normalize_email($email) {
        // ensure email is lowercase because of pending in_array check, and more...
        $email = strtolower($email);
        $parts = explode('@', $email);
        // normalize gmail addresses
        if (in_array($parts[1], ['gmail.com', 'googlemail.com'])) {
            // check if there is a "+" and return the string before then remove "."
            $before_plus = strstr($parts[0], '+', TRUE);
            $before_at = str_replace('.', '', $before_plus ? $before_plus : $parts[0]);
            // ensure only @gmail.com addresses are used
            $email = $before_at.'@gmail.com';
        }
        return $email;
    }
}