I've heard that it is a bad thing to validate email addresses with a regex, and that it actually can cause harm. Why is that? I thought it never could be a bad thing to validate data.
If your regular expression is ill-formed then you might deny valid email addresses. This goes for any "email validation" rule.
I know of an email address that forms regularly reject even though it contains no email oddities; it's merely long. This really annoys its owner, because the part before the @ is their legal name, an obvious choice for an email address.
That is part of the potential harm of email validation done incorrectly: annoying users by refusing to let valid email addresses into the system.
I've heard that it is a bad thing to validate email addresses with a regex, and that it actually can cause harm. Why is that?
This is correct. The regex solution is attractive, because an email address is a structured string, and regex is used to find structure in strings.
It is also the wrong solution, because when you ask the user for an email address, it is usually so you can contact them.
The validation is incorrect because:
the address may be valid, but not an address the user has access to. I could enter billgates@microsoft.com into any form, and it would probably be accepted as a valid email address (disclaimer: I am not Bill Gates :) ).
the syntax for email addresses is very tricky to get right (see the examples here): by defining your own regex for email validation, you will end up rejecting valid addresses and accepting invalid ones.
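To make the second point concrete, here is a minimal sketch in Python. The pattern `NAIVE_EMAIL` and the helper `naive_is_valid` are hypothetical, written only to resemble the kind of hand-rolled regex often seen in forms; they are not taken from any particular library or from the examples linked above.

```python
import re

# Hypothetical hand-rolled pattern, for illustration only.
NAIVE_EMAIL = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+\.[a-z]{2,4}")


def naive_is_valid(address: str) -> bool:
    """Return True if the whole string matches the naive pattern."""
    return NAIVE_EMAIL.fullmatch(address) is not None


# Syntactically valid addresses that the naive pattern rejects.
valid_but_rejected = [
    "user+tag@example.com",        # '+' is allowed in the local part
    "o'brien@example.com",         # so is an apostrophe
    "someone@example.museum",      # top-level domains can exceed 4 letters
    "someone@mail.example.co.uk",  # domains can have several labels
]

# A syntactically invalid address that the naive pattern accepts.
invalid_but_accepted = [
    "a..b@example.com",  # consecutive dots in an unquoted local part
]

for address in valid_but_rejected + invalid_but_accepted:
    print(f"{address!r}: accepted={naive_is_valid(address)}")
```

The same pattern also accepts billgates@microsoft.com, which illustrates the first point: passing a syntax check says nothing about whether the person filling in the form can read mail sent there.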
I thought it never could be a bad thing to validate data.
It's not bad to validate data. In this case, though, you would be providing a feature in your application that is defective by design:
To your developers, the application looks as if it is validating the input, but the validation is unnecessary and probably incomplete, and once it is done you still don't know whether you have an address that will let you contact the user.
Maybe unnecessary, but never a bad thing provided that you perform the validation correctly.
It is not unnecessary; it is necessary. It's just that regex is the wrong tool for it.
At the end of the day, the best way to check that the address is valid for the user is a unique token exchange with that address: