How do I remove all email addresses and links from a string and replace them with "[removed]"?
Try this:
$patterns = array('<[\w.]+@[\w.]+>', '<\w{3,6}:(?:(?://)|(?:\\\\))[^\s]+>');
$replacements = array('[email removed]', '[link removed]');
$newString = preg_replace($patterns, $replacements, $stringToBeMatched);
Note: you can pass an array of patterns and an array of replacements into preg_replace instead of running it twice.
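To illustrate the array form, here is a minimal sketch that runs the two patterns above against a made-up sample string. The input text and the variable names $input and $output are assumptions for illustration only.

```php
<?php
// Same patterns as above; '<' and '>' act as the regex delimiters.
$patterns = array('<[\w.]+@[\w.]+>', '<\w{3,6}:(?:(?://)|(?:\\\\))[^\s]+>');
$replacements = array('[email removed]', '[link removed]');

// Hypothetical sample input, not taken from the question.
$input = 'Mail bob@example.com or see http://example.com/page today';

// Each pattern is applied in turn and paired with the replacement at the same index.
$output = preg_replace($patterns, $replacements, $input);

echo $output, "\n"; // Mail [email removed] or see [link removed] today
```

Note that [\w.]+ also swallows a full stop that directly follows an address, so a sentence ending in an email address would lose its final period.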
You can use preg_replace to do it.
For emails:
$pattern = "/[^@\s]*@[^@\s]*\.[^@\s]*/";
$replacement = "[removed]";
// preg_replace returns the new string; it does not modify $string in place
$string = preg_replace($pattern, $replacement, $string);
For URLs:
$pattern = "/[a-zA-Z]*[:\/\/]*[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i";
$replacement = "[removed]";
$string = preg_replace($pattern, $replacement, $string);
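The two steps can be combined into one call. The sketch below wraps them in a helper; the function name removeEmailsAndLinks and the sample input are hypothetical. The email pattern is listed first because the URL pattern also matches a bare domain such as example.com, which would otherwise leave the local part of an address behind.

```php
<?php
// Hypothetical helper: applies the email pattern, then the URL pattern.
function removeEmailsAndLinks($string)
{
    $patterns = array(
        "/[^@\s]*@[^@\s]*\.[^@\s]*/", // emails first
        "/[a-zA-Z]*[:\/\/]*[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i", // then URLs
    );

    // A single replacement string is used for every pattern in the array.
    return preg_replace($patterns, "[removed]", $string);
}

// Hypothetical sample input.
$clean = removeEmailsAndLinks('Write to bob@example.com or visit http://example.com/page now');

echo $clean, "\n"; // Write to [removed] or visit [removed] now
```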
Resources
PHP manual entry: http://php.net/manual/en/function.preg-replace.php
Credit where credit is due: email regex taken from preg_match manpage, and URL regex taken from: http://www.weberdev.com/get_example-4227.html
There are a lot of characters valid in the first local part of the email (see What characters are allowed in an email address?), so these lines would replace all valid email addresses:
<?php
$c='a-zA-Z-_0-9'; // allowed characters in domainpart
$la=preg_quote('!#$%&\'*+-/=?^_`{|}~', "/"); // additional allowed in first localpart
$email="[$c$la][$c$la\.]*[^.]@[$c]+\.[$c]+";
$t = preg_replace("/\b($email)\b/", '[removed]', $t);
// or with a link:
$t = preg_replace("/\b($email)\b/", '<a href="mailto:\1">\1</a>', $t);
# replace urls:
$a='A-Za-z0-9\-_'; // allowed characters in the URL
$t = preg_replace("/[htpsftp]+[:\/\/]+[$a]+\.+[$a\.\/%&=\?]+/i", '[removed]', $t);
This will cover most valid email addresses, but be aware that matching every valid email address, and nothing else, is more complex (see How to validate an email address using a regular expression?)
My answer is a variation of Josiah's /[^@\s]*@[^@\s]*\.[^@\s]*/ for emails, which works fine but also matches any punctuation directly after the email address itself: demo 1
Adapt the regex as follows to exclude '.', ',', '!' and '?': /[^@\s]*@[^@\s\.]*\.[^@\s\.,!?]*/ (demo 2)
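The difference between the two patterns can be seen with a short sketch. The sample sentence and the variable names are assumptions for illustration.

```php
<?php
// Hypothetical sample input with a comma right after the address.
$input = 'Contact me at bob@example.com, thanks!';

$original = '/[^@\s]*@[^@\s]*\.[^@\s]*/';
$adapted  = '/[^@\s]*@[^@\s\.]*\.[^@\s\.,!?]*/';

$withOriginal = preg_replace($original, '[removed]', $input);
$withAdapted  = preg_replace($adapted, '[removed]', $input);

echo $withOriginal, "\n"; // Contact me at [removed] thanks!
echo $withAdapted, "\n";  // Contact me at [removed], thanks!
```

One trade-off to be aware of: because the adapted pattern excludes dots on both sides of the single literal dot, it stops at the second dot of a multi-label domain, so an address at example.co.uk would leave ".uk" in the text.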
The answer I was going to upvote was deleted. It linked to a Linux Journal article, "Validate an E-Mail Address with PHP, the Right Way", that points out what's wrong with almost every email regex anyone proposes.
The range of valid forms of an email address is much broader than most people think.