Is it possible to search for and remove URLs from a string in PHP? I'm talking about the actual text here, not the HTML. Examples of what I'd like removed:
mywebsite.com
http://myweb
You could try something that looks for .TLD, where TLD is any existing top-level domain, but that may result in too many false positives.
Would it be possible to implement a system where posts containing questionable content are held for moderation, while others are posted right away? I'm assuming it's a firm business requirement to disallow this type of content.
Personally, I would tend to just prevent any hyperlinking and leave it at that. But it's not my app.
This regex seems to do the trick:
!\b(((ht|f)tp(s?))\://)?(www.|[a-z].)[a-z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-z0-9\.\,\;\?\\'\\\\\+&%\$#\=~_\-]+))*\b!i
It is a slight modification of this regex from Regular Expression Library.
I realize it’s a bit overwhelming, but that's to be expected when searching for URLs. Nevertheless, it matches everything on your list.
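For reference, here is a minimal sketch of applying that pattern with `preg_replace` (the sample `$description` string is made up for illustration):

```php
<?php
// The pattern below is the regex quoted above, stored in a nowdoc
// string so none of its backslashes need extra escaping.
$pattern = <<<'REGEX'
!\b(((ht|f)tp(s?))\://)?(www.|[a-z].)[a-z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-z0-9\.\,\;\?\\'\\\\\+&%\$#\=~_\-]+))*\b!i
REGEX;

$description = 'Visit mywebsite.com for details.';

// Strip anything the pattern recognizes as a URL.
echo preg_replace($pattern, '', $description);
// "Visit  for details."
```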
Alternatively, you could loop through each word in the description and use parse_url()
to see how the word breaks down. I'll leave the criteria for determining whether it's a URL to you. There's still the potential for false positives, but they could be greatly reduced. Combined with Andrew's idea of flagging questionable content for moderation, it could be a workable solution.
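A rough sketch of that approach, using the (hypothetical) criterion that a word counts as a URL when `parse_url()` reports a scheme or a host:

```php
<?php
// Hypothetical helper: the name and the acceptance criterion are
// assumptions; tighten them to suit your data.
function contains_url($text) {
    foreach (preg_split('/\s+/', $text) as $word) {
        $parts = parse_url($word);
        // Note: a bare domain like "mywebsite.com" parses entirely into
        // 'path', so catching it needs an extra check (e.g. a dotted-host test).
        if ($parts !== false && (isset($parts['scheme']) || isset($parts['host']))) {
            return true;
        }
    }
    return false;
}

var_dump(contains_url('see http://myweb for details'));  // bool(true)
var_dump(contains_url('no links in this sentence'));     // bool(false)
```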
You can easily use a regex to find the URLs, then specify what to replace them with using PHP's preg_replace() function.
http://daringfireball.net/2010/07/improved_regex_for_matching_urls
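As a sketch (using a deliberately simplified pattern rather than the full one from the link above):

```php
<?php
// Illustration-only pattern: it matches scheme-prefixed URLs but not
// bare domains; the Daring Fireball pattern linked above is far more
// thorough.
$pattern = '~\bhttps?://\S+~i';

$text = 'Read this: http://example.com/page then reply.';
echo preg_replace($pattern, '[link removed]', $text);
// "Read this: [link removed] then reply."
```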
Edit: Since this is user submitted data, you might want to do some validation before you store the "description" field, and check to see if it contains a URL. If it does, you can prevent the user from saving the form.
For this, you can use preg_match() with the same kind of URL-matching regex.
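For example (the form field name and the simplified pattern here are placeholders):

```php
<?php
// Server-side validation sketch; 'description' is a hypothetical form
// field, and the pattern only catches scheme-prefixed URLs.
function description_is_valid($description) {
    return preg_match('~\bhttps?://\S+~i', $description) === 0;
}

if (!description_is_valid($_POST['description'] ?? '')) {
    // Reject the form and re-display it with an error message.
    $error = 'Please remove any links from the description.';
}
```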