I\'m trying to find a reliable solution to extract a url from a string of characters. I have a site where users answer questions and in the source box, where they enter thei
John Gruber has spent a fair amount of time perfecting the "one regex to rule them all" for link detection. Using preg_replace()
as mentioned in the other answers, using the following regex should be one of the most accurate, if not the most accurate, method for detecting a link:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
If you only wanted to match HTTP/HTTPS:
(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
Yahoo! Answers does a fairly good job of link identification when the link is written properly and separate from other text, but it isn't very good at separating trailing punctuation. For example The links are http://example.com/somepage.php, http://example.com/somepage2.php, and http://example.com/somepage3.php.
will include commas on the first two and a period on the third.
But if that is acceptable, then patterns like this should do it:
\<http:[^ ]+\>
It looks like stackoverflow's parser is better. Is is open source?
This code is worked for me.
function makeLink($string){
/*** make sure there is an http:// on all URLs ***/
$string = preg_replace("/([^\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i", "$1http://$2",$string);
/*** make all URLs links ***/
$string = preg_replace("/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])/i","<a target=\"_blank\" href=\"$1\">$1</a>",$string);
/*** make all emails hot links ***/
$string = preg_replace("/([\w-?&;#~=\.\/]+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?))/i","<a href=\"mailto:$1\">$1</a>",$string);
return $string;
}
$string = preg_replace('/https?:\/\/[^\s"<>]+/', '<a href="$0" target="_blank">$0</a>', $string);
It only matches http/https, but that's really the only protocol you want to turn into a link. If you want others, you can change it like this:
$string = preg_replace('/(https?|ssh|ftp):\/\/[^\s"]+/', '<a href="$0" target="_blank">$0</a>', $string);
There are a lot of edge cases with urls. Like url could contain brackets or not contain protocol etc. Thats why regex is not enough.
I created a PHP library that could deal with lots of edge cases: Url highlight.
You could extract urls from string or directly highlight them.
Example:
<?php
use VStelmakh\UrlHighlight\UrlHighlight;
$urlHighlight = new UrlHighlight();
// Extract urls
$urlHighlight->getUrls("This is example http://example.com.");
// return: ['http://example.com']
// Make urls as hyperlinks
$urlHighlight->highlightUrls('Hello, http://example.com.');
// return: 'Hello, <a href="http://example.com">http://example.com</a>.'
For more details see readme. For covered url cases see test.