I am using the function below to match URLs inside a given text and replace them with HTML links. The regular expression is working great, but currently I am only replacing t
Keep it simple! Say what you cannot have, rather than what you can have :)
As mentioned above, URLs can be quite complex, especially after the '?', and not all of them start with a 'www.' e.g. maps.bing.com/something?key=!"£$%^*()&lat=65&lon&lon=20
So, rather than having a complex regex that won't cover every edge case and will be hard to maintain, how about this much simpler one, which works well for me in practice?
Match
http(s):// (anything but a space)+
www. (anything but a space)+
Where 'anything' is [^'"<>\s]
... basically a greedy match, carrying on until you meet a space, quote, angle bracket, or end of line
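To make the matching rule concrete, here is a minimal sketch that only lists the matches without replacing anything. The helper name `findUrls` and the sample text are made up for illustration; the character class is the 'anything' defined above.

```javascript
// Hypothetical helper: list every URL-like run in a piece of text.
// A match starts at http://, https:// or www. and greedily continues
// until whitespace, a quote, an angle bracket, or the end of the text.
function findUrls(text) {
  const pattern = /\b(?:https?:\/\/|www\.)[^'"<>\s]+/gi;
  return text.match(pattern) || [];
}

console.log(findUrls('See http://example.com/a?b=1&c=2 and "www.example.org/x" now'));
// → [ 'http://example.com/a?b=1&c=2', 'www.example.org/x' ]
```

Note that the query string after the '?' is picked up without the pattern knowing anything about its structure, which is the point of saying what a URL cannot contain.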
Also:
Remember to check that the text is not already linked, e.g. it contains href="..." or src="..."
Add rel="nofollow" (if appropriate)
This solution isn't as "good" as the libraries mentioned above, but it is much simpler and works well in practice.
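// "linkify" is a placeholder name; the snippet is the body of such a function.
function linkify(html) {
    // Skip text that already contains links or images.
    if (html.match(/(href)|(src)/i)) {
        return html; // text already has a hyperlink in it
    }
    // URLs with an explicit http:// or https:// scheme.
    html = html.replace(
        /\b(https?:\/\/[^\s\(\)\'\"\<\>]+)/ig,
        "<a rel='nofollow' href='$1'>$1</a>"
    );
    // Bare www. addresses at the start of the text or after whitespace.
    // $1 keeps the leading whitespace; the href gets a scheme added.
    html = html.replace(
        /(^|\s)(www\.[^\s\(\)\'\"\<\>]+)/ig,
        "$1<a rel='nofollow' href='http://$2'>$2</a>"
    );
    return html;
}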
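As a variation on the same idea, the separate passes can be folded into a single `replace` call with a replacer function. This is only a sketch, not the code above: the name `linkifyOnce` is hypothetical, and it assumes the input is plain text, so it does not HTML-escape the URL before putting it into the attribute.

```javascript
// Hypothetical single-pass variant of the same "say what a URL cannot contain" idea.
function linkifyOnce(text) {
  // Leave the text alone if it already looks like markup with links or images.
  if (/href|src/i.test(text)) {
    return text;
  }
  // Stop at whitespace, parentheses, quotes, or angle brackets.
  const pattern = /\b(?:https?:\/\/|www\.)[^\s()'"<>]+/gi;
  return text.replace(pattern, (url) => {
    // A bare "www." match needs a scheme before it can be used as an href.
    const href = /^www\./i.test(url) ? "http://" + url : url;
    return '<a rel="nofollow" href="' + href + '">' + url + "</a>";
  });
}

console.log(linkifyOnce("Docs at www.example.com and https://example.org/a?b=1"));
```

Because the replacer only touches the matched run, the whitespace around each URL is left exactly as it was.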