I've seen similar questions asked before, but none with a working solution.
I am trying to replace all URLs on a page with anchor tags, but only those which aren't alr
If you are doing it on the client side, it might be worth doing it by walking the document tree
Look through the text nodes (nodeName="#text"), and if one contains a substring starting with http/https and its parent tag is not A, replace it with a pattern (<a href="\1">\1</a>, etc.).
Consider this as a starting point:
// Collect leaf elements (no child elements) whose markup contains 'http'
// and which are neither links nor inputs.
var textTags = [].slice.call(document.getElementsByTagName('*'))
    .filter(function (n) {
        return !n.children.length
            && n.nodeName !== 'A' && n.nodeName !== 'INPUT'
            && n.innerHTML.indexOf('http') > -1;
    });
// Iterate with forEach: for...in on an array also visits inherited enumerable keys.
textTags.forEach(function (el) {
    // your code to replace links with whatever you want
});
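The tree-walking idea described above (visit text nodes, skip anything inside an A element) could be sketched roughly as follows. The function name `collectLinkableTextNodes` is made up for illustration, and the sketch assumes an HTML document where `nodeName` is uppercase. It only reads `nodeType`, `nodeName`, `nodeValue` and `childNodes`, so it can be tried on plain objects as well as real DOM nodes.

```javascript
// Hypothetical helper: collect text nodes that contain an http/https URL
// and are not nested inside an <a> element.
function collectLinkableTextNodes(root) {
  var TEXT_NODE = 3; // numeric value of Node.TEXT_NODE in the DOM
  var found = [];
  (function walk(node, insideAnchor) {
    if (node.nodeType === TEXT_NODE) {
      if (!insideAnchor && /https?:\/\//.test(node.nodeValue)) {
        found.push(node);
      }
      return;
    }
    // Assumption: HTML documents report element names in uppercase.
    var nowInside = insideAnchor || node.nodeName === 'A';
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      walk(children[i], nowInside);
    }
  })(root, false);
  return found;
}
```

In a browser you would call it as `collectLinkableTextNodes(document.body)` and then replace each returned text node with the linked markup of your choice.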
I think you need a two-pass operation. Split the source into
PART1 <a href=...>blah</a> PART2 <a href=...>blah</a> PART3...
Then replace URLs with <a href="url"> in each of PART1, PART2, etc., and paste it all back together.
Doing it within a single regex is going to be a headache, if not impossible, depending on your dialect.
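A minimal sketch of that two-pass idea in JavaScript, in case it helps. The name `linkifyOutsideAnchors` and the deliberately simple URL pattern are my own assumptions, not something from the question. Note that it would also wrap URLs found inside other tags' attributes (an img src, say), so treat it as a starting point only.

```javascript
// Hypothetical helper: linkify bare URLs while leaving existing anchors alone.
function linkifyOutsideAnchors(html) {
  // Pass 1: split on existing anchors. Because the pattern has a capturing
  // group, the anchors themselves are kept in the result at the odd indexes.
  var parts = html.split(/(<a\b[^>]*>[\s\S]*?<\/a>)/i);
  // Pass 2: wrap URLs only in the non-anchor parts (even indexes).
  for (var i = 0; i < parts.length; i += 2) {
    parts[i] = parts[i].replace(/https?:\/\/[^\s<>"']+/g, function (url) {
      return '<a href="' + url + '">' + url + '</a>';
    });
  }
  // Paste everything back together in the original order.
  return parts.join('');
}
```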
For jobs like this, I normally recommend people do it with code rather than regex because regex gets really messy, really fast. However, if you do want a regex, here is a workable solution. Please follow the link to see the full pattern and the test cases I used.
http://regex101.com/r/kL3iL7
(?:http([s]?):\/\/)?((\w+[.])+\w+(\/\w*)*(\?[^\s]*)*)(?![^\s]*>)
with replacement
<a href="http\1://\2">\2</a>
I do not promise that it is perfect, but it does handle a lot of cases. Let me know if there are any test cases it needs to be fixed for.
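If you want to try the pattern from JavaScript, something like the sketch below should work. The `\1` and `\2` backreferences in the replacement become `$1` and `$2`, and the `g` flag is needed to replace every match. The wrapper name `linkify` is just for illustration.

```javascript
// The pattern above as a JavaScript regex literal, with the global flag added.
var urlPattern = /(?:http([s]?):\/\/)?((\w+[.])+\w+(\/\w*)*(\?[^\s]*)*)(?![^\s]*>)/g;

// Hypothetical wrapper: $1 is the optional "s" of https, $2 is the rest of the URL.
function linkify(text) {
  return text.replace(urlPattern, '<a href="http$1://$2">$2</a>');
}
```

For example, `linkify('visit https://example.com/docs today')` gives `visit <a href="https://example.com/docs">example.com/docs</a> today`, while a URL that already sits inside a tag such as `<a href="http://example.com">x</a>` is left alone by the trailing negative lookahead.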