How do I find every word on a page beginning with http:// and wrap tags around it?
Can I use something like regex perhaps?
This is one of those few things that jQuery doesn't directly help you with much. You basically have to walk through the DOM tree and examine the text nodes (nodeType === 3
); if you find a text node containing the target text you want to wrap ("http://.....", whatever rules you want to apply), you then split the text node (using splitText) into three parts (the part before the string, the part that is the string, and the part following the string), then put the a
element around the second of those.
That sounds a bit complicated, but it isn't really all that bad. It's just a recursive descent walker function (for working through the DOM), a regex match to find the things you want to replace, and then a couple of calls to splitText
, createElement, insertBefore, appendChild.
Here's an example that searches for a fixed string; just add your regex matching for "http://":
walk(document.body, "foo");
function walk(node, targetString) {
var child;
switch (node.nodeType) {
case 1: // Element
for (child = node.firstChild;
child;
child = child.nextSibling) {
walk(child, targetString);
}
break;
case 3: // Text node
handleText(node, targetString);
break;
}
}
function handleText(node, targetString) {
var start, targetNode, followingNode, wrapper;
// Does the text contain our target string?
// (This would be a regex test in your http://... case)
start = node.nodeValue.indexOf(targetString);
if (start >= 0) {
// Split at the beginning of the match
targetNode = node.splitText(start);
// Split at the end of the match
followingNode = targetNode.splitText(targetString.length);
// Wrap the target in an element; in this case, we'll
// use a `span` with a class, but you'd use an `a`.
// First we create the wrapper and insert it in front
// of the target text.
wrapper = document.createElement('span');
wrapper.className = "wrapper";
targetNode.parentNode.insertBefore(wrapper, targetNode);
// Now we move the target text inside it
wrapper.appendChild(targetNode);
// Clean up any empty nodes (in case the target text
// was at the beginning or end of a text ndoe)
if (node.nodeValue.length == 0) {
node.parentNode.removeChild(node);
}
if (followingNode.nodeValue.length == 0) {
followingNode.parentNode.removeChild(followingNode);
}
}
}
Live example
Update: The above didn't handle it if there were multiple matches in the same text node (doh!). And oh what the heck, I did a regexp match — you will have to adjust the regexp, and probably do some post-processing on each match, because what's here is too simplistic. But it's a start:
// The regexp should have a capture group that
// will be the href. In our case below, we just
// make it the whole thing, but that's up to you.
// THIS REGEXP IS ALMOST CERTAINLY TOO SIMPLISTIC
// AND WILL NEED ADJUSTING (for instance: what if
// the link appears at the end of a sentence and
// it shouldn't include the ending puncutation?).
walk(document.body, /(http:\/\/[^ ]+)/i);
function walk(node, targetRe) {
var child;
switch (node.nodeType) {
case 1: // Element
for (child = node.firstChild;
child;
child = child.nextSibling) {
walk(child, targetRe);
}
break;
case 3: // Text node
handleText(node, targetRe);
break;
}
}
function handleText(node, targetRe) {
var match, targetNode, followingNode, wrapper;
// Does the text contain our target string?
// (This would be a regex test in your http://... case)
match = targetRe.exec(node.nodeValue);
if (match) {
// Split at the beginning of the match
targetNode = node.splitText(match.index);
// Split at the end of the match.
// match[0] is the full text that was matched.
followingNode = targetNode.splitText(match[0].length);
// Wrap the target in an `a` element.
// First we create the wrapper and insert it in front
// of the target text. We use the first capture group
// as the `href`.
wrapper = document.createElement('a');
wrapper.href = match[1];
targetNode.parentNode.insertBefore(wrapper, targetNode);
// Now we move the target text inside it
wrapper.appendChild(targetNode);
// Clean up any empty nodes (in case the target text
// was at the beginning or end of a text ndoe)
if (node.nodeValue.length == 0) {
node.parentNode.removeChild(node);
}
if (followingNode.nodeValue.length == 0) {
followingNode.parentNode.removeChild(followingNode);
}
// Continue with the next match in the node, if any
match = followingNode
? targetRe.exec(followingNode.nodeValue)
: null;
}
}
Live example
I am not practically but you can try it
$('a([href^="http://"])').each( function(){
//perform your task
})
I disagree heavily that jQuery can be much use in finding a solution here. Granted you have to get down and dirty with some of the textNode element attributes but putting the DOM back together again after you split your matched node can be made a wee bit easier using the jQuery library.
The following code is documented inline to explain the action taken. I've written it as a jQuery plugin in case you just want to take this and move it around elsewhere. This way you can scope which elements you want to convert URLs for or you can simply use the $("body") selector.
(function($) {
$.fn.anchorTextUrls = function() {
// Test a text node's contents for URLs and split and rebuild it with an achor
var testAndTag = function(el) {
// Test for URLs along whitespace and punctuation boundaries (don't look too hard or you will be consumed)
var m = el.nodeValue.match(/(https?:\/\/.*?)[.!?;,]?(\s+|"|$)/);
// If we've found a valid URL, m[1] contains the URL
if (m) {
// Clone the text node to hold the "tail end" of the split node
var tail = $(el).clone()[0];
// Substring the nodeValue attribute of the text nodes based on the match boundaries
el.nodeValue = el.nodeValue.substring(0, el.nodeValue.indexOf(m[1]));
tail.nodeValue = tail.nodeValue.substring(tail.nodeValue.indexOf(m[1]) + m[1].length);
// Rebuild the DOM inserting the new anchor element between the split text nodes
$(el).after(tail).after($("<a></a>").attr("href", m[1]).html(m[1]));
// Recurse on the new tail node to check for more URLs
testAndTag(tail);
}
// Behave like a function
return false;
}
// For each element selected by jQuery
this.each(function() {
// Select all descendant nodes of the element and pick out only text nodes
var textNodes = $(this).add("*", this).contents().filter(function() {
return this.nodeType == 3
});
// Take action on each text node
$.each(textNodes, function(i, el) {
testAndTag(el);
});
});
}
}(jQuery));
$("body").anchorTextUrls(); //Sample call
Please keep in mind that given the way I wrote this to populate the textNodes array, the method will find ALL descendant text nodes, not just immediate children text nodes. If you want it to replace URLs only amongst the text within a specific selector, remove the .add("*", this) call that adds all the descendants of the selected element.
Here's a fiddle example.