I am aware that regex is not ideal for use with HTML strings and I have looked at the PHP Simple HTML DOM Parser but still believe this is the way to go. All the HTML tags w
Improvement: it should link only the whole word "Amazon", not words that merely contain it, such as AmazonWorld.
$result = preg_replace('%\bAmazon(?![^<]*</a>)\b%i', '<a href="http://www.amazon.com">Amazon</a>', $subject);
Joe, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a general question about how to exclude patterns in regex.)
With all the disclaimers about using regex to parse html, here is a simple way to do it.
Here's our simple regex:
<a.*?</a>(*SKIP)(*F)|amazon
The left side of the alternation matches complete <a... </a> tags, then deliberately fails. The right side matches amazon, and we know this is the right amazon because it was not matched by the expression on the left.
This program shows how to use the regex (see the results at the bottom of the online demo):
<?php
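// Sample input: one "amazon" inside <a ... </a>, one outside it.
$target = "word1 <a stuff amazon> </a> word2 amazon";
// (?i) makes the whole pattern case-insensitive.
$regex = "~(?i)<a.*?</a>(*SKIP)(*F)|amazon~";
$repl = '<a href="http://www.amazon.com">Amazon</a>';
$new = preg_replace($regex, $repl, $target);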
echo htmlentities($new);
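The question also asks that only the whole word be linked, so that AmazonWorld is left alone. A minimal sketch of how the same (*SKIP)(*F) idea might be combined with word boundaries follows. The helper name linkAmazon is hypothetical, and the \b anchors, the s flag, and the $0 backreference are additions that are not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): link whole-word
// "amazon" occurrences that are not already inside <a ...>...</a>.
function linkAmazon(string $html): string
{
    // Left branch: consume a complete <a ...>...</a> element, then throw it
    // away with (*SKIP)(*F). Right branch: whole-word "amazon".
    // i = case-insensitive, s = let "." cross line breaks inside a link.
    $regex = '~<a\b.*?</a>(*SKIP)(*F)|\bamazon\b~is';
    // $0 re-emits the matched text, so the original capitalisation is kept.
    $repl = '<a href="http://www.amazon.com">$0</a>';
    // preg_replace() returns null on a PCRE error; fall back to the input.
    return preg_replace($regex, $repl, $html) ?? $html;
}

echo linkAmazon('AmazonWorld, <a href="#">my amazon</a>, amazon'), "\n";
```

In this sketch only the last, free-standing amazon is wrapped: AmazonWorld fails the trailing \b, and the amazon inside the existing link is skipped by the left branch.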
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
Don't do this. You cannot reliably do this with Regex, no matter how consistent your HTML is.
Something like this should work, however:
<?php
$dom = new DOMDocument;
$dom->load('test.xml');
$x = new DOMXPath($dom);
$nodes = $x->query("//text()[contains(., 'Amazon')][not(ancestor::a)]");
foreach ($nodes as $node) {
    while (false !== strpos($node->nodeValue, 'Amazon')) {
        $word = $node->splitText(strpos($node->nodeValue, 'Amazon'));
        $after = $word->splitText(6);
        $link = $dom->createElement('a');
        $link->setAttribute('href', 'http://www.amazon.com');
        $word->parentNode->replaceChild($link, $word);
        $link->appendChild($word);
        $node = $after;
    }
}
$html = $dom->saveHTML();
echo $html;
It's verbose, but it will actually work.
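If the markup is held in a string rather than in a file such as test.xml, the same approach could be sketched with DOMDocument::loadHTML(). The helper name linkAmazonDom and the sample input are hypothetical. The sketch assumes ASCII text, so that the byte offset from strpos() equals the character offset splitText() expects, and, like the code above, it is case-sensitive and does not check word boundaries.

```php
<?php
// Hypothetical helper: wrap each "Amazon" found in text outside <a> elements.
function linkAmazonDom(string $html): DOMDocument
{
    $dom = new DOMDocument;
    // loadHTML() accepts a fragment and adds <html>/<body> wrappers around it.
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    // Text nodes that contain "Amazon" and have no <a> ancestor.
    $nodes = $xpath->query("//text()[contains(., 'Amazon')][not(ancestor::a)]");
    foreach ($nodes as $node) {
        while (false !== ($pos = strpos($node->nodeValue, 'Amazon'))) {
            $word  = $node->splitText($pos);             // starts at "Amazon"
            $after = $word->splitText(strlen('Amazon')); // text after the word
            $link  = $dom->createElement('a');
            $link->setAttribute('href', 'http://www.amazon.com');
            $word->parentNode->replaceChild($link, $word);
            $link->appendChild($word);
            $node = $after; // keep scanning the remainder of the text
        }
    }
    return $dom;
}

$dom = linkAmazonDom('<p>I use Amazon. <a href="http://www.amazon.com">My Amazon Link</a></p>');
echo $dom->saveHTML();
```

Because loadHTML() wraps the fragment, saveHTML() prints a full document. To print only the paragraph, pass that node to saveHTML().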
Use this code:
$p = '~((<a\s)(?(2)[^>]*?>))?(amazon)~smi';
$str = '<a href="http://www.amazon.com">Amazon</a>';
$s = preg_replace($p, "$1My $3 Link", $str);
var_dump($s);
string(50) "<a href="http://www.amazon.com">My Amazon Link</a>"
Using the DOM would certainly be preferable.
However, you might get away with this:
$result = preg_replace('%Amazon(?![^<]*</a>)%i', '<a href="http://www.amazon.com">Amazon</a>', $subject);
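It matches Amazon only if it is not followed by a closing </a> tag without another tag in between, so it will be thrown off by tags nested inside <a> tags.
It will therefore change this: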
I use Amazon for that.
I use <a href="http://www.amazon.com">Amazon</a> for that.
<a href="http://www.amazon.com">My Amazon Link</a>
It will match the "Amazon" in "My Amazon Link"
into this:
I use <a href="http://www.amazon.com">Amazon</a> for that.
I use <a href="http://www.amazon.com">Amazon</a> for that.
<a href="http://www.amazon.com">My Amazon Link</a>
It will match the "<a href="http://www.amazon.com">Amazon</a>" in "My <a href="http://www.amazon.com">Amazon</a> Link"
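The nested-tag limitation can be seen in a short sketch. The helper name linkOutsideAnchors and both inputs are made up for illustration. The regex and replacement are the ones from this answer.

```php
<?php
// Hypothetical wrapper around the regex from the answer above.
function linkOutsideAnchors(string $subject): string
{
    $regex = '%Amazon(?![^<]*</a>)%i';
    $repl  = '<a href="http://www.amazon.com">Amazon</a>';
    return preg_replace($regex, $repl, $subject) ?? $subject;
}

// Plain link text: the lookahead reaches "</a>" without crossing a "<",
// so nothing inside the link is touched.
$plain = '<a href="http://www.amazon.com">My Amazon Link</a>';
var_dump(linkOutsideAnchors($plain) === $plain); // bool(true)

// Nested tag: the first tag after "Amazon" is "</b>", not "</a>", so the
// lookahead does not block the match and a link ends up inside the link.
$nested = '<a href="/shop"><b>Amazon</b> shop</a>';
echo linkOutsideAnchors($nested), "\n";
```

The second call yields an <a> element nested inside another <a> element, which is the case where a DOM-based approach is the safer choice.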
Try this here
Amazon(?![^<]*</a>)
This searches for Amazon, and the negative lookahead ensures that no closing </a> tag follows it. Inside the lookahead I only scan characters that are not <, so that it cannot accidentally read past an opening tag.
http://regexr.com