I am looking for a regular expression in PHP that matches an anchor with specific text in it. E.g. I would like to get anchors containing the text mylink, like:
<a href="blabla" ... >mylink</a>
Try a parser instead:
require_once "simple_html_dom.php";
$data = 'Hi, I am looking for a regular expression in PHP which would match the anchor with a
specific text on it. E.g I would like to get anchors with text mylink like:
<a href="blabla" ... >mylink</a>
So it should match all anchors but only if they contain specific text So it should match
these string:
<a href="blabla" ... >mylink</a>
<a href="blabla" ... >blabla mylink</a>
<a href="blabla" ... >mylink bla bla</a>
<a href="blabla" ... >bla bla mylink bla bla</a>
but not this one:
<a href="blabla" ... >bla bla bla bla</a> Because this one does not contain word mylink.
Also this one should not match: "mylink is string" because it is not an anchor.
Anybody any Idea? Thanx Granit';
// Parse the string into a DOM tree
$html = str_get_html($data);
// Visit every <a> element and check its inner text for the search string
foreach ($html->find('a') as $element) {
    if (strpos($element->innertext, 'mylink') === false) {
        echo 'Ignored: ' . $element->innertext . "\n";
    } else {
        echo 'Matched: ' . $element->innertext . "\n";
    }
}
which produces the output:
Matched: mylink
Matched: mylink
Matched: blabla mylink
Matched: mylink bla bla
Matched: bla bla mylink bla bla
Ignored: bla bla bla bla
Download simple_html_dom.php
from: http://simplehtmldom.sourceforge.net/
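If you would rather not add a third-party file, the same idea can probably be done with the DOM extension that ships with PHP. This is only a minimal sketch: findAnchorsContaining() is a made-up helper name, and the sample markup is invented for illustration.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns the text of every
// <a> element whose text content contains $needle, using PHP's bundled DOM extension.
function findAnchorsContaining(string $html, string $needle): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // textContent also includes text inside nested tags such as <i>.
        if (strpos($anchor->textContent, $needle) !== false) {
            $matches[] = $anchor->textContent;
        }
    }
    return $matches;
}

// Sample markup, invented for this example.
$html = '<p><a href="a">mylink</a> <a href="b">bla <i>mylink</i> bla</a> '
      . '<a href="c">bla bla</a> mylink is string</p>';
print_r(findAnchorsContaining($html, 'mylink'));
```

Because the check runs on the element's text content, anchors with nested tags are handled, and the bare "mylink is string" outside any anchor is never considered.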
This should work (build the regex string and insert whatever string you need instead of "mylink")
<\s*a\s+[^>]*>[^<>]*mylink[^<>]*<\s*\/a\s*>
But this is not recommended. You should use an HTML parser instead and process the tag; regex is not really the right tool for this. (The above regex will not work if a link contains ">" inside an attribute value or its text, or if it has nested tags, although that might be rare.)
I presume PHP doesn't require any special escape characters as long as you wrap the pattern in appropriate delimiters.
Tested at regexpal.com
A few notes:
\s* - matches optional whitespace
\s+ - matches at least one space/tab plus any extra whitespace
[^>] - matches any character except '>'
[^<>] - matches any character except '<' or '>'
UPDATE: escaped the "/" so the pattern can be used in PHP with "/" as the delimiter (/regex/)
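To make the "build the regex string" part concrete, here is a rough sketch of how the pattern above might be assembled around an arbitrary search string. The helper name anchorPattern() and the sample input are assumptions of mine, not something from the question.

```php
<?php
// Hypothetical helper: wraps the pattern above around an arbitrary search string.
function anchorPattern(string $text): string
{
    // preg_quote escapes regex metacharacters and, via the second argument, the "/" delimiter.
    $quoted = preg_quote($text, '/');
    return '/<\s*a\s+[^>]*>[^<>]*' . $quoted . '[^<>]*<\s*\/a\s*>/';
}

// Sample input, invented for this example.
$html = '<a href="blabla">mylink</a> <a href="blabla">bla bla</a> '
      . '<a href="blabla">bla mylink bla</a> mylink is string';

// $found[0] receives every complete match of the pattern.
preg_match_all(anchorPattern('mylink'), $html, $found);
print_r($found[0]);
```

Running the search string through preg_quote matters if it can contain characters such as "." or "?", which would otherwise be read as regex syntax.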
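// Note: .* is greedy, so if several anchors sit on one line the match can span more than one of them.
if (preg_match('%<\s*a\s+href="blabla"[^>]*>(.*mylink.*)<\s*/a>%', $text, $regs)) {
    $result = $regs[1]; // the text inside the anchor
} else {
    $result = ""; // no matching anchor found
}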
$regs[0] will hold the complete match.
$regs[1] will hold the bit inside the a tag.
/<a[^>]*>([^<]*mylink[^<]*)<\/a>/
It's a bit simplistic, as it will break if tags are inside the link (<a href="/xyz">xyz <i>mylink</i> aaa</a>), but it should work.
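A quick way to see both the normal case and the nested-tag limitation is to run the pattern over a few sample anchors. This is just an illustrative sketch; the $samples array is made up from the strings in the question.

```php
<?php
$pattern = '/<a[^>]*>([^<]*mylink[^<]*)<\/a>/';

// Sample anchors, adapted from the question for illustration.
$samples = [
    '<a href="blabla">bla bla mylink bla bla</a>', // plain text containing "mylink"
    '<a href="blabla">bla bla bla bla</a>',        // no "mylink" in the text
    '<a href="/xyz">xyz <i>mylink</i> aaa</a>',    // "mylink" inside a nested tag
];

$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match and 0 otherwise; $m[1] is the captured link text.
    $results[] = preg_match($pattern, $sample, $m) === 1 ? $m[1] : null;
}
var_dump($results);
```

Only the first sample yields a capture; the third one is missed because [^<]* cannot cross the nested <i> tag.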