Question
Suppose we have the following HTML. We need to get all <a href=""></a> tags which DO NOT contain an img tag inside them.
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
I'm using this regular expression to find all the a tag links:
preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>(.*?)</a>!is", $content, $out);
I can modify it like this:
preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>([^<>]+?)</a>!is", $content, $out);
But how can I tell it to exclude results containing the <img substring inside <a href=""></a>?
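Answer 1:
DOM is the way to go, but for the sake of interest here is a regex solution:
The easiest way to exclude certain matches in regular expressions is to use a 'negative look-ahead' or a 'negative look-behind'. A negative look-ahead makes the match fail if its sub-expression matches from the current position. The one in the example is anchored at the start of the string and begins with .+, so it fails whenever <img appears later in the string.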
Example:
^(?!.+<img.+)<a href=\"?\'?.+\"?\'?>.+</a>$
Matches:
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
But does not match:
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
The negative look-ahead is this part of the string:
(?!.+<img.+)
This says: do not match if, from the current position, one or more characters are followed by <img and then at least one more character.
<a href=\"?\'?.+\"?\'?>.+</a>
The rest is my general match for anchor tags in HTML; you might want to use an alternate match expression.
You may need to omit the start and end ^ $ chars depending on your usage.
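As a rough sketch of how the look-ahead idea could be folded into the preg_match_all call from the question: instead of anchoring on whole lines, the look-ahead is checked before every character of the link body, so the body can never run into <img or past </a>. This is one possible variant, not the pattern from the answer above. The ~ delimiter is a deliberate change, because the ! delimiter used in the question would collide with the ! inside (?!...). The $content sample and the $hrefs variable are assumptions made for illustration.

```php
<?php
// Sample input taken from the question.
$content = <<<'HTML'
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
HTML;

// (?:(?!<img\b|</a>).)* consumes the link body one character at a time
// and refuses to step onto "<img" or "</a>". If the body contains an
// <img tag, the closing </a> can never be reached and the link is skipped.
$pattern = '~<a\b[^>]*\bhref=["\']?([^ "\'>]+)["\']?[^>]*>((?:(?!<img\b|</a>).)*)</a>~is';

preg_match_all($pattern, $content, $out);

// $out[1] holds the href values, $out[2] the link bodies.
$hrefs = $out[1];
print_r($hrefs);
```

With the four sample links, only the domain1 and domain2 URLs should end up in $hrefs. Like any regex over HTML, this remains fragile against unusual markup such as a > inside an attribute value.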
More info on look ahead / behind
http://www.codinghorror.com/blog/2005/10/excluding-matches-with-regular-expressions.html
Answer 2:
You need to use an HTML parser like the Simple DOM parser. You cannot parse HTML with regular expressions.
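For comparison, here is a minimal sketch of the parser route. It uses PHP's built-in DOMDocument and DOMXPath rather than the Simple DOM parser named above, purely because they ship with PHP; the $content sample and the $links variable are assumptions for illustration.

```php
<?php
// Sample input taken from the question.
$content = <<<'HTML'
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
HTML;

// Collect parser warnings internally instead of printing them,
// since real-world HTML is rarely well-formed.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select every <a> that has an href attribute and no <img> descendant.
$links = [];
foreach ($xpath->query('//a[@href][not(.//img)]') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links);
```

The XPath predicate not(.//img) expresses the "does not contain an img" condition directly, and it keeps working when the img is nested several levels deep or the attributes appear in a different order.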
Source: https://stackoverflow.com/questions/2896088/regular-expression-how-to-find-all-a-tags-which-do-not-contain-tag-img-inside-i