Regular expression, how to find all A tags which do not contain tag IMG inside it?

Submitted by 余生颓废 on 2019-12-13 04:06:17

Question


Suppose we have the following HTML. We need to get all <a href=""></a> tags that do NOT contain an img tag inside them.

<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>

I'm using this regular expression to find all the <a> links:

preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>(.*?)</a>!is", $content, $out);

I can modify it so that the link text may not contain any tags at all:

preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>([^<>]+?)</a>!is", $content, $out);
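To see why neither pattern is enough on its own, here is a minimal PHP sketch that runs both against the sample HTML above (the variable names $all and $noTags are illustrative). The first pattern accepts all four anchors, including the two with images; the second rejects every anchor whose content contains any tag, so the <span> link is lost as well.

```php
<?php
// Sample input from the question, one anchor per line.
$content = <<<'HTML'
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
HTML;

// Original pattern: (.*?) accepts any content, so all four anchors match.
preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>(.*?)</a>!is", $content, $all);

// Modified pattern: ([^<>]+?) forbids any tag in the content,
// so the <span> link is rejected along with the two <img> links.
preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>([^<>]+?)</a>!is", $content, $noTags);

print_r($all[1]);    // the four href values
print_r($noTags[1]); // only http://domain2.com
```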

But how can I tell it to exclude only the results that contain the substring <img inside <a href=""></a>?


Answer 1:


DOM parsing is the way to go, but for the sake of interest, here is a regex solution:

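The easiest way to exclude certain matches in a regular expression is to use a negative look-ahead or a negative look-behind. If the negated expression can be matched at that position, the overall match fails. Placed right after ^, as in the example below, it therefore rejects any string in which the expression appears.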

Example:

^(?!.+<img.+)<a href=\"?\'?.+\"?\'?>.+</a>$

Matches:

<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>

But does not match:

<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
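As a minimal PHP sketch (assuming, as in the question, that each anchor sits on its own line), the pattern can be applied with preg_match_all and the m modifier, so that ^ and $ match at every line break rather than only at the ends of the whole input:

```php
<?php
$content = <<<'HTML'
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
HTML;

// Answer 1's pattern as a PHP single-quoted string (\' is a literal quote).
// "m" makes ^ and $ match at each line break, so the look-ahead is checked
// once per line, i.e. once per anchor in this sample.
// Without "s", "." does not cross newlines, so each match stays on one line.
$pattern = '~^(?!.+<img.+)<a href="?\'?.+"?\'?>.+</a>$~m';

preg_match_all($pattern, $content, $out);
print_r($out[0]); // the domain1 and domain2 anchors
```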

The negative look-ahead is this part of the pattern:

(?!.+<img.+)

This says: don't match any string that has one or more characters, followed by <img, followed by one or more characters.
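The look-ahead can also be tried on its own. In this PHP sketch (variable names are illustrative), ^(?!.+<img.+) matches the empty string at position 0 only when the rest of the single-line input does not contain <img. Note that .+ requires at least one character on each side of <img, so an input that begins with <img slips through; inside an anchor tag that cannot happen, but .*<img would be the stricter form.

```php
<?php
// Answer 1's negative look-ahead with nothing else: it matches (an empty
// string) at the start of the input only if "<img" does not appear later on the line.
$noImg = '~^(?!.+<img.+)~';

$plain   = preg_match($noImg, '<a href="http://domain2.com" title="">Hello</a>');
$withImg = preg_match($noImg, '<a href="http://domain3.com" title=""><img src="" /></a>');

// Edge case: ".+" needs a character before "<img", so this input is NOT rejected.
$leading = preg_match($noImg, '<img src="" />');

var_dump($plain, $withImg, $leading); // int(1) int(0) int(1)
```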

<a href=\"?\'?.+\"?\'?>.+</a>

The rest is my general match for anchor tags in HTML; you might want to use a different expression for that part.
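One alternative, which is not the approach from this answer, is the so-called tempered greedy token: instead of anchoring a look-ahead to a whole line, it checks before every character of the link text that neither <img nor </a> starts there. That lets it work when several anchors share one line. A minimal PHP sketch, assuming well-formed, non-nested anchors (the single-line $content string is a hypothetical example):

```php
<?php
// Hypothetical input: several anchors on one line, where a
// line-anchored ^...$ pattern cannot isolate individual tags.
$content = '<p><a href="http://domain1.com"><span>Here is link</span></a> and '
         . '<a href="http://domain3.com" title=""><img src="" /></a> and '
         . '<a href="http://domain2.com" title="">Hello</a></p>';

// <a\b[^>]*>              opening tag (\b rejects e.g. <abbr>)
// (?:(?!<img\b|</a>).)*   any characters, but never step onto "<img" or "</a>"
// </a>                    the closing tag of this same anchor
// "i" ignores case; "s" lets "." match newlines inside an anchor.
$pattern = '~<a\b[^>]*>(?:(?!<img\b|</a>).)*</a>~is';

preg_match_all($pattern, $content, $out);
print_r($out[0]); // the domain1 and domain2 anchors only
```

It is still a regex over HTML: a literal > inside an attribute value, comments, or nested markup can defeat it.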

You may need to omit the ^ and $ anchors depending on your usage; in PHP you can instead add the m modifier so that they match at the start and end of each line.

More info on look-ahead/look-behind:

http://www.codinghorror.com/blog/2005/10/excluding-matches-with-regular-expressions.html




Answer 2:


You need to use an HTML parser, such as the Simple HTML DOM parser. You cannot reliably parse HTML with regular expressions.
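A minimal sketch of the parser route, using PHP's built-in DOM extension (DOMDocument with DOMXPath) rather than the Simple HTML DOM library named above; it assumes ext-dom is available. The XPath expression //a[not(.//img)] selects every <a> element that has no <img> descendant at any depth:

```php
<?php
// Same sample as in the question.
$html = <<<'HTML'
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$links = [];
// Every <a> that has no <img> anywhere beneath it.
foreach ($xpath->query('//a[not(.//img)]') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links); // http://domain1.com and http://domain2.com
```

Because the parser builds a real tree, nesting depth, attribute order, and line layout no longer matter.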



Source: https://stackoverflow.com/questions/2896088/regular-expression-how-to-find-all-a-tags-which-do-not-contain-tag-img-inside-i
