Regex ignore URL already in HTML tags

Submitted by 南楼画角 on 2019-12-17 14:52:41

Question


I'm having a small problem with my regex.

I've made a custom BBCode for my website, but I also want plain URLs to be parsed.

I'm using preg_replace, and this is the pattern I use to identify URLs:

/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])/is

This works well. However, if a URL is inside an [img][/img] block, the pattern also picks it up and produces a result like this:

//[img]http://url.com/toimg.jeg[/img] will produce this result:
<img src="<a href="http://url.com/toimg.jeg" target="_blank">/>
//When it should produce:
<img src="http://url.com/toimg.jeg"/>
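
The failure can be reproduced outside PHP. The following is a minimal sketch in Python, used here as a stand-in for preg_replace. The `linkify` helper, the replacement template, and the assumption that the [img] tag has already been converted to HTML are illustrative and not taken from the original code. The hyphen in the character class is escaped because Python's `re` rejects `\w-?` as a range, whereas PCRE treats the hyphen as a literal.

```python
import re

# Port of the question's pattern. The hyphen is escaped for Python's re;
# PCRE accepts the unescaped form and treats it as a literal hyphen.
URL = re.compile(r'([\w]+://[\w\-?&;#~=./@]+[\w/])', re.I | re.S)

def linkify(html):
    # Hypothetical helper: wrap every URL match in an anchor tag.
    return URL.sub(r'<a href="\1" target="_blank">\1</a>', html)

# Assumed state after the [img] BBCode has already been turned into HTML.
converted = '<img src="http://url.com/toimg.jeg"/>'
broken = linkify(converted)
print(broken)  # the src attribute now contains an <a> tag
```

The pattern has no notion of context, so a URL inside an attribute value is treated the same as a URL in running text.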

I tried using this:

/([^"][\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/][^"])/is

With no luck.
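
A likely reason this attempt fails: `[^"]` is a character class, not an assertion, so it has to consume a character. The engine can satisfy it with the `h` of `http` and the last letter of the URL, so the URL inside the attribute still matches. A small Python sketch of that behaviour (the pattern is ported with the hyphen escaped, which Python's `re` requires; the sample strings are made up for illustration):

```python
import re

# The attempted pattern, ported to Python (hyphen escaped for Python's re).
ATTEMPT = re.compile(r'([^"][\w]+://[\w\-?&;#~=./@]+[\w/][^"])', re.I | re.S)

# Inside an attribute: [^"] consumes "h" at the start and "g" at the end,
# so the whole URL is still matched.
inside_attr = '<img src="http://url.com/toimg.jeg"/>'
attr_match = ATTEMPT.search(inside_attr).group(1)
print(attr_match)

# In running text: the surrounding spaces are swallowed into the match.
plain = 'see http://example.com now'
plain_match = ATTEMPT.search(plain).group(1)
print(repr(plain_match))
```

What is needed instead is a zero-width check on the preceding characters, one that tests them without making them part of the match.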

Any help will be appreciated.

Edit: for the solution, see the second comment on stema's answer.


Answer 1:


Try this:

(?<!href=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])

See it here on Regexr

To make it more general, you can simplify the lookbehind so that it checks only for =" (an equals sign followed by a double quote):

(?<!=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])

See it on Regexr

(?<!href=") is a negative lookbehind assertion. It ensures that the text immediately before the match is not href=".

\b is a word boundary that anchors the start of the link to a transition from a non-word character to a word character. Without it the lookbehind would be useless: the engine would simply skip the first character and match from "ttp://..." onward.
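
Both points can be checked with a short Python sketch. It ports the general pattern with the hyphen escaped (Python's `re` requires this) and uses a hypothetical `linkify` helper and replacement template that are not part of the original answer:

```python
import re

# Shared URL body, ported from the answer (hyphen escaped for Python's re).
TAIL = r'[\w]+://[\w\-?&;#~=./@]+[\w/]'

# General form from the answer: reject URLs directly preceded by =".
WITH_B = re.compile(r'(?<!=")(\b' + TAIL + ')', re.I | re.S)
# Same pattern without \b, to show why the word boundary matters.
NO_B = re.compile(r'(?<!=")(' + TAIL + ')', re.I | re.S)

def linkify(text, pattern=WITH_B):
    # Hypothetical helper: wrap each matched URL in an anchor tag.
    return pattern.sub(r'<a href="\1" target="_blank">\1</a>', text)

html = '<img src="http://url.com/toimg.jeg"/> and http://example.com/page'

# Only the bare URL is wrapped; the one inside src="..." is left alone.
fixed = linkify(html)
print(fixed)

# Without \b the match simply starts one character later.
shifted = NO_B.search(html).group(1)
print(shifted)
```

Note that the `="` check only covers attribute values written with double quotes and no space after the equals sign; single-quoted or unquoted attributes would still be linkified.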



Source: https://stackoverflow.com/questions/9567836/regex-ignore-url-already-in-html-tags
