preg_replace_callback pattern issue

Posted by 烂漫一生 on 2019-12-11 16:45:00

Question


I'm using the following pattern in a preg_replace_callback to capture links and turn them into HTML links. For the most part it works.

"#(https?|ftp)://(\S+[^\s.,>)\];'\"!?])#"

But this pattern fails when the text reads like so:

http://mylink.com/page[/b]

At that point it captures the [/b, assuming it is part of the link, resulting in this:

<a href="http://woodmill.co.uk[/b">woodmill.co.uk[/b</a>]

I've looked over the pattern and used some cheat sheets to try to follow what is happening, but it has foxed me. Can any of you code ninjas help?


Answer 1:


Try adding the open square bracket to your character class:

(\S+[^\s.,>)[\];'\"!?])
            ^
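
A quick way to check a change like this is to run the patterns against the sample from the question. In the sketch below, first_link() is a hypothetical helper and the third pattern is an assumption, not part of this answer. Because the final character class only constrains the last character of the match, \S+ can still run through "[/b"; keeping brackets out of the URL body as well is one possible way around that.

```php
<?php
// Hypothetical helper: return the first match of $pattern in $text, or null.
function first_link(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = 'http://mylink.com/page[/b]';

// Pattern from the question.
$original = "#(https?|ftp)://(\S+[^\s.,>)\];'\"!?])#";

// Same pattern with "[" added to the trailing character class.
$withBracket = "#(https?|ftp)://(\S+[^\s.,>)[\];'\"!?])#";

// Assumption: also exclude brackets and angle brackets from the URL body.
$noBrackets = "#(https?|ftp)://([^\s\[\]<>]*[^\s\[\]<>.,);'\"!?])#";

foreach ([$original, $withBracket, $noBrackets] as $pattern) {
    var_dump(first_link($pattern, $text));
}
```

With this input the first two patterns still end the match on the "b" of "[/b", while the third stops before the opening bracket.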

UPDATE

Try this more effective URL regex:

^(https?://)?([\da-z\.-]+)\.([a-z\.]{2,6})([/\w \.-]*)*/?$

(From: http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/)

I have no experience directly with PHP regular expressions, but the above is simple and generic enough that I wouldn't expect any problems. You may want to modify it some to extract just the domain, like you seem to be with your current regex.
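
For anyone trying this in PHP, here is a minimal sketch of how that pattern behaves with preg_match(). The sample strings and variable names are assumptions for illustration. Note that the pattern is anchored with ^ and $, so as written it validates a string that is nothing but a URL; it would need those anchors removed before it could find links inside longer text.

```php
<?php
// The pattern quoted above, using "#" as the delimiter so "/" needs no escaping.
$anchored = '#^(https?://)?([\da-z\.-]+)\.([a-z\.]{2,6})([/\w \.-]*)*/?$#';

// A string that consists only of a URL.
$isWholeUrl = preg_match($anchored, 'http://mylink.com/page');

// The same URL embedded in other text: ^ and $ pin the match to the whole string.
$isEmbedded = preg_match($anchored, 'Visit http://mylink.com/page[/b] today');

var_dump($isWholeUrl, $isEmbedded);
```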




Answer 2:


OK, I solved the problem. Thanks to @Cyborgx37 and @MikeBrant for your help. Here's the solution.

Firstly I replaced my regexp pattern with the one that João Castro used in this question: Making a url regex global

The problem with that pattern is that it captured any trailing dots, so in the final section of the pattern I added ^. making the final part look like so: [^\s^.]. As I read it, that means do not match a trailing space or dot.
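
One detail worth checking here: inside a character class only a leading ^ negates the class, and any later ^ is a literal caret. So [^\s^.] rejects whitespace, a dot, and also a caret, and [^\s.] would be enough for the stated goal. A small sketch to confirm this (the variable names and anchored test patterns are assumptions for illustration):

```php
<?php
// Each pattern tests a single character against one class.
$asWritten = '/^[^\s^.]$/'; // the class from the answer
$simpler   = '/^[^\s.]$/';  // the same class without the second ^

foreach (['a', '.', ' ', '^'] as $char) {
    printf(
        "%s => %d %d\n",
        var_export($char, true),
        preg_match($asWritten, $char),
        preg_match($simpler, $char)
    );
}
```

The two classes should only disagree on the caret itself.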

This still caused an issue matching BBCode as I mentioned above, so I used preg_replace_callback() and create_function() to filter it out. The final create_function() looks like this:

create_function('$match','
                $match[0] = preg_replace("/\[\/?(.*?)\]/", "", $match[0]);
                $match[0] = preg_replace("/\<\/?(.*?)\>/", "", $match[0]);
                $m = trim(strtolower($match[0]));
                $m = str_replace("http://", "", $m);
                $m = str_replace("https://", "", $m);
                $m = str_replace("ftp://", "", $m);
                $m = str_replace("www.", "", $m);

                if (strlen($m) > 25)
                {
                    $m = substr($m, 0, 25) . "...";
                }

                return "<a href=\"$match[0]\" target=\"_blank\">$m</a>";
'), $string);
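
Note for readers on newer PHP: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so the callback has to be a closure there. The following is a minimal sketch of the same idea, not the exact code above: linkify() is a hypothetical name, and the link pattern is an assumption (the pattern actually used comes from the linked question and is not reproduced here). It also escapes the output with htmlspecialchars() and strips only a leading "www.".

```php
<?php
// Hypothetical helper; the link pattern is an assumption, not the one from the answer.
function linkify(string $text): string
{
    // Stop at whitespace, brackets, angle brackets and double quotes
    // so BBCode and HTML tags stay outside the match.
    $pattern = '#\b(?:https?|ftp)://[^\s\[\]<>"]+#i';

    return preg_replace_callback($pattern, function (array $match): string {
        // Keep trailing punctuation outside the link.
        $url  = rtrim($match[0], ".,;:!?)'");
        $tail = substr($match[0], strlen($url));

        // Label: lower-case, without the scheme or a leading "www.".
        $label = preg_replace('#^(?:https?|ftp)://(?:www\.)?#', '', strtolower($url));

        // Shorten long labels to 25 characters, as in the original callback.
        if (strlen($label) > 25) {
            $label = substr($label, 0, 25) . '...';
        }

        return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '" target="_blank">'
            . htmlspecialchars($label, ENT_QUOTES) . '</a>' . $tail;
    }, $text);
}

echo linkify('[b]http://mylink.com/page[/b]'), "\n";
```

Because the pattern never consumes "[" or "]", the BBCode tags around the link are left in place rather than being stripped inside the callback.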

Tests so far are looking good, so I'm happy it is now solved.

Thanks again, and I hope this helps someone else :)



Source: https://stackoverflow.com/questions/14410134/preg-replace-callback-pattern-issue
