Question
I'm using the following pattern in preg_replace_callback() to capture links and turn them into HTML-friendly links, and for the most part it works.
"#(https?|ftp)://(\S+[^\s.,>)\];'\"!?])#"
But this pattern fails when the text reads like so:
http://mylink.com/page[/b]
At that point it captures the [/b, assuming it is part of the link, resulting in this:
<a href="http://woodmill.co.uk[/b">woodmill.co.uk[/b</a>]
I've looked over the pattern and used some cheat sheets to try to follow what is happening, but it has foxed me. Can any of you code ninjas help?
Answer 1:
Try adding the open square bracket to your character class:
(\S+[^\s.,>)[\];'\"!?])
            ^
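A quick way to check this suggestion against the sample text from the question is sketched below. The helper firstMatch() and the third pattern are assumptions added for illustration, not part of the original answer. The trailing character class only constrains the last character of the match, so \S+ can still consume the "[/" before it; excluding "[" from the body of the URL as well is one possible way to stop the match before a BBCode tag, at the cost of rejecting URLs that legitimately contain "[".

```php
<?php
// Sample text from the question: a URL immediately followed by a BBCode closing tag.
$text = 'http://mylink.com/page[/b]';

// Pattern from the question (single-quoted here, so only the apostrophe needs escaping).
$original = '#(https?|ftp)://(\S+[^\s.,>)\];\'"!?])#';

// Same pattern with "[" added to the trailing character class, as suggested above.
$classOnly = '#(https?|ftp)://(\S+[^\s.,>)\[\];\'"!?])#';

// Assumption: "[" is also excluded from the body of the URL, not just from the last character.
$bodyToo = '#(https?|ftp)://([^\s\[]+[^\s.,>)\[\];\'"!?])#';

// Hypothetical helper: return the first full match, or null if nothing matches.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

echo firstMatch($original, $text), PHP_EOL;  // http://mylink.com/page[/b
echo firstMatch($classOnly, $text), PHP_EOL; // http://mylink.com/page[/b
echo firstMatch($bodyToo, $text), PHP_EOL;   // http://mylink.com/page
```

If this check holds in your environment, adding "[" to the trailing class alone does not change the result for this input, because "b" is still an acceptable final character.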
UPDATE
Try this more effective URL regex:
^(https?://)?([\da-z\.-]+)\.([a-z\.]{2,6})([/\w \.-]*)*/?$
(From: http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/)
I have no direct experience with PHP regular expressions, but the above is simple and generic enough that I wouldn't expect any problems. You may want to modify it a bit to extract just the domain, as you seem to be doing with your current regex.
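As a rough check of how this pattern behaves in PHP, the sketch below wraps it in "#" delimiters (an assumption, since the answer gives the bare pattern) and runs it with preg_match(). Because the pattern is anchored with ^ and $, it validates a string that is nothing but a URL; it would need the anchors removed or adjusted before it could find links embedded in longer text.

```php
<?php
// Pattern quoted in the answer, wrapped in "#" delimiters for preg_match().
$pattern = '#^(https?://)?([\da-z\.-]+)\.([a-z\.]{2,6})([/\w \.-]*)*/?$#';

// A string that consists only of a URL matches, and the groups split it up.
$whole = preg_match($pattern, 'http://mylink.com/page', $m);
echo $whole, PHP_EOL; // 1
echo $m[1], PHP_EOL;  // http://
echo $m[2], PHP_EOL;  // mylink
echo $m[3], PHP_EOL;  // com

// The sample from the question has a BBCode tag after the URL,
// so the "$" anchor cannot be satisfied and nothing matches.
$tagged = preg_match($pattern, 'http://mylink.com/page[/b]');
echo $tagged, PHP_EOL; // 0
```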
Answer 2:
Ok I solved the problem. Thanks to @Cyborgx37 and @MikeBrant for your help. Here's the solution.
Firstly I replaced my regexp pattern with the one that João Castro used in this question: Making a url regex global
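The problem with that pattern is that it captured any trailing dots, so I added ^. to the final section of the pattern, making the final part look like so: [^\s^.]. As I read it, that means do not match a trailing space or dot. (Inside a character class the second ^ is a literal caret, so this class also rejects a trailing ^; [^\s.] would cover just the space and the dot.)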
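To see what that trailing class does in isolation, here is a small sketch. The linked pattern is not reproduced in this answer, so $pattern below is a hypothetical, much simpler stand-in that only shares the final [^\s^.] part.

```php
<?php
// Hypothetical stand-in for the linked pattern: scheme, greedy run of
// non-space characters, then one final character that is not space, "^" or ".".
$pattern = '#https?://\S*[^\s^.]#';

$text = 'Visit http://mylink.com/page. Thanks';
preg_match($pattern, $text, $m);
echo $m[0], PHP_EOL; // http://mylink.com/page

// Inside a character class only the first ^ negates; the second one is a literal caret.
$withCaret    = preg_match('#[^\s^.]#', '^'); // the class rejects "^"
$withoutCaret = preg_match('#[^\s.]#', '^');  // this class accepts "^"
echo $withCaret, ' ', $withoutCaret, PHP_EOL; // 0 1
```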
This still caused an issue matching bbcode as I mentioned above, so I used preg_replace_callback() and create_function() to filter it out. The final create_function() looks like this:
create_function('$match','
    // Strip any BBCode or HTML tags caught up in the match.
    $match[0] = preg_replace("/\[\/?(.*?)\]/", "", $match[0]);
    $match[0] = preg_replace("/\<\/?(.*?)\>/", "", $match[0]);
    // Build a short display label: lower-case, without the scheme or "www." prefix.
    $m = trim(strtolower($match[0]));
    $m = str_replace("http://", "", $m);
    $m = str_replace("https://", "", $m);
    $m = str_replace("ftp://", "", $m);
    $m = str_replace("www.", "", $m);
    // Truncate long labels to 25 characters.
    if (strlen($m) > 25)
    {
        $m = substr($m, 0, 25) . "...";
    }
    return "<a href=\"$match[0]\" target=\"_blank\">$m</a>";
'), $string);
Tests so far are looking good, so I'm happy it is now solved.
Thanks again, and I hope this helps someone else :)
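For anyone on a newer PHP version: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so the same callback can be written as an anonymous function. The sketch below is one possible way to do that, not the exact code from this answer. The function name linkify(), the constant URL_PATTERN and its deliberately simple pattern are assumptions (the pattern actually used comes from the linked question), and the htmlspecialchars() calls are an addition that the original callback does not have.

```php
<?php
// Hypothetical, simplified URL pattern: scheme, non-space run, no trailing dot.
const URL_PATTERN = '#(?:https?|ftp)://\S*[^\s.]#i';

// Hypothetical wrapper: turn URLs in $text into anchors with a shortened label.
function linkify(string $text): string
{
    $result = preg_replace_callback(URL_PATTERN, function (array $match): string {
        // Strip BBCode and HTML tags that the URL pattern swallowed.
        $url = preg_replace('/\[\/?(.*?)\]/', '', $match[0]);
        $url = preg_replace('/<\/?(.*?)>/', '', $url);

        // Build the display label: lower-case, no scheme, no "www." prefix.
        $label = trim(strtolower($url));
        $label = str_replace(['http://', 'https://', 'ftp://', 'www.'], '', $label);

        // Truncate long labels to 25 characters.
        if (strlen($label) > 25) {
            $label = substr($label, 0, 25) . '...';
        }

        // Escaping is an addition here; the original callback interpolates raw values.
        return '<a href="' . htmlspecialchars($url) . '" target="_blank">'
            . htmlspecialchars($label) . '</a>';
    }, $text);

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo linkify('http://mylink.com/page[/b]'), PHP_EOL;
// <a href="http://mylink.com/page" target="_blank">mylink.com/page</a>
```

Note that in this sketch a tag glued to the URL is consumed by the match and then dropped from the output, so a closing [/b] directly after a link disappears; whether that is acceptable depends on how the text is processed afterwards.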
Source: https://stackoverflow.com/questions/14410134/preg-replace-callback-pattern-issue