Question
I'm using the following functions:
function MakeLinks($source){
    return preg_replace('!(((f|ht){1}tp://)[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;//=]+)!i', '<a href="$1">$1</a>', $source);
}
function simpleWiki($text){
    $text = preg_replace('/\[\[Image:(.*)\]\]/', '<a href="$1"><img src="$1" /></a>', $text);
    return $text;
}
The first one converts http://example.com into a clickable link to http://example.com.
The second function turns strings like [[Image:http://example.com/logo.png]] into an image.
Now if I have the text
$text = 'this is my image [[Image:http://example.com/logo.png]]';
and convert it with simpleWiki(makeLinks($text)), it outputs something similar to:
this is my image <a href="url"><img src="<a href="url">url</a>"/></a>
How can I prevent this? How can I check that the URL is not part of a [[Image:URL]] construction?
Answer 1:
Your immediate problem can be solved by combining the two expressions into one (with two alternatives) and then using the not-so-well-known but very powerful preg_replace_callback() function, which handles each case separately in a single pass through the target string, like so:
<?php // test.php 20110312_1200
$data = "[[Image:http://example.com/logo1.png]]\n".
        "http://example1.com\n".
        "[[Image:http://example.com/logo2.png]]\n".
        "http://example2.com\n";
$re = '!# Capture WikiImage URLs in $1 and other URLs in $2.
      # Either $1: WikiImage URL
      \[\[Image:(.*?)\]\]
    | # Or $2: Non-WikiImage URL.
      (((f|ht){1}tp://)[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;//=]+)
    !ixu';
$data = preg_replace_callback($re, '_my_callback', $data);
// The callback function is called once for each
// match found and is passed one parameter: $matches.
function _my_callback($matches)
{ // Either $1 or $2 matched, but never both.
    if ($matches[1]) { // $1: WikiImage URL
        return '<a href="'. $matches[1] .
               '"><img src="'. $matches[1] .'" /></a>';
    }
    else { // $2: Non-WikiImage URL.
        return '<a href="'. $matches[2] .
               '">'. $matches[2] .'</a>';
    }
}
echo($data);
?>
This script implements your two regexes and does what you are asking. Note that I changed the greedy (.*) to the lazy (.*?) because the greedy version does not work correctly: it fails when several WikiImages appear on the same line, since it matches through to the last ]] on that line. I also added the 'u' modifier to the regex, which makes PCRE treat the pattern and subject as UTF-8 and is needed when the pattern contains non-ASCII characters such as а-яА-Я. As you can see, the preg callback function is very powerful. (This technique can be used to do some pretty heavy lifting, text-processing-wise.)
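To make the greedy-versus-lazy point concrete, here is a minimal sketch that runs both variants on one line containing two image tags. The sample string and the shortened replacement template are assumptions chosen for illustration, not taken from the question.

```php
<?php
// Two wiki image tags on the same line: the case where greedy and lazy differ.
$line = '[[Image:a.png]] and [[Image:b.png]]';

// Shortened replacement template, used here only to keep the output readable.
$template = '<img src="$1" />';

// Greedy: (.*) runs to the last "]]" on the line, so both tags collapse into one match.
$greedy = preg_replace('/\[\[Image:(.*)\]\]/', $template, $line);

// Lazy: (.*?) stops at the first "]]", so each tag is matched on its own.
$lazy = preg_replace('/\[\[Image:(.*?)\]\]/', $template, $line);

echo $greedy, "\n"; // <img src="a.png]] and [[Image:b.png" />
echo $lazy, "\n";   // <img src="a.png" /> and <img src="b.png" />
```

When every image tag sits on its own line, both variants give the same result, because "." does not match a newline unless the 's' modifier is set.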
However, please note that the regex you are using to pick out URLs can be significantly improved. Check out the following resources for more information on "Linkifying" URLs (Hint: there are a bunch of "gotchas"):
The Problem With URLs
An Improved Liberal, Accurate Regex Pattern for Matching URLs
URL Linkification (HTTP/FTP)
Answer 2:
In your MakeLinks function, add [^:"]{1} at the start of the pattern, as shown below:
function MakeLinks($source){
    return preg_replace('![^:"]{1}(((f|ht){1}tp://)[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;//=]+)!i', '<a href="$1">$1</a>', $source);
}
Then only links that are not preceded by ":" (as in Image:) or by a double quote will be transformed.
Then call $text = simpleWiki(MakeLinks($text));
EDIT: Alternatively, you can require whitespace on both sides of the URL: preg_replace('![[:space:]](((f|ht){1}tp://)[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;//=]+)[[:space:]]!i', '<a href="$1">$1</a>', $source);
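One caveat with both patterns in this answer: the extra character class is part of the match, so the character before the URL (and, in the EDIT version, the whitespace after it) is dropped from the output, and a URL at the very start of the string is not matched at all. A possible alternative, which is not from the original answer, is a zero-width negative lookbehind. The sketch below assumes a hypothetical function name makeLinksOutsideImages and uses { } as regex delimiters, because "!" occurs inside the lookbehind syntax.

```php
<?php
// Hypothetical variant of MakeLinks(): skip URLs preceded by ":" or a double quote.
function makeLinksOutsideImages($source) {
    // (?<![:"]) is a zero-width negative lookbehind, so nothing before the URL is consumed.
    // The 'u' modifier makes the Cyrillic ranges match whole UTF-8 characters.
    $re = '{(?<![:"])((?:f|ht)tp://[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;/=]+)}iu';
    return preg_replace($re, '<a href="$1">$1</a>', $source);
}

// Same idea as simpleWiki() from the question, with the lazy (.*?) quantifier.
function simpleWiki($text) {
    return preg_replace('/\[\[Image:(.*?)\]\]/', '<a href="$1"><img src="$1" /></a>', $text);
}

$text = 'see http://example.com and [[Image:http://example.com/logo.png]]';
echo simpleWiki(makeLinksOutsideImages($text)), "\n";
```

Under these assumptions the plain URL becomes a link and the image URL is left for simpleWiki() to handle. The trade-off is that a URL written directly after any colon in ordinary prose, with no space in between, is also skipped.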
Source: https://stackoverflow.com/questions/5282745/simple-wiki-parser-and-link-autodetection