Assuming you already have an HTML document, I limited URL recognition to strings that:
- are not preceded by a `"` (so URLs inside attribute values are skipped)
- start with `http` or `www`
I came up with this solution:
```php
$string = 'lorem ipsum dolor sit <a href=""></a><img src=""> amet';
$rx = '%[^"](?P<link>(?:https?://|www\.)(?:[-_a-z0-9]+\.)+(?:[a-z]{2,4}|museum/?)(?:[-_a-z0-9/]+)?(?:\?[-_a-z0-9+\%=&]+)?(?!</a)(\W|$))%ui';
echo preg_replace_callback($rx, function($matches) {
    return '<a href="'.$matches['link'].'">'.$matches['link'].'</a>';
}, $string).PHP_EOL;
```
The output string is:
```
lorem ipsum<a href=" "> </a>dolor sit <a href=""></a><img src=""> amet<a href=""></a>
```
The regex should work as intended; an example string of yours would help to test it further.
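One side effect is visible in the output above: the leading `[^"]` consumes the character before the URL, and `(\W|$)` sits inside the `link` group, so the trailing delimiter ends up inside the `href` and the link text. `[^"]` also needs a character to consume, so a URL at the very start of the string is never matched. Below is a minimal sketch, not a tested drop-in, that keeps the same URL pattern but checks the surrounding characters with lookarounds instead of consuming them. Assumptions of mine, not part of the original: the helper name `linkify` is hypothetical, the lookbehind also rejects a preceding `/` (so the `www.` inside `http://www.…` in an attribute is not matched on its own), and bare `www.` links get an `http://` prefix so browsers do not treat the `href` as a relative path.

```php
<?php
// Hypothetical helper: same URL pattern as above, but the characters
// around the URL are checked with lookarounds instead of being consumed.
function linkify(string $html): string
{
    $rx = '%(?<!["/])'                    // not preceded by a quote or a slash (consumes nothing)
        . '(?P<link>(?:https?://|www\.)'  // must start with http://, https:// or www.
        . '(?:[-_a-z0-9]+\.)+(?:[a-z]{2,4}|museum/?)'
        . '(?:[-_a-z0-9/]+)?'             // optional path
        . '(?:\?[-_a-z0-9+\%=&]+)?)'      // optional query string; the link group ends here
        . '(?!</a)(?=\W|$)%ui';           // not followed by "</a"; must end before a non-word char or the end

    return preg_replace_callback($rx, function (array $m): string {
        $link = $m['link'];
        // Assumption: give bare "www." links a scheme so the href is absolute.
        $href = stripos($link, 'www.') === 0 ? 'http://' . $link : $link;
        return '<a href="' . $href . '">' . $link . '</a>';
    }, $html);
}

echo linkify('see www.example.com and <a href="http://example.org">x</a> or http://example.net') . PHP_EOL;
```

For that sample input the spaces around both URLs are preserved, the existing anchor is left untouched, and the output should be `see <a href="http://www.example.com">www.example.com</a> and <a href="http://example.org">x</a> or <a href="http://example.net">http://example.net</a>`. Like the original, this is still a heuristic: matching URLs in HTML with a regex can misfire on unusual markup (for example, single-quoted attributes).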