I'm trying to find the links on a page.
My regex is:
/]*href=(\\\"\\\'??)([^\\\"\\\' >]*?)[^>]*>(.*)<\\/a>/
I agree with Gordon: you MUST use an HTML parser to parse HTML. But if you really want a regex, you can try this one:
/^<a.*?href=(["\'])(.*?)\1.*$/
This matches <a at the beginning of the string, followed by any number of any character (non-greedy, .*?), then href=, followed by the link surrounded by either " or '.
$str = '<a title="this" href="that">what?</a>';
preg_match('/^<a.*?href=(["\'])(.*?)\1.*$/', $str, $m);
var_dump($m);
Output:
array(3) {
[0]=>
string(37) "<a title="this" href="that">what?</a>"
[1]=>
string(1) """
[2]=>
string(4) "that"
}
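For the parser route recommended above, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument). The sample HTML and the $links variable are made up for illustration, and the snippet assumes the dom/libxml extensions are enabled, which they normally are by default.

```php
<?php
// Sketch: extract href values and link text with DOMDocument instead of a regex.
// The markup below is a made-up sample; substitute the page you fetched.
$html = '<p>See <a title="this" href="that">what?</a> and <a href=\'other\'>more</a>.</p>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    // Skip anchors that carry no href attribute (e.g. named anchors).
    if ($a->hasAttribute('href')) {
        $links[] = array(
            'href' => $a->getAttribute('href'),
            'text' => $a->textContent,
        );
    }
}

var_dump($links);
```

Unlike the regex above, this does not care whether the tag sits at the start of the string, which quote style is used, or in what order the attributes appear.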
Quick test: <a\s+[^>]*href=(\"\'??)([^\1]+)(?:\1)>(.*)<\/a>
seems to do the trick, with the first match being " or ', the second the href value 'that', and the third the link text 'what?'.
The reason I left the first match of "/' in there is so that you can backreference it later for the closing "/', which guarantees the opening and closing quotes are the same.
See live example on: http://www.rubular.com/r/jsKyK2b6do
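One caveat, as far as I know: inside a character class such as [^\1], \1 is not treated as a backreference, so the "same closing quote" idea only works when \1 appears outside the brackets. A hedged variant of the pattern for PHP, where the $pattern name and the lazy quantifiers are my own choices rather than part of the answer above, might look like this:

```php
<?php
$str = '<a title="this" href="that">what?</a>';

// (["\']) captures the opening quote character.
// \1, used outside a character class, requires the same character to close the value.
// The lazy (.*?) groups stop at the first possible closing quote and closing tag.
$pattern = '/<a\s+[^>]*href=(["\'])(.*?)\1[^>]*>(.*?)<\/a>/s';

preg_match($pattern, $str, $m);
var_dump($m);
```

Here $m[1] holds the quote character, $m[2] the href value, and $m[3] the link text.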
Why don't you just match:
"<a.*?href\s*=\s*['"](.*?)['"]"
<?php
$str = '<a title="this" href="that">what?</a>';
$res = array();
preg_match_all("/<a.*?href\s*=\s*['\"](.*?)['\"]/", $str, $res);
var_dump($res);
?>
then
$ php test.php
array(2) {
[0]=>
array(1) {
[0]=>
string(27) "<a title="this" href="that""
}
[1]=>
array(1) {
[0]=>
string(4) "that"
}
}
which works. I've just removed the first capture group.
I modified your regex a bit to suit your needs:
<a.*?href=("|')(.*?)("|').*?>(.*)<\/a>
I personally suggest you use an HTML parser.
EDIT: Tested
I'm not sure what you're trying to do here, but if you're trying to validate the link, then look at PHP's filter_var().
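A minimal sketch of that validation, assuming the goal is to check that a string is a well-formed absolute URL. The $candidates and $results names and the sample URLs are made up for illustration.

```php
<?php
// Sample inputs: one absolute URL and one string that is not a URL.
$candidates = array('http://www.example.com/page?x=1', 'not a url');

$results = array();
foreach ($candidates as $url) {
    // filter_var() returns the filtered string on success and false on failure.
    $results[$url] = filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump($results);
```

Note that FILTER_VALIDATE_URL expects a scheme, so a relative href such as "that" will not validate; this only checks the shape of a URL, it does not find links in a page.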
If you really need to use a regular expression then check out this tool, it may help: http://regex.larsolavtorvik.com/
The pattern you want is the link anchor pattern, something like:
$regex_pattern = "/<a href=\"(.*)\">(.*)<\/a>/";
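One thing to watch with this pattern: (.*) is greedy, so when several links sit on the same line it can swallow everything between the first href and the last closing tag. The sketch below compares it with a lazy variant; the sample $html and the $greedy and $lazy names are made up for illustration.

```php
<?php
$html = '<a href="one">1</a> <a href="two">2</a>';

// Greedy quantifiers, as in the pattern above: a single match spanning both links.
preg_match("/<a href=\"(.*)\">(.*)<\/a>/", $html, $greedy);

// Lazy quantifiers (.*?): each link is matched on its own.
// PREG_SET_ORDER groups the results per match rather than per capture group.
preg_match_all('/<a href="(.*?)">(.*?)<\/a>/', $html, $lazy, PREG_SET_ORDER);

var_dump($greedy[1]); // the over-long first capture
var_dump($lazy);
```

Both patterns still require href to be the first attribute directly after "<a ", so they miss tags like the one with a title attribute earlier in this thread.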