I'm trying to find the links on a page.
My regex is:
/<a\s[^>]*href=(\"\'??)([^\"\' >]*?)[^>]*>(.*)<\/a>/
For anyone still looking for a solution, here is a quick and easy one using SimpleXML:
$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com
It's working for me.
Reliable regexes for HTML are difficult to write. Here is how to do it with DOM:
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) {
    echo $dom->saveHtml($node), PHP_EOL;
}
The above would find and output the "outerHTML" of all A elements in the $html string.
To get all the text values of the node, you do
echo $node->nodeValue;
To check if the href attribute exists you can do
echo $node->hasAttribute( 'href' );
To get the href attribute you'd do
echo $node->getAttribute( 'href' );
To change the href attribute you'd do
$node->setAttribute('href', 'something else');
To remove the href attribute you'd do
$node->removeAttribute('href');
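Putting the calls above together, here is a minimal sketch that collects the href and text of every A element into an array. The function name extractLinks and the sample markup are made up for illustration; the libxml_use_internal_errors() calls are an assumption on my part that you want to silence the warnings libxml raises for imperfect real-world HTML.

```php
<?php
// Hypothetical helper for illustration; not part of the DOM extension.
function extractLinks(string $html): array
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        if (!$node->hasAttribute('href')) {
            continue; // skip anchors that have no href attribute
        }
        $links[] = [
            'href' => $node->getAttribute('href'),
            'text' => $node->nodeValue,
        ];
    }
    return $links;
}

// Sample markup (made up): one real link and one anchor without href.
$sample = '<p><a href="http://example.com/">Example</a> <a name="top">No link</a></p>';
print_r(extractLinks($sample));
```

Only the first anchor ends up in the result, because the second one has no href attribute.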
You can also query for the href attribute directly with XPath
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a/@href');
foreach ($nodes as $href) {
    echo $href->nodeValue;                       // echo current attribute value
    $href->nodeValue = 'new value';              // set new attribute value
    $href->parentNode->removeAttribute('href');  // remove attribute
}
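XPath can also do the filtering for you. As a rough sketch, assuming you only care about links whose href starts with "http", the XPath 1.0 starts-with() function narrows the query. The function name absoluteHrefs and the filter itself are illustrative choices, not something the question asked for.

```php
<?php
// Hypothetical helper for illustration.
function absoluteHrefs(string $html): array
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $hrefs = [];
    // Select the href attribute nodes of A elements whose href starts with "http".
    foreach ($xpath->query('//a[starts-with(@href, "http")]/@href') as $href) {
        $hrefs[] = $href->nodeValue;
    }
    return $hrefs;
}

// Sample markup (made up): one absolute and one relative link.
print_r(absoluteHrefs('<a href="http://example.com/">A</a> <a href="/relative">B</a>'));
```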
On a side note: I am sure this is a duplicate, and you can find the answer somewhere in here.
The following works for me and returns both the href and the value of the anchor tag.
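// Capture each anchor's href (group 1) and inner text (group 2).
preg_match_all("'\<a.*?href=\"(.*?)\".*?\>(.*?)\<\/a\>'si", $html, $match);
$urls = array();
// $match is always a non-empty array, so test the list of full matches instead.
if (!empty($match[0])) {
    foreach ($match[0] as $k => $e) {
        $urls[] = array(
            'anchor' => $e,
            'href'   => $match[1][$k],
            'value'  => $match[2][$k]
        );
    }
}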
The multidimensional array called $urls now contains associative sub-arrays that are easy to use.
preg_match_all("/(<a[^>]*>)(.*?)(<\/a)/", $contents, $impmatches, PREG_SET_ORDER);
I have tested it, and it fetches all the a tags from any HTML code.