Grabbing the href attribute of an A element

悲&欢浪女 2020-11-21 05:06

Trying to find the links on a page.

My regex is:

/<a\s[^>]*href=(\"\'??)([^\"\' >]*?)[^>]*>(.*)<\/a>/
10 answers
  • 2020-11-21 06:04

    For anyone still looking for a solution, here is a quick and easy one using SimpleXML:

    $a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
    echo $a['href']; // will echo www.something.com
    

    It works for me.
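
    Note that SimpleXML is an XML parser, so this approach only works when the fragment is well-formed XML. Below is a minimal sketch, assuming a hypothetical helper named hrefFromFragment (not part of the answer above), that returns null instead of emitting warnings when the fragment cannot be parsed or has no href:

```php
<?php
// Hypothetical helper (an assumption for illustration): read the href of a single,
// well-formed <a> fragment with SimpleXML; return null if that is not possible.
function hrefFromFragment(string $fragment): ?string
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $a = simplexml_load_string($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($a === false || !isset($a['href'])) {
        return null; // not well-formed XML, or no href attribute
    }
    return (string) $a['href'];
}

echo hrefFromFragment('<a href="www.something.com">Click here</a>'), PHP_EOL;
var_dump(hrefFromFragment('<a href=unquoted>broken</a>')); // unquoted value is not valid XML
```

    For whole pages or sloppy markup, an HTML parser such as DOMDocument is usually the safer choice.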

  • 2020-11-21 06:07

    Reliable regexes for HTML are difficult to write. Here is how to do it with DOM:

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    foreach ($dom->getElementsByTagName('a') as $node) {
        echo $dom->saveHTML($node), PHP_EOL;
    }
    

    The above would find and output the "outerHTML" of all A elements in the $html string.

    To get the text content of the node, use

    echo $node->nodeValue; 
    

    To check if the href attribute exists you can do

    echo $node->hasAttribute( 'href' );
    

    To get the href attribute you'd do

    echo $node->getAttribute( 'href' );
    

    To change the href attribute you'd do

    $node->setAttribute('href', 'something else');
    

    To remove the href attribute you'd do

    $node->removeAttribute('href'); 
    

    You can also query for the href attribute directly with XPath

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//a/@href');
    foreach($nodes as $href) {
        echo $href->nodeValue;                       // echo current attribute value
        $href->nodeValue = 'new value';              // set new attribute value
        $href->parentNode->removeAttribute('href');  // remove attribute
    }
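
    The pieces above can be combined into one function. This is only a sketch: the name extractHrefs and the sample markup are assumptions made for illustration, and libxml_use_internal_errors() is used so that the invalid markup often found on real pages does not flood the output with warnings.

```php
<?php
// Hypothetical helper (an assumption for illustration): collect the href value
// of every <a> element in an HTML string using DOMDocument and XPath.
function extractHrefs(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Collect parser errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a/@href') as $attr) {
        $hrefs[] = $attr->nodeValue; // attribute value, entities already decoded
    }
    return $hrefs;
}

// Sample input: href is not the first attribute, and one anchor has no href at all.
$sample = '<p><a title="this" href="that">what?</a> <a name="top">no link</a></p>';
print_r(extractHrefs($sample));
```

    Because the parser works on attributes rather than on text, the position of href inside the tag and the quoting style do not matter.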
    

    Also see:

    • Best methods to parse HTML
    • DOMDocument in php

    On a side note: I am sure this is a duplicate, and you can find the answer somewhere on this site.

  • 2020-11-21 06:07

    The following works for me and returns both the href and the value of the anchor tag.

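    $urls = array();
    // preg_match_all() returns the number of full matches (0 if none, false on error).
    if (preg_match_all("'\<a.*?href=\"(.*?)\".*?\>(.*?)\<\/a\>'si", $html, $match)) {
        foreach ($match[0] as $k => $e) {
            $urls[] = array(
                'anchor'    =>  $e,             // the whole <a>...</a> match
                'href'      =>  $match[1][$k],  // value of the href attribute
                'value'     =>  $match[2][$k]   // content between the tags
            );
        }
    }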
    

    The multidimensional array $urls now contains associative sub-arrays that are easy to use.
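
    The pattern above only matches double-quoted href values. As a hedged sketch (the pattern and the helper name hrefsByRegex are my own assumptions, not part of this answer), the variant below also accepts single-quoted and unquoted values and does not care where href sits in the tag. It is still a regex: it can be fooled by comments, scripts, or attribute values that themselves contain "href=" or ">".

```php
<?php
// Hypothetical helper (an assumption for illustration): extract href values
// from <a> tags with a single regular expression.
function hrefsByRegex(string $html): array
{
    // <a\s              "<a" followed by whitespace (so <abbr> is not matched)
    // (?:[^>]*?\s)?     optional other attributes, ending in whitespace
    // href\s*=\s*       the attribute name and the equals sign
    // (?| ... )         branch reset: whichever alternative matches fills group 1
    $pattern = '~<a\s(?:[^>]*?\s)?href\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))~i';

    preg_match_all($pattern, $html, $m);
    return $m[1]; // empty array when nothing matched
}

$sample = '<a title="this" href="that">what?</a> <a href=\'/x\'>x</a> '
        . '<a href=/y>y</a> <a data-href="no">n</a>';
print_r(hrefsByRegex($sample));
```

    Unlike a DOM parser, this returns the raw attribute text, so entities such as &amp; are not decoded.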

  • 2020-11-21 06:10

    preg_match_all("/(<a[^>]*>)(.*?)(<\/a)/", $contents, $impmatches, PREG_SET_ORDER);

    It is tested and fetches all a tags from any HTML code.
