PHP DOMDocument how to get element?

醉梦人生 2021-01-25 02:48

I am trying to read a website's content, but I have a problem: I want to get the images and links, and I want the elements themselves, not the element content, for insta

2 answers
  • 2021-01-25 03:13

    I'm assuming you just copy-pasted some example code and didn't bother trying to learn how it actually works...

    Anyway, the ->nodeValue part takes the element and returns its text content. (In PHP, an element's nodeValue is the same as its textContent: the concatenated text of all its descendants, with the markup dropped.)

    So, just remove the ->nodeValue and you have your element.
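
    To illustrate the difference, here is a minimal sketch. The inline markup and the variable names are assumptions standing in for the asker's fetched page; only standard DOMDocument calls are used.

```php
<?php
// Hypothetical inline markup standing in for the fetched page.
$html = '<html><body><p><a href="http://example.org">link text</a>'
      . '<img src="logo.png" alt="Logo"></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$a = $dom->getElementsByTagName('a')->item(0);

// With ->nodeValue you only get the text inside the element.
$text = $a->nodeValue;

// Without ->nodeValue, $a is a DOMElement, so its attributes are available.
$href = $a->getAttribute('href');

$img = $dom->getElementsByTagName('img')->item(0);
$src = $img->getAttribute('src');

echo $text, "\n", $href, "\n", $src, "\n";
```

    Here $text holds only the link text, while $a and $img are the elements themselves, so attributes such as href and src can be read from them.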

  • 2021-01-25 03:14

    You appear to be asking for the serialized HTML of a DOMElement? E.g. you want a string containing <a href="http://example.org">link text</a>? (Please make your question clearer.)

    $url = 'http://example.com';
    $dom = new DOMDocument();
    $dom->loadHTMLFile($url);
    
    $anchors = $dom->getElementsByTagName('a');
    
    foreach ($anchors as $a) {
        if (version_compare(PHP_VERSION, '5.3.6', '>=')) {
            // Best solution, but only works with PHP >= 5.3.6
            $htmlstring = $dom->saveHTML($a);
        } else {
            // Otherwise you need to serialize to XML and then fix the self-closing elements
            $htmlstring = saveHTMLFragment($a);
        }
        echo $htmlstring, "\n";
    }
    
    
    function saveHTMLFragment(DOMElement $e) {
        $selfclosingelements = array('></area>', '></base>', '></basefont>',
            '></br>', '></col>', '></frame>', '></hr>', '></img>', '></input>',
            '></isindex>', '></link>', '></meta>', '></param>', '></source>',
        );
        // This is not 100% reliable because it may output namespace declarations.
        // But otherwise it is extra-paranoid to work down to at least PHP 5.1
        $html = $e->ownerDocument->saveXML($e, LIBXML_NOEMPTYTAG);
        // in case any empty elements are expanded, collapse them again:
        $html = str_ireplace($selfclosingelements, '>', $html);
        return $html;
    }
    

    However, note that stitching serialized fragments into your own output is risky, because the source page's encoding may differ from the encoding of your output. It is better to build your output as another DOMDocument and use importNode() to copy the nodes you want. Alternatively, use an XSL stylesheet.
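
    A rough sketch of the importNode() approach, assuming inline markup in place of the fetched page and a hypothetical wrapper <div> in the output document (neither is from the question):

```php
<?php
// Hypothetical inline markup standing in for the fetched page.
$source = new DOMDocument();
$source->loadHTML(
    '<html><body><a href="http://example.org">link text</a>'
    . '<a href="/about">About</a></body></html>'
);

// Separate output document; its encoding is declared once, here.
$output = new DOMDocument('1.0', 'UTF-8');
$list = $output->createElement('div'); // hypothetical container element
$output->appendChild($list);

foreach ($source->getElementsByTagName('a') as $a) {
    // importNode() with true makes a deep copy that belongs to $output.
    $copy = $output->importNode($a, true);
    $list->appendChild($copy);
}

// Serialize the container together with the copied anchors.
$result = $output->saveHTML($list);
echo $result, "\n";
```

    The copied nodes live in $output, so everything is serialized by one document with one encoding instead of being concatenated as strings.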
