PHP DOMDocument: how to get an element?


醉梦人生 2021-01-25 02:48

I am trying to read a website's content, but I have a problem. I want to get images, links and similar elements, but I want the elements themselves, not the element content. For instance…

2 Answers

逝去的感伤 (OP)

2021-01-25 03:14

You appear to be asking for the serialized HTML of a DOMElement, e.g. you want a string containing <a href="...">link text</a>, not just "link text"? (Please make your question clearer.)

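<?php
$url = 'http://example.com';
$dom = new DOMDocument();
// Real-world pages are rarely valid HTML; keep libxml's parse warnings out of the output
libxml_use_internal_errors(true);
$dom->loadHTMLFile($url);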

$anchors = $dom->getElementsByTagName('a');

foreach ($anchors as $a) {
    if (version_compare(PHP_VERSION, '5.3.6', '>=')) {
        // Best solution: saveHTML() accepts a node argument since PHP 5.3.6
        $htmlstring = $dom->saveHTML($a);
    } else {
        // Otherwise you need to serialize to XML and then fix the self-closing elements
        $htmlstring = saveHTMLFragment($a);
    }
    echo $htmlstring, "\n";
}


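function saveHTMLFragment(DOMElement $e) {
    // Closing tags that LIBXML_NOEMPTYTAG adds to HTML void elements
    $selfclosingelements = array('></area>', '></base>', '></basefont>',
        '></br>', '></col>', '></frame>', '></hr>', '></img>', '></input>',
        '></isindex>', '></link>', '></meta>', '></param>', '></source>',
    );
    // This is not 100% reliable because it may output namespace declarations.
    // But otherwise it is extra-paranoid to work down to at least PHP 5.1
    // LIBXML_NOEMPTYTAG writes every empty element as <x></x>, so non-void
    // elements such as <div></div> stay valid HTML
    $html = $e->ownerDocument->saveXML($e, LIBXML_NOEMPTYTAG);
    // Collapse the void elements back to <x>
    $html = str_ireplace($selfclosingelements, '>', $html);
    return $html;
}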
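A minimal, self-contained sketch of the saveHTML($node) approach. It parses an inline HTML string instead of fetching a URL so it runs offline, and it collects both links and images, as the question asks. The helper name outerHtmlOf() and the sample markup are hypothetical; the type declarations assume PHP 7 or later.

```php
<?php
// Sketch: return the outer HTML (tags and attributes included) of every
// element with the given tag name. outerHtmlOf() is a hypothetical helper,
// not part of PHP's DOM API.
function outerHtmlOf(string $html, string $tagName): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName($tagName) as $element) {
        // Passing a node to saveHTML() serializes just that node (PHP >= 5.3.6).
        $result[] = $dom->saveHTML($element);
    }
    return $result;
}

// Hypothetical sample page containing one link and one image.
$page = '<p>See <a href="/docs">the docs</a> and <img src="logo.png" alt="Logo"></p>';
foreach (['a', 'img'] as $tag) {
    foreach (outerHtmlOf($page, $tag) as $fragment) {
        echo $fragment, "\n";
    }
}
```

Passing the node to saveHTML() keeps the element's own tags and attributes, which is what distinguishes it from reading $element->textContent (text only).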

However, note that what you are doing is risky because it can mix character encodings: the serialized strings may not match the encoding of the page you are echoing them into. It is better to build your output as a separate DOMDocument and use importNode() to copy the nodes you want into it. Alternatively, use an XSL stylesheet.
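A rough sketch of the importNode() approach, under stated assumptions: the selected elements are deep-copied into a separate output DOMDocument, which is then serialized once, so the output document rather than string concatenation decides how the markup is written. The helper name copyElements(), the wrapping <div>, and the sample markup are assumptions for illustration.

```php
<?php
// Sketch: copy the selected elements from $source into a new output document.
// copyElements() is a hypothetical helper name.
function copyElements(DOMDocument $source, array $tagNames): DOMDocument
{
    $output = new DOMDocument('1.0', 'UTF-8');
    // Hypothetical wrapper element so the copies share one root.
    $container = $output->createElement('div');
    $output->appendChild($container);

    foreach ($tagNames as $tagName) {
        foreach ($source->getElementsByTagName($tagName) as $element) {
            // importNode($node, true) makes a deep copy owned by $output.
            $container->appendChild($output->importNode($element, true));
        }
    }
    return $output;
}

$source = new DOMDocument();
$source->loadHTML('<p>See <a href="/docs">the docs</a> and <img src="logo.png" alt="Logo"></p>');
$output = copyElements($source, ['a', 'img']);
echo $output->saveHTML($output->documentElement), "\n";
```

The second argument to importNode() requests a deep copy (children and attributes included); appending the original node directly to $output would instead throw a "Wrong Document Error" DOMException, because the node belongs to a different document.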
