Trying to Parse Only the Images from an RSS Feed

若如初见. Submitted 2019-12-02 04:38:29
IMSoP

The <img> tags inside that RSS feed are not actually elements of the XML document, despite what the syntax highlighting on this site suggests; they are just text inside the <description> element that happens to contain the characters < and >.
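A small check makes this visible. The sketch below assumes a tiny made-up feed (not the real one from the question) and uses only SimpleXML: an XPath query for img elements finds nothing, while the description's text value still contains the markup.

```php
<?php
// A tiny hypothetical feed: the <img> sits inside a CDATA section,
// so to the XML parser it is just characters in <description>.
$rss = <<<XML
<rss version="2.0"><channel><item>
<description><![CDATA[<p>Hello <img src="cat.jpg"></p>]]></description>
</item></channel></rss>
XML;

$xml = simplexml_load_string($rss);

// No <img> element exists anywhere in the XML tree...
$imgElements = $xml->xpath('//img');
echo count($imgElements), PHP_EOL;            // 0

// ...but the description's text content contains the markup.
$text = (string) $xml->channel->item->description;
echo $text, PHP_EOL;                          // <p>Hello <img src="cat.jpg"></p>
```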

The string <![CDATA[ tells the XML parser that everything from there until it encounters ]]> is to be treated as a raw string, regardless of what it contains. This is useful for embedding HTML inside XML, since the HTML tags wouldn't necessarily be valid XML. It is equivalent to escaping the whole HTML (e.g. with htmlspecialchars) so that the <img> tags would look like &lt;img&gt;. (I went into more technical detail in another answer.)
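To see the equivalence in practice, here is a minimal sketch using two hypothetical one-element documents: one wraps the HTML in CDATA, the other escapes it as entities. Once parsed, SimpleXML gives back exactly the same string for both.

```php
<?php
// Two hypothetical snippets carrying the same HTML: one wrapped in CDATA,
// one with the markup escaped as entities.
$withCdata = simplexml_load_string(
    '<description><![CDATA[<img src="a.png">]]></description>'
);
$escaped = simplexml_load_string(
    '<description>&lt;img src="a.png"&gt;</description>'
);

// After parsing, both yield exactly the same text value.
$a = (string) $withCdata;
$b = (string) $escaped;

var_dump($a === $b);  // bool(true)
echo $a, PHP_EOL;     // <img src="a.png">
```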

So to extract the images from the RSS requires two steps: first, get the text of each <description>, and second, find all the <img> tags in that text.

$xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1&r=ceddfb43483437b1ed08ab8a72cbc3d5');

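// Collect libxml warnings about sloppy HTML instead of printing them
libxml_use_internal_errors( true );

$descriptions = $xml->xpath('//item/description');
foreach ( $descriptions as $description_node ) {
    $description_html = trim( (string)$description_node );
    if ( $description_html === '' ) {
        // loadHTML() rejects empty input, so skip empty descriptions
        continue;
    }

    // The description may not be valid XML, so use the more forgiving HTML parser
    $description_dom = new DOMDocument();
    $description_dom->loadHTML( $description_html );
    libxml_clear_errors();

    // Switch back to SimpleXML for readability
    $description_sxml = simplexml_import_dom( $description_dom );

    // Find all images, and extract their 'src' attribute (one per line)
    $imgs = $description_sxml->xpath('//img');
    foreach($imgs as $image) {
        echo (string)$image['src'], PHP_EOL;
    }
}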
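If you want something you can test without hitting a live feed, the same two steps can be wrapped in a function and run against an inline string. This is a sketch, not the code from the answer: the function name extractImageSources() and the sample feed are made up, and it stays in DOM for the second step instead of switching back to SimpleXML.

```php
<?php
/**
 * Sketch of the two-step approach: read each <description> as text,
 * then parse that text as HTML and collect the src of every <img>.
 * extractImageSources() is a hypothetical helper name.
 */
function extractImageSources(string $rssXml): array
{
    $xml = simplexml_load_string($rssXml);
    if ($xml === false) {
        return [];                     // not well-formed XML
    }

    $sources = [];
    foreach ($xml->xpath('//item/description') as $descriptionNode) {
        $html = trim((string) $descriptionNode);
        if ($html === '') {
            continue;                  // loadHTML() rejects empty input
        }

        // Silence libxml warnings about sloppy HTML, then restore the old setting.
        $previous = libxml_use_internal_errors(true);
        $dom = new DOMDocument();
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        // Plain DOM is enough here; no need to switch to SimpleXML.
        foreach ($dom->getElementsByTagName('img') as $img) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Usage with an inline, made-up feed instead of a remote URL:
$feed = <<<XML
<rss version="2.0"><channel>
<item><description><![CDATA[<p>One <img src="a.jpg"> two <img src="b.png"></p>]]></description></item>
<item><description><![CDATA[No images here]]></description></item>
</channel></rss>
XML;

print_r(extractImageSources($feed));
```

For this sample the function should return ['a.jpg', 'b.png']. One caveat worth knowing: loadHTML() assumes ISO-8859-1 unless the HTML declares its own charset, so descriptions with non-ASCII text may need extra encoding handling.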

I don't have much experience with XPath, but you could try the following:

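$imgs = $xml->xpath('//item//img');

This selects all img elements that are inside item elements, regardless of whether there are other elements in between. The leading // searches for item anywhere in the document; a path without a leading slash, such as 'item//img', is evaluated relative to the current node (here the root <rss> element), so it would only find item elements that are direct children of <rss>. To spell out the full path from the root instead, you'd need something like /rss/channel/item.... Note that this only finds <img> tags that are real XML elements, not markup stored as text inside a CDATA section.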
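A quick way to check how SimpleXML evaluates these paths is to run them against a small inline document. The sketch below assumes a hypothetical feed where <img> really is an XML element (no CDATA), since that is the only case in which XPath can see it; xpath() called on the root object uses <rss> as its context node.

```php
<?php
// Hypothetical feed where <img> IS a real XML element (no CDATA),
// which is the only case where XPath can see it at all.
$xml = simplexml_load_string(
    '<rss><channel><item><p><img src="x.gif"/></p></item></channel></rss>'
);

// $xml is the root <rss> element, so relative paths start there.
$relative = $xml->xpath('item//img');               // looks for rss/item: none
$anywhere = $xml->xpath('//item//img');             // item at any depth
$explicit = $xml->xpath('/rss/channel/item//img');  // spelled-out absolute path

printf("%d %d %d\n", count($relative), count($anywhere), count($explicit)); // 0 1 1
```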

As for displaying the images: just output <img> tags followed by line breaks, like so:

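foreach($imgs as $image) {
    // SimpleXML exposes attributes with array syntax; ->src would look for a child element
    echo '<img src="' . htmlspecialchars((string)$image['src']) . '" /><br />';
}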

The preferred way is to use CSS instead of <br> tags, but <br> tags are simpler to start with.
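The CSS approach can be sketched as a small helper that takes a list of image URLs and lets a stylesheet put each image on its own line. Both the function name renderImages() and the class name feed-image are made up for illustration; the URLs are escaped with htmlspecialchars() so quotes or ampersands can't break the attribute.

```php
<?php
// Sketch of the CSS alternative to <br />: give each image a class and let
// a stylesheet put it on its own line. "feed-image" is a made-up class name.
function renderImages(array $sources): string
{
    $html = "<style>.feed-image { display: block; }</style>\n";
    foreach ($sources as $src) {
        // Escape the URL so quotes or ampersands cannot break the attribute.
        $html .= '<img class="feed-image" src="'
               . htmlspecialchars($src, ENT_QUOTES) . '" alt="">' . "\n";
    }
    return $html;
}

echo renderImages(['a.jpg', 'b.png?w=1&h=2']);
```

In a real page the style rule would normally live in your stylesheet rather than being echoed inline; it is included here only to keep the sketch self-contained.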
