Question
I've been working on an RSS scraper that parses data from multiple sources. Each of these sources has its own way of writing the RSS description.
One in particular wraps the description in CDATA tags, for example:
<![CDATA[
<p align=justify><font face="verdana, arial, helvetica, sans-serif" size=1>
<font color=#004080></font>
SOME TEXT GOES HERE
</font></p>
]]>
However, if I try to get the item description with SimplePie, I get this output:
<div><p align="justify"></p></div>
I'm using this PHP script to do all this:
foreach ($feed->get_items() as $item)
{
    $title = $item->get_title();
    $description = $item->get_description();
    // some other stuff
}
And now the interesting part: the title in the feed is also wrapped in CDATA, like this:
<title>
<![CDATA[
Nice title
]]>
</title>
And... it works!
How can I get the full description from the feed? I've tried almost everything.
Thank you!
Answer 1:
The get_description() and get_content() methods both sanitize the raw data, but you can use the get_item_tags() method to extract it untouched, like this:
$desc_tags = $item->get_item_tags('', 'description'); // empty namespace is RSS 2.0
if ($desc_tags) {
    print $desc_tags[0]['data'];
}
The only caveat is that, while get_content() and get_description() try to detect the namespace themselves, you have to pass it to get_item_tags() explicitly; SimplePie defines constants for the namespaces (the SIMPLEPIE_NAMESPACE_* constants). If you know the feed's format beforehand, that should not be a problem; otherwise you may need to do the same trial and error that get_description() does.
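The trial and error mentioned above could be sketched roughly as follows. This is only an illustration, not SimplePie's own logic: raw_description() is a hypothetical helper, the candidate list is an assumption about which namespace/tag pairs are worth trying, and the namespace URIs are written out as literal strings so the sketch does not depend on the SimplePie constants being loaded. The only SimplePie call it relies on is get_item_tags(), as used in the answer.

```php
<?php
// Hypothetical helper (not part of SimplePie): returns the raw, unsanitized
// description of an item by trying several namespace/tag pairs in order.
function raw_description($item, ?array $candidates = null)
{
    if ($candidates === null) {
        // Assumed candidate list; adjust it to the feeds you actually scrape.
        $candidates = array(
            array('', 'description'),                          // RSS 2.0 (empty namespace)
            array('http://purl.org/rss/1.0/', 'description'),  // RSS 1.0
            array('http://www.w3.org/2005/Atom', 'summary'),   // Atom 1.0
            array('http://www.w3.org/2005/Atom', 'content'),   // Atom 1.0
        );
    }

    foreach ($candidates as $pair) {
        list($namespace, $tag) = $pair;
        $tags = $item->get_item_tags($namespace, $tag);
        // get_item_tags() gives back an array of matching elements when the
        // tag exists; the untouched text is under the 'data' key.
        if ($tags && isset($tags[0]['data'])) {
            return $tags[0]['data'];
        }
    }

    return null; // nothing matched any candidate
}
```

Inside the foreach loop from the question, this would be called as $description = raw_description($item);. Keep in mind that the returned string has not been sanitized.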
Source: https://stackoverflow.com/questions/11581823/cdata-in-simplepie