Question
I'm trying to use feedparser to retrieve some specific information from feeds, and also to retrieve the raw XML of each entry (i.e. <item> elements for RSS and <entry> elements for Atom), but I can't see how to do that. Obviously I could parse the XML by hand, but that's not very elegant, would require separate support for RSS and Atom, and I imagine it could fall out of sync with feedparser for ill-formed feeds. Is there a better way?
Thanks!
Answer 1:
I'm the current developer of feedparser. At the moment, one way to get that information is to monkeypatch feedparser._FeedParserMixin (or edit a local copy of feedparser.py). The methods you'll want to modify are:
feedparser._FeedParserMixin.unknown_starttag
feedparser._FeedParserMixin.unknown_endtag
At the top of each method you can insert a callback to a routine of your own that will capture the elements and their attributes as they're encountered by feedparser.
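The monkeypatching approach described above could look roughly like the sketch below. It is a minimal illustration, not feedparser's own API: install_tag_hooks and _StandInParser are hypothetical names, and the method signatures (tag, attrs) and (tag) are assumed to match those of feedparser's unknown_starttag and unknown_endtag. The stand-in class is only there so the example runs without feedparser installed.

```python
import functools


def install_tag_hooks(mixin_cls, on_start, on_end):
    """Wrap unknown_starttag/unknown_endtag on mixin_cls so the callbacks run first.

    Returns a function that restores the original methods.
    """
    original_start = mixin_cls.unknown_starttag
    original_end = mixin_cls.unknown_endtag

    @functools.wraps(original_start)
    def patched_start(self, tag, attrs):
        on_start(tag, attrs)  # user callback sees the element first
        return original_start(self, tag, attrs)

    @functools.wraps(original_end)
    def patched_end(self, tag):
        on_end(tag)
        return original_end(self, tag)

    mixin_cls.unknown_starttag = patched_start
    mixin_cls.unknown_endtag = patched_end

    def restore():
        mixin_cls.unknown_starttag = original_start
        mixin_cls.unknown_endtag = original_end

    return restore


class _StandInParser:
    """Hypothetical stand-in for feedparser._FeedParserMixin (assumed signatures)."""

    def unknown_starttag(self, tag, attrs):
        pass

    def unknown_endtag(self, tag):
        pass


# Record every start and end tag in the order the parser reports them.
events = []
restore = install_tag_hooks(
    _StandInParser,
    on_start=lambda tag, attrs: events.append(("start", tag, dict(attrs))),
    on_end=lambda tag: events.append(("end", tag)),
)

parser = _StandInParser()
parser.unknown_starttag("item", [("id", "1")])
parser.unknown_endtag("item")
restore()  # put the original methods back
```

With feedparser itself, the class passed in would presumably be feedparser._FeedParserMixin (its location may differ between versions), and the callbacks would need to track when an item or entry element opens and closes in order to reassemble its contents.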
Source: https://stackoverflow.com/questions/7945669/retrieving-raw-xml-for-items-with-feedparser