Retrieving raw XML for items with feedparser

為{幸葍}努か submitted on 2019-12-10 11:13:53

Question


I'm trying to use feedparser to extract some specific information from feeds, but I also want the raw XML of each entry (i.e. elements for RSS and for Atom), and I can't see how to get it. Obviously I could parse the XML by hand, but that's not very elegant, it would require separate support for RSS and Atom, and I imagine it could fall out of sync with feedparser for ill-formed feeds. Is there a better way?

Thanks!


Answer 1:


I'm the current developer of feedparser. At the moment, one way to get that information is to monkeypatch feedparser._FeedParserMixin (or edit a local copy of feedparser.py). The methods you'll want to modify are:

  • feedparser._FeedParserMixin.unknown_starttag
  • feedparser._FeedParserMixin.unknown_endtag

At the top of each method, insert a call to a routine of your own that captures each element and its attributes as feedparser encounters them.
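A minimal Python sketch of that monkeypatching approach follows. It assumes the mixin's methods have the signatures unknown_starttag(self, tag, attrs) and unknown_endtag(self, tag), and that the class is reachable as feedparser._FeedParserMixin as named above; later feedparser releases reorganized the module, so that lookup may fail there. The names install_tag_hooks, EntryRecorder and hook_feedparser are hypothetical helpers, and treating "item" (RSS) and "entry" (Atom) as the per-item boundaries is an assumption.

```python
import functools


def install_tag_hooks(mixin_cls, on_start=None, on_end=None):
    """Wrap mixin_cls.unknown_starttag/unknown_endtag so the callbacks see
    each element before the original method runs. Returns an undo function."""
    original_start = mixin_cls.unknown_starttag
    original_end = mixin_cls.unknown_endtag

    @functools.wraps(original_start)
    def unknown_starttag(self, tag, attrs):
        if on_start is not None:
            on_start(tag, list(attrs))  # report a copy of attrs first
        return original_start(self, tag, attrs)  # then let feedparser run

    @functools.wraps(original_end)
    def unknown_endtag(self, tag):
        if on_end is not None:
            on_end(tag)
        return original_end(self, tag)

    # Patching the class affects every parser instance in the process.
    mixin_cls.unknown_starttag = unknown_starttag
    mixin_cls.unknown_endtag = unknown_endtag

    def uninstall():
        mixin_cls.unknown_starttag = original_start
        mixin_cls.unknown_endtag = original_end

    return uninstall


class EntryRecorder:
    """Groups start/end tag events per item. ENTRY_TAGS is an assumption:
    'item' for RSS, 'entry' for Atom."""

    ENTRY_TAGS = ("item", "entry")

    def __init__(self):
        self.entries = []     # one list of events per completed item/entry
        self._current = None  # events for the item being read, if any
        self._depth = 0       # nesting depth inside the current item

    def on_start(self, tag, attrs):
        if self._current is None and tag in self.ENTRY_TAGS:
            self._current = []
        if self._current is not None:
            self._current.append(("start", tag, attrs))
            self._depth += 1

    def on_end(self, tag):
        if self._current is None:
            return  # outside any item/entry
        self._current.append(("end", tag))
        self._depth -= 1
        if self._depth == 0:  # just closed the item/entry element itself
            self.entries.append(self._current)
            self._current = None


def hook_feedparser(recorder):
    """Hypothetical convenience: patch the mixin class named in the answer."""
    import feedparser  # third-party; imported lazily

    mixin = getattr(feedparser, "_FeedParserMixin", None)
    if mixin is None:
        raise RuntimeError("feedparser._FeedParserMixin not found in this release")
    return install_tag_hooks(mixin, recorder.on_start, recorder.on_end)
```

Typical use would be to create recorder = EntryRecorder(), call undo = hook_feedparser(recorder), run feedparser.parse(...) inside a try block and call undo() in its finally clause. Because the patch lives on the class, it stays active for every parse until undone, which is why the helper returns an uninstall function. Each element of recorder.entries is then the sequence of start/end events for one item, enough to rebuild its element skeleton and attributes; text content does not pass through these two methods, so it is not captured here.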



Source: https://stackoverflow.com/questions/7945669/retrieving-raw-xml-for-items-with-feedparser
