feedparser

Retrieving raw XML for items with feedparser

落花浮王杯 submitted on 2019-12-06 06:17:51
I'm trying to use feedparser to retrieve some specific information from feeds, but also retrieve the raw XML of each entry (i.e. <item> elements for RSS and <entry> for Atom), and I can't see how to do that. Obviously I could parse the XML by hand, but that's not very elegant, would require separate support for RSS and Atom, and I imagine it could fall out of sync with feedparser for ill-formed feeds. Is there a better way? Thanks!

Answer: I'm the current developer of feedparser. Currently, one of the ways you can get that information is to monkeypatch feedparser._FeedParserMixin (or edit a local copy of feedparser

Django rss feedparser returns a feed with no “title”

六月ゝ 毕业季﹏ submitted on 2019-12-05 10:51:00
I'm writing a basic RSS feed reader in Django. I have a form in which a user submits an RSS feed, and I add it to their feeds list. But for some reason, I'm unable to extract basic information about the feed using feedparser. When I run the following code:

    def form_valid(self, form):
        user = self.request.user
        link = form.cleaned_data['link']
        feed = feedparser.parse(link).feed
        title = feed.title
        try:
            feed_obj = Feed.objects.get(link=link)
        except ObjectDoesNotExist:
            feed_obj = Feed(link=link, title=title)
            feed_obj.save()
        user.get_profile().feeds.add(feed_obj)
        return super(DashboardView, self).form

feedparser fails during script run, but can't reproduce in interactive python console

醉酒当歌 submitted on 2019-12-04 05:03:06
Question: It fails with the following error when I run my script from Eclipse or in IPython:

    'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128)

I don't know why, but when I simply execute the feedparser.parse(url) statement in the interactive console using the same URL, no error is thrown. This is stumping me big time. The code is as simple as:

    try:
        d = feedparser.parse(url)
    except Exception, e:
        logging.error('Error while retrieving feed.')
        logging.error(e)
        logging.error(formatExceptionInfo(None))

Parsing different date formats from feedparser in python?

我的未来我决定 submitted on 2019-12-03 03:53:43
I'm trying to get the dates from entries in two different RSS feeds through feedparser. Here is what I'm doing:

    import feedparser as fp
    reddit = fp.parse("http://www.reddit.com/.rss")
    cc = fp.parse("http://contentconsumer.com/feed")
    print reddit.entries[0].date
    print cc.entries[0].date

And here's how they come out:

    2008-10-21T22:23:28.033841+00:00
    Wed, 15 Oct 2008 10:06:10 +0000

I want to get to the point where I can easily find out which is newer. I've tried using Python's datetime module and searching through the feedparser documentation, but I can't get past this problem. Any help
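The two strings in the question are an ISO 8601 timestamp (typical of Atom) and an RFC 822 date (typical of RSS). As a minimal sketch, assuming Python 3.7 or later and using only the standard library rather than feedparser itself, both can be turned into timezone-aware datetime objects and compared directly. The helper name parse_feed_date is hypothetical, not part of feedparser.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Parse an ISO 8601 or RFC 822 date string into a datetime.

    Hypothetical helper for illustration; not a feedparser function.
    """
    try:
        # Handles strings such as 2008-10-21T22:23:28.033841+00:00
        return datetime.fromisoformat(text)
    except ValueError:
        # Handles strings such as Wed, 15 Oct 2008 10:06:10 +0000
        return parsedate_to_datetime(text)


# The two date strings quoted in the question.
reddit_date = parse_feed_date("2008-10-21T22:23:28.033841+00:00")
cc_date = parse_feed_date("Wed, 15 Oct 2008 10:06:10 +0000")

# Both values carry a UTC offset, so they can be compared directly.
newer = "reddit" if reddit_date > cc_date else "cc"
print(newer)
```

feedparser also exposes already-parsed dates on each entry as time tuples (for example published_parsed or updated_parsed, when the feed supplies them), which may make a helper like this unnecessary.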

RSS feed parser library in Python [closed]

两盒软妹~` submitted on 2019-12-03 03:30:57
Question: I am looking for a good library in Python that will help me parse RSS feeds. Has anyone used feedparser? Any feedback?

Answer 1: Using feedparser is a much better option than rolling your own with minidom or BeautifulSoup. It normalizes the differences between all versions of RSS and Atom so you don't have to have

RSS feed parser library in Python [closed]

血红的双手。 submitted on 2019-12-02 17:54:02
I am looking for a good library in Python that will help me parse RSS feeds. Has anyone used feedparser? Any feedback?

FogleBird: Using feedparser is a much better option than rolling your own with minidom or BeautifulSoup. It normalizes the differences between all versions of RSS and Atom so you don't have to have different code for each type. It's good about detecting different date formats and other variations in feeds. It automatically follows HTTP redirects. It sanitizes HTML content. It has support for ETag and Last-Modified headers so you can see if the feed has changed just by

How can I parse multiple URLs in feedparser (Python)?

强颜欢笑 submitted on 2019-12-02 12:19:57
Question: I'm making a little webapp with some fixed feeds (fixed as in, you can't add feeds like in Feedly or Google Reader). I tried this, with no luck:

    RSS_URLS = [
        'http://feeds.feedburner.com/RockPaperShotgun',
        'http://www.gameinformer.com/b/MainFeed.aspx?Tags=preview',
    ]
    feed = feedparser.parse(RSS_URLS)
    for post in feed.entries:
        print post.title

And this, with no luck:

    RSS_URLS = [
        'http://feeds.feedburner.com/RockPaperShotgun',
        'http://www.gameinformer.com/b/MainFeed.aspx?Tags=preview',
    ]
    feed = [

Error installing FeedZirra

风流意气都作罢 submitted on 2019-12-02 05:29:37
Question: I am new to Ruby on Rails. I am excited about feed parsing, but when I install FeedZirra I get this error. I use Windows 7 and Ruby 1.8.7. Please help. Thanks in advance.

    C:\Ruby187>gem sources -a http://gems.github.com
    http://gems.github.com added to sources

    C:\Ruby187>gem install pauldix-feedzirra
    Building native extensions. This could take a while...
    ERROR: Error installing pauldix-feedzirra:
    ERROR: Failed to build gem native extension.
    C:/Ruby187/bin/ruby.exe extconf.rb
    checking for

How can I parse multiple URLs in feedparser (Python)?

浪尽此生 submitted on 2019-12-02 04:10:01
I'm making a little webapp with some fixed feeds (fixed as in, you can't add feeds like in Feedly or Google Reader). I tried this, with no luck:

    RSS_URLS = [
        'http://feeds.feedburner.com/RockPaperShotgun',
        'http://www.gameinformer.com/b/MainFeed.aspx?Tags=preview',
    ]
    feed = feedparser.parse(RSS_URLS)
    for post in feed.entries:
        print post.title

And this, with no luck:

    RSS_URLS = [
        'http://feeds.feedburner.com/RockPaperShotgun',
        'http://www.gameinformer.com/b/MainFeed.aspx?Tags=preview',
    ]
    feed = []
    for url in RSS_URLS:
        feed.append(feedparser.parse(url))
    for post in feed.entries:
        print post.title
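In the second attempt, feed is a plain Python list of parse results, so it has no entries attribute of its own; each parsed feed carries its own entries. A minimal sketch of the nested loop this suggests is shown here in Python 3. The helper iter_posts and the stand-in fake_parse are hypothetical names used so the example runs without network access; with feedparser installed, feedparser.parse could presumably be passed as the parse argument instead.

```python
from types import SimpleNamespace


def iter_posts(urls, parse):
    """Yield every entry of every feed, parsing one URL at a time."""
    for url in urls:
        parsed = parse(url)          # one parse call per URL
        for post in parsed.entries:  # entries belong to each result
            yield post


def fake_parse(url):
    """Stand-in for a feed parser (assumption): returns one fake entry."""
    entry = SimpleNamespace(title="post from " + url)
    return SimpleNamespace(entries=[entry])


RSS_URLS = [
    'http://feeds.feedburner.com/RockPaperShotgun',
    'http://www.gameinformer.com/b/MainFeed.aspx?Tags=preview',
]

titles = [post.title for post in iter_posts(RSS_URLS, fake_parse)]
for title in titles:
    print(title)
```

Taking the parser as a parameter is only a convenience for the sketch; the point is that the loop over entries sits inside the loop over URLs.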

How to detect if a page is an RSS or ATOM feed

老子叫甜甜 submitted on 2019-12-01 10:48:34
I'm currently building a new online feed reader in PHP. One of the features I'm working on is feed auto-discovery. If a user enters a website URL, the script will detect that it's not a feed and look for the real feed URL by parsing the HTML for the proper tag. The problem is that the way I'm currently detecting whether the URL is a feed or a website only works part of the time, and I know it can't be the best solution. Right now I'm taking the cURL response and running it through simplexml_load_string; if it can't be parsed, I treat it as a website. Here is the code. $xml = @simplexml_load_string( $site
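One alternative to relying on parse failure alone is to inspect the root element of the response. The sketch below shows the idea in Python (the language of most excerpts on this page) rather than the asker's PHP, using only the standard library. The function name detect_feed_type is hypothetical, and the check is deliberately simple: it would treat a malformed feed as "not a feed".

```python
import xml.etree.ElementTree as ET

# Namespaced root tags as ElementTree reports them.
ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def detect_feed_type(body):
    """Return 'rss', 'atom', or None based on the document's root element.

    Hypothetical helper for illustration only.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not well-formed XML, so probably an HTML page
    if root.tag == "rss":
        return "rss"    # RSS 0.9x / 2.0
    if root.tag == RDF_ROOT:
        return "rss"    # RSS 1.0 is RDF-based
    if root.tag == ATOM_FEED:
        return "atom"
    return None         # well-formed XML, but not a feed (e.g. XHTML)


print(detect_feed_type('<rss version="2.0"><channel/></rss>'))
print(detect_feed_type('<feed xmlns="http://www.w3.org/2005/Atom"/>'))
print(detect_feed_type('<html><body><br></body></html>'))
```

Checking the root element also covers well-formed XHTML pages, which a parse-or-fail test would wrongly accept as feeds.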