Reading RSS Feeds: What Aggregators Do That I'm Not

Question

I drop the following feed into Google Reader, and it updates normally.

http://www.indeed.ca/rss?q=&l=Hamilton%2C+ON

However, when I use any of a number of approaches suggested thither and yon on the 'net, which simply involve reading from this source and parsing the XML, I keep receiving the same 20 items.

What is Google Reader doing that I should be doing in my code so that I receive new items?

Thanks for your advice. Incidentally, I'm coding in Python.

Answer 1:

RSS aggregators "poll" the sources, i.e., they repeat the HTTP query periodically on each source, and check if anything new appears in the results. That's unfortunate, as polling always is: it wastes resources in an unending series of "are we there yet?" questions (kind of like taking a toddler along on a long car drive;-), and nevertheless implies delays (if you poll a given source every hour, say, you'll wait up to an hour to see some results).

Unfortunately, in the RSS architecture itself, there are no alternatives, no way to ask for a "callback" when new stuff appears or opt for a saner "publish-subscribe architecture".

A good effort to remedy that is pubsubhubbub, but it inevitably requires cooperation (above and beyond the RSS standards) from RSS sources and aggregators -- so it needs very wide takeup before it can be called "a solution" to the problem, though, technically, it already is (for cooperating sites;-).

So back to your question, you're doing nothing wrong: you just need to poll periodically, like RSS aggregators do, in order to get to see new results eventually.
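For illustration, here is a minimal sketch of such a polling loop using only the standard library. It assumes the feed is plain RSS 2.0 with `<item>` elements, and that `<guid>` (falling back to `<link>`) is a good enough identity for an item. The function names and the one-hour interval are my own choices, not anything Google Reader is known to use.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://www.indeed.ca/rss?q=&l=Hamilton%2C+ON"


def parse_items(xml_text):
    """Return (key, title, link) tuples for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Assumption: <guid> identifies an item; fall back to the link if absent.
        key = item.findtext("guid") or link
        items.append((key, title, link))
    return items


def new_items(xml_text, seen):
    """Return the items whose key is not yet in `seen`, recording them as seen."""
    fresh = []
    for key, title, link in parse_items(xml_text):
        if key not in seen:
            seen.add(key)
            fresh.append((key, title, link))
    return fresh


def poll(url=FEED_URL, interval_seconds=3600):
    """Fetch the feed repeatedly, printing only items not seen on earlier polls."""
    seen = set()
    while True:
        with urllib.request.urlopen(url) as response:
            xml_text = response.read()
        for _key, title, link in new_items(xml_text, seen):
            print(title, link)
        time.sleep(interval_seconds)
```

Calling `poll()` runs forever. The `seen` set lives only in memory, so a real aggregator would persist it between runs.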

Answer 2:

1) Have you tried with other RSS feeds?

2) If so, it sounds like some kind of cache... Are you behind some proxy?
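One rough way to check the cache theory is to ask any intermediate proxy to revalidate with the origin server rather than serve a stored copy. The sketch below assumes the proxy honours the standard `Cache-Control` and `Pragma` request headers, which not every proxy does. The helper names are made up for this example.

```python
import urllib.request


def build_uncached_request(url):
    """Build a request that asks caches along the way to revalidate with the origin."""
    return urllib.request.Request(
        url,
        headers={
            "Cache-Control": "no-cache",  # HTTP/1.1 caches
            "Pragma": "no-cache",         # older HTTP/1.0 caches
        },
    )


def fetch_fresh(url):
    """Fetch `url` with the cache-bypassing headers and return the raw bytes."""
    with urllib.request.urlopen(build_uncached_request(url)) as response:
        return response.read()
```

If `fetch_fresh` returns newer items than a plain `urlopen` on the same URL, a cache somewhere in between is the likely culprit.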

Source: https://stackoverflow.com/questions/3382942/reading-rss-feeds-what-aggregators-do-that-im-not

Tags

python

rss

aggregator