Avoid Twitter API limitation with Tweepy

Anonymous (unverified), submitted on 2019-12-03 02:45:02

Question:

I read in a question on Stack Exchange that the limit depends on the number of requests made per 15 minutes and also on the complexity of the algorithm, but my algorithm is not a complex one.

So I use this code:

import tweepy
import sqlite3
import time

db = sqlite3.connect('data/MyDB.db')

# Get a cursor object
cursor = db.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS MyTable(id INTEGER PRIMARY KEY, name TEXT, geo TEXT, image TEXT, source TEXT, timestamp TEXT, text TEXT, rt INTEGER)''')
db.commit()

consumer_key = ""
consumer_secret = ""
key = ""
secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(key, secret)

api = tweepy.API(auth)

search = "#MyHashtag"

for tweet in tweepy.Cursor(api.search,
                           q=search,
                           include_entities=True).items():
    while True:
        try:
            cursor.execute('''INSERT INTO MyTable(name, geo, image, source, timestamp, text, rt) VALUES(?,?,?,?,?,?,?)''',(tweet.user.screen_name, str(tweet.geo), tweet.user.profile_image_url, tweet.source, tweet.created_at, tweet.text, tweet.retweet_count))
        except tweepy.TweepError:
                time.sleep(60 * 15)
                continue
        break
db.commit()
db.close()

I always get the Twitter limitation error:

Traceback (most recent call last):
  File "stream.py", line 25, in <module>
    include_entities=True).items():
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 153, in next
    self.current_page = self.page_iterator.next()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 98, in next
    data = self.method(max_id = max_id, *self.args, **self.kargs)
  File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 200, in _call
    return method.execute()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 176, in execute
    raise TweepError(error_msg, resp)
tweepy.error.TweepError: [{'message': 'Rate limit exceeded', 'code': 88}]

Answer 1:

The problem is that your try: except: block is in the wrong place. Inserting data into the database will never raise a TweepError - it's iterating over Cursor.items() that will. I would suggest refactoring your code to call the next method of Cursor.items() in an infinite loop. That call should be placed in the try: except: block, as it can raise an error.

Here's (roughly) what the code should look like:

# above omitted for brevity
c = tweepy.Cursor(api.search,
                  q=search,
                  include_entities=True).items()
while True:
    try:
        tweet = c.next()
        # Insert into db
    except tweepy.TweepError:
        time.sleep(60 * 15)
        continue
    except StopIteration:
        break

This works because when Tweepy raises a TweepError, it hasn't updated any of the cursor data. The next time it makes the request, it will use the same parameters as the request which triggered the rate limit, effectively repeating it until it goes through.
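The same retry loop can be packaged as a generator so the calling code stays a plain for loop. This is only a sketch: `items_with_retry` and its parameters are hypothetical names, not part of Tweepy, and it assumes the iterator keeps its position when it raises, as described above for Tweepy's cursor (an ordinary Python generator would not).

```python
import time


def items_with_retry(iterator, retry_on, wait_seconds=15 * 60, sleep=time.sleep):
    """Yield items from `iterator`, pausing and retrying when `retry_on` is raised.

    Hypothetical helper, not part of Tweepy. It assumes the iterator keeps
    its position when it raises, so the next call repeats the same request.
    """
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            return  # the iterator is exhausted
        except retry_on:
            sleep(wait_seconds)  # wait out the rate-limit window
            continue  # then ask for the same item again
        yield item
```

With the code from the question, this would presumably be called as `for tweet in items_with_retry(c, tweepy.TweepError):`, with the database insert in the loop body. The `sleep` argument is there only so the waiting can be replaced when testing.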



Answer 2:

For anyone who stumbles upon this on Google, tweepy 3.2+ has additional parameters for the tweepy.API class, in particular:

  • wait_on_rate_limit
  • wait_on_rate_limit_notify

Setting these flags to True will delegate the waiting to the API instance, which is good enough for most simple use cases.



Answer 3:

If you want to avoid errors and respect the rate limit you can use the following function which takes your api object as an argument. It retrieves the number of remaining requests of the same type as the last request and waits until the rate limit has been reset if desired.

from datetime import datetime
from time import sleep


def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Convert the reset time (seconds since the epoch) to a local datetime
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))

        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False

    # We have not reached the rate limit
    return True
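The delay calculation in the function above can also be separated from the sleeping and printing, which makes it easy to check without hitting the API. The following is a sketch under assumptions: `seconds_until_reset` is a hypothetical name, and `headers` stands for any mapping that holds the `x-rate-limit-*` response headers used above.

```python
import time


def seconds_until_reset(headers, now=None, buffer=0.1):
    """Return how long to wait before the next request, in seconds.

    Hypothetical helper. `headers` is any mapping holding the
    x-rate-limit-* response headers. Returns 0.0 while requests
    remain in the current window.
    """
    remaining = int(headers["x-rate-limit-remaining"])
    if remaining > 0:
        return 0.0
    reset = int(headers["x-rate-limit-reset"])  # seconds since the epoch
    if now is None:
        now = time.time()
    # Clamp at zero in case the reset time has already passed.
    return max(0.0, reset - now) + buffer
```

Comparing epoch seconds directly avoids converting to `datetime` at all, so local time and UTC cannot get mixed up. The caller decides whether to pass the result to `time.sleep` or handle the limit some other way.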


Answer 4:

Just replace

api = tweepy.API(auth) 

with

api = tweepy.API(auth, wait_on_rate_limit=True) 

