How to restart tweepy script in case of error?

甜味超标 2020-12-13 16:37

I have a python script that continuously stores tweets related to tracked keywords to a file. However, the script tends to crash repeatedly due to an error appended below.

5 Answers
  • 2020-12-13 16:58

    I had this problem occurring recently and wanted to share more detailed information about it.

    The error occurs because the chosen streaming filter, "test", is too broad. You therefore receive tweets faster than you can process them, which causes an IncompleteRead error.

    This can be fixed by either refining the search or by using a more specific exception:

    from http.client import IncompleteRead
    ...
    try:
        sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
        sapi.filter(track=["test"])
    except IncompleteRead:
        pass
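
As a rough sketch of the second option, the helper below restarts a stream only when `IncompleteRead` is raised and lets every other exception propagate. The names `run_with_reconnect`, `start_stream`, and `max_restarts` are hypothetical and not part of tweepy; `start_stream` stands for any callable that builds the stream and calls `filter`.

```python
from http.client import IncompleteRead


def run_with_reconnect(start_stream, max_restarts=5):
    """Call start_stream(); restart it only when IncompleteRead is raised.

    start_stream is a hypothetical zero-argument callable that opens the
    stream and blocks until it ends. max_restarts is an assumed safety cap.
    """
    restarts = 0
    while True:
        try:
            return start_stream()
        except IncompleteRead:
            restarts += 1
            if restarts > max_restarts:
                # Give up and surface the error after too many restarts.
                raise
```

With the code above, it could be called as `run_with_reconnect(lambda: sapi.filter(track=["test"]))`, assuming `sapi` has already been created.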
    
  • 2020-12-13 17:08

    Figured out how to incorporate the while/try loop by writing a new function for the stream:

    def start_stream():
        while True:
            try:
                sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
                sapi.filter(track=["Samsung", "s4", "s5", "note" "3", "HTC", "Sony", "Xperia", "Blackberry", "q5", "q10", "z10", "Nokia", "Lumia", "Nexus", "LG", "Huawei", "Motorola"])
            except: 
                continue
    
    start_stream()
    

    I tested the auto restart by manually interrupting the program with CMD + C. Nonetheless, happy to hear of better ways to test such functionality.
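
One possible way to test the restart logic without interrupting the program by hand is to pass the stream in as a callable and substitute a function that fails on purpose. The sketch below is an assumption-laden variant of the loop above: `run_forever`, `base_delay`, `max_delay`, `max_restarts`, and the injectable `sleep` are hypothetical names. It catches `Exception` rather than using a bare `except`, so a keyboard interrupt still stops the script, and it waits a little longer after each crash.

```python
import logging
import time


def run_forever(start_stream, base_delay=1.0, max_delay=60.0,
                max_restarts=None, sleep=time.sleep):
    """Restart start_stream() after each crash, doubling the wait each time.

    Returns the number of restarts once start_stream() returns normally.
    KeyboardInterrupt is not an Exception subclass, so it is not swallowed.
    """
    restarts = 0
    delay = base_delay
    while True:
        try:
            start_stream()
            return restarts
        except Exception:
            restarts += 1
            if max_restarts is not None and restarts > max_restarts:
                raise
            logging.exception("stream crashed (restart %d); retrying in %.1f s",
                              restarts, delay)
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

In a test, `sleep` can be replaced with a function that records the requested delays, so the check runs instantly.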

  • 2020-12-13 17:08

    It's better to use a recursive call instead of an infinite while loop. Take a look at the filter function below, for example:

    from tweepy import Stream
    from service.twitter.listener.tweety_listener import TweetyStreamDataListener
    from settings import twitter_config
    
    class Tweety(object):
        def __init__(self, listener=TweetyStreamDataListener()):
            self.listener = listener
            self.__auth__ = None
    
        def __authenticate__(self):
            from tweepy import OAuthHandler
            if self.__auth__ is None:
                self.__auth__ = OAuthHandler(twitter_config['consumer_key'], twitter_config['consumer_secret'])
                self.__auth__.set_access_token(twitter_config['access_token'], twitter_config['access_token_secret'])
            return self.__auth__ is not None
    
        def __streamer__(self):
            is_authenticated = self.__authenticate__()
            if is_authenticated:
                return Stream(self.__auth__, self.listener)
            return None
    
        # `async` is a reserved word in Python 3.7+; Tweepy 3.7 renamed the
        # Stream.filter argument to `is_async`.
        def filter(self, keywords=None, is_async=True):
            streamer = self.__streamer__()
            try:
                print("[STREAM] Started stream")
                streamer.filter(track=keywords, is_async=is_async)
            except Exception as ex:
                print("[STREAM] Stream stopped! Reconnecting to twitter stream")
                print(ex, ex.args)
                self.filter(keywords=keywords, is_async=is_async)
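
If the recursive style is used, it may be worth bounding the number of reconnects, since each retry adds a stack frame and Python limits recursion depth. The following is a minimal sketch under that assumption; `filter_with_retries`, `start_stream`, and `retries_left` are hypothetical names that do not appear in the class above.

```python
def filter_with_retries(start_stream, retries_left=10):
    """Recursively retry start_stream(), giving up when retries run out.

    start_stream is a hypothetical zero-argument callable that runs the
    stream. Each failure recurses one level deeper with one fewer retry.
    """
    try:
        return start_stream()
    except Exception as ex:
        if retries_left <= 0:
            raise
        print("[STREAM] Stream stopped! Reconnecting:", ex)
        return filter_with_retries(start_stream, retries_left - 1)
```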
    
  • 2020-12-13 17:12

    One option would be to try the multiprocessing module. I would argue for it for two reasons:

    1. Ability to run the process for a set period of time without having to "kill" the whole script/process.
    2. You can place it in a for loop, and have it just start over whenever it dies or you choose to kill it.

    I have taken a different approach entirely, partly because I am saving my tweets at regular (or supposedly regular) intervals. @Eugeune Yan, I think the try/except is a simple and elegant way to deal with the problem. With that method, though, you don't really know when or if it failed. I don't know if that really matters, and it would be easy to write a few lines to record it. Hopefully someone will comment on this.

    import tiipWriter #Twitter & Textfile writer I wrote with Tweepy.
    from add import ThatGuy # utility to supply log file names that won't overwrite old ones.
    import multiprocessing
    
    
    if __name__ == '__main__':
        # number of time increments the script needs to run
        n = 60
        log_dir = "C:\\Temp\\stufffolder\\twiitlog"
        log_names = []
        print("preloading logs")
        ThatGuy(n, log_dir, log_names)  # Finds any existing logs in the folder and one-ups it
    
        for log_name in log_names:
            print("Collecting Tweets.....")
            # this is my twitter/textfile writer process
            p = multiprocessing.Process(target=tiipWriter.tiipWriter, args=(log_name,))
            p.start()
            p.join(1800)  # max number of seconds the process will run
            if p.is_alive():
                print(" \n Saving Twitter Stream log   @  " + str(log_name))
                p.terminate()
                p.join()
            # open and immediately close the log, then report whether it closed
            log_file = open(log_name, 'r')
            log_file.close()
            if log_file.closed:
                print("File successfully closed")
            else:
                log_file.close()
            print("jamaica")  # cuz why not
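
To address the point about not knowing when or if a run failed, the child's exit code can be inspected after `join`. The sketch below is one possible shape for that, not the author's script: `supervise`, `max_runs`, `timeout`, and `ctx` are hypothetical names. It reruns the worker until it exits cleanly or the run budget is used up, and returns the exit codes it saw. On platforms that spawn rather than fork, `target` needs to be a module-level function and the call belongs under `if __name__ == '__main__':`.

```python
import multiprocessing


def supervise(target, args=(), max_runs=3, timeout=None, ctx=multiprocessing):
    """Run target in a child process up to max_runs times.

    Returns the list of exit codes, one per run. A run that outlives
    timeout seconds is terminated. Stops early on a clean exit (code 0).
    ctx is the multiprocessing module or a context from get_context().
    """
    exit_codes = []
    for _ in range(max_runs):
        p = ctx.Process(target=target, args=args)
        p.start()
        p.join(timeout)
        if p.is_alive():
            # Still running after the time budget: stop it.
            p.terminate()
            p.join()
        exit_codes.append(p.exitcode)
        if p.exitcode == 0:
            break
    return exit_codes
```

A non-zero entry in the returned list indicates a run that crashed or was terminated, which could be logged along with a timestamp.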
    
  • 2020-12-13 17:24

    I have written a two-process streaming program using tweepy. It downloads, compresses, and dumps the data into files that are rotated every hour. The program is restarted every hour, and it periodically checks the streaming process to see whether any new tweets have been downloaded. If not, it restarts the whole system.

    The code can be found here. Note that for compression it uses pipes. In case compression is not needed modifying the source is easy.
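
The linked code is not reproduced here. As a hedged illustration of the "has any new tweet arrived?" check only, one simple approach is to look at when the output file was last modified. `is_stalled`, `output_path`, and `max_idle_seconds` are hypothetical names, and using the file's modification time is an assumption, not necessarily what the linked program does.

```python
import os
import time


def is_stalled(output_path, max_idle_seconds, now=None):
    """Return True if output_path is missing or was not written recently.

    now defaults to the current time; it can be passed in for testing.
    """
    now = time.time() if now is None else now
    try:
        last_write = os.path.getmtime(output_path)
    except OSError:
        # No output file at all counts as a stalled stream.
        return True
    return (now - last_write) > max_idle_seconds
```

A monitoring process could call this every few minutes and restart the streaming process whenever it returns `True`.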
