Stopping Tweepy stream after a duration parameter (# lines, seconds, #Tweets, etc)

问题

I am using Tweepy to capture streaming tweets based off of the hashtag #WorldCup, as seen by the code below. It works as expected.

class StdOutListener(StreamListener):
  ''' Handles data received from the stream. '''

  def on_status(self, status):
      # Prints the text of the tweet
      print('Tweet text: ' + status.text)

      # There are many options in the status object,
      # hashtags can be very easily accessed.
      for hashtag in status.entries['hashtags']:
          print(hashtag['text'])

      return true

    def on_error(self, status_code):
        print('Got an error with status code: ' + str(status_code))
        return True # To continue listening

    def on_timeout(self):
        print('Timeout...')
        return True # To continue listening

if __name__ == '__main__':
   listener = StdOutListener()
   auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
   auth.set_access_token(access_token, access_token_secret)

   stream = Stream(auth, listener)
   stream.filter(follow=[38744894], track=['#WorldCup'])

Because this is a hot hashtag right now, searches don't take too long to catch the maximum amount of tweets that Tweepy lets you get in one transaction. However, if I was going to search on #StackOverflow, it might be much slower, and therefore, I'd like a way to kill the stream. I could do this on several parameters, such as stopping after 100 tweets, stopping after 3 minutes, after a text output file has reached 150 lines, etc. I do know that the socket timeout time isn't used to achieve this.

I have taken a look at this similar question:

Tweepy Streaming - Stop collecting tweets at x amount

However, it appears to not use the streaming API. The data that it collects is also very messy, whereas this text output is clean.

Can anyone suggest a way to stop Tweepy (when using the stream in this method), based on some user input parameter, besides a keyboard interrupt?

Thanks

回答1:

I solved this, so I'm going to be one of those internet heroes that answers their own question.

This is achieved by using static Python variables for the counter and for the stop value (e.g. stop after you grab 20 tweets). This is currently a geolocation search, but you could easily swap it for a hashtag search by using the getTweetsByHashtag() method.

#!/usr/bin/env python
from tweepy import (Stream, OAuthHandler)
from tweepy.streaming import StreamListener

class Listener(StreamListener):

    tweet_counter = 0 # Static variable

    def login(self):
        CONSUMER_KEY =
        CONSUMER_SECRET =
        ACCESS_TOKEN =
        ACCESS_TOKEN_SECRET =

        auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
        return auth

    def on_status(self, status):
        Listener.tweet_counter += 1
        print(str(Listener.tweet_counter) + '. Screen name = "%s" Tweet = "%s"'
              %(status.author.screen_name, status.text.replace('\n', ' ')))

        if Listener.tweet_counter < Listener.stop_at:
            return True
        else:
            print('Max num reached = ' + str(Listener.tweet_counter))
            return False

    def getTweetsByGPS(self, stop_at_number, latitude_start, longitude_start, latitude_finish, longitude_finish):
        try:
            Listener.stop_at = stop_at_number # Create static variable
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60) # Socket timeout value
            streaming_api.filter(follow=None, locations=[latitude_start, longitude_start, latitude_finish, longitude_finish])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')

    def getTweetsByHashtag(self, stop_at_number, hashtag):
        try:
            Listener.stopAt = stop_at_number
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60)
            # Atlanta area.
            streaming_api.filter(track=[hashtag])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')

listener = Listener()
listener.getTweetsByGPS(20, -84.395198, 33.746876, -84.385585, 33.841601) # Atlanta area.

来源：https://stackoverflow.com/questions/24663403/stopping-tweepy-stream-after-a-duration-parameter-lines-seconds-tweets-et

标签

python

twitter

streaming

tweepy

duration