How to add a location filter to tweepy module

难免孤独 2020-11-27 18:07

I have found the following piece of code, which works pretty well for viewing the standard 1% sample of the Twitter firehose in the Python shell:

import sys
impor         


        
4 answers
  • 2020-11-27 18:26

    The streaming API doesn't let you filter by location AND keyword at the same time.

    Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.

    Source: https://dev.twitter.com/docs/streaming-apis/parameters#locations
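
    To make the OR-versus-AND difference concrete, here is a simplified model, offered as a sketch only. The helper names are hypothetical, the tweets are plain dicts shaped like the tweet JSON (`text`, plus `coordinates` as a GeoJSON point in `[longitude, latitude]` order), and the model ignores the API's real tokenisation and its matching on `place` bounding boxes.

```python
# Hypothetical helpers: OR (what the stream delivers) vs AND (what the question wants).
# Assumes tweets are dicts with "text" and an optional GeoJSON "coordinates" point.

def in_box(tweet, box):
    """True if the tweet's point lies inside box = [sw_lon, sw_lat, ne_lon, ne_lat]."""
    point = tweet.get("coordinates")
    if not point:
        return False
    lon, lat = point["coordinates"]  # GeoJSON order: longitude first
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


def has_keyword(tweet, keyword):
    return keyword.lower() in tweet.get("text", "").lower()


def stream_would_deliver(tweet, keyword, box):
    # Simplified model of track + locations: either condition is enough.
    return has_keyword(tweet, keyword) or in_box(tweet, box)


def wanted(tweet, keyword, box):
    # The desired filter: both conditions must hold.
    return has_keyword(tweet, keyword) and in_box(tweet, box)


UK = [-6.38, 49.87, 1.77, 55.81]  # box used in the code below
tweets = [
    {"text": "Manchester United win!", "coordinates": {"type": "Point", "coordinates": [-2.24, 53.48]}},
    {"text": "Manchester United win!", "coordinates": None},
    {"text": "Rainy day", "coordinates": {"type": "Point", "coordinates": [-0.13, 51.51]}},
]
print([stream_would_deliver(t, "manchester united", UK) for t in tweets])  # [True, True, True]
print([wanted(t, "manchester united", UK) for t in tweets])  # [True, False, False]
```

    All three tweets would be delivered by a combined `track` + `locations` request, but only the first one is what was asked for, so the second condition has to be checked in your own code.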

    What you can do is ask the streaming API for tweets matching either the keyword or the location, then filter the resulting stream in your app by inspecting each tweet.

    If you modify the code as follows, it will capture tweets from the United Kingdom and then print only those that contain "manchester united":

    import sys
    import tweepy  # written for tweepy 3.x; StreamListener was removed in tweepy 4.0
    
    consumer_key = ""
    consumer_secret = ""
    access_key = ""
    access_secret = ""
    
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    
    
    class CustomStreamListener(tweepy.StreamListener):
        def on_status(self, status):
            # The stream is filtered by location only; apply the keyword filter here.
            if 'manchester united' in status.text.lower():
                print(status.text)
    
        def on_error(self, status_code):
            print('Encountered error with status code:', status_code, file=sys.stderr)
            return True  # Don't kill the stream
    
        def on_timeout(self):
            print('Timeout...', file=sys.stderr)
            return True  # Don't kill the stream
    
    
    # Bounding box for the UK: [sw_lon, sw_lat, ne_lon, ne_lat]
    sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
    sapi.filter(locations=[-6.38, 49.87, 1.77, 55.81])
    
  • 2020-11-27 18:27

    Juan gave the correct answer. I'm filtering for Germany only using this:

    # Bounding boxes for geolocations: [sw_lon, sw_lat, ne_lon, ne_lat]
    # Online tool to create boxes (copy and paste as raw CSV): http://boundingbox.klokantech.com/
    GEOBOX_WORLD = [-180, -90, 180, 90]
    GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]
    
    # `stream` is a tweepy Stream object, e.g. `sapi` from the answer above
    stream.filter(locations=GEOBOX_GERMANY)
    

    This is a pretty crude box that includes parts of some other countries. If you want finer granularity, you can combine multiple boxes to cover the area you need.
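
    A sketch of combining boxes, assuming the tweepy 3.x behaviour where `locations` is a single flat list whose length is a multiple of 4, each group being `[sw_lon, sw_lat, ne_lon, ne_lat]`. The helper `combine_boxes` is hypothetical; the Germany and UK boxes are the ones from this thread.

```python
# Hypothetical helper: flatten several bounding boxes into the single list
# that stream.filter(locations=...) expects (4 numbers per box).
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]
GEOBOX_UK = [-6.38, 49.87, 1.77, 55.81]  # box from the first answer


def combine_boxes(*boxes):
    locations = []
    for box in boxes:
        if len(box) != 4:
            raise ValueError("each box needs exactly 4 values")
        sw_lon, sw_lat, ne_lon, ne_lat = box
        # South-west corner first. This simple check assumes the box
        # does not cross the 180 degree meridian.
        if not (sw_lon < ne_lon and sw_lat < ne_lat):
            raise ValueError("box must be [sw_lon, sw_lat, ne_lon, ne_lat]: %r" % (box,))
        locations.extend(box)
    return locations


locations = combine_boxes(GEOBOX_GERMANY, GEOBOX_UK)
print(len(locations))  # 8
# stream.filter(locations=locations)  # with a tweepy 3.x Stream as above
```

    Validating the corner order up front catches the most common mistake (latitude and longitude swapped, or the corners reversed) before the request is sent.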

    Note, though, that filtering by geotag cuts the number of tweets quite a bit. Here is the result from roughly 5 million tweets in my test database (the query returns the fraction of tweets that actually contain a geolocation):

    > db.tweets.find({coordinates:{$ne:null}}).count() / db.tweets.count()
    0.016668392651547598
    

    So only 1.67% of my sample of the 1% stream includes a geotag. However, there are other ways of figuring out a user's location: http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf
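
    If you are not using MongoDB, the same ratio can be computed in plain Python. This is a sketch assuming the tweets were saved one JSON object per line (the file name `tweets.jsonl` is hypothetical) and that a tweet counts as geotagged when its `coordinates` field is present and not null, which mirrors the `$ne: null` query.

```python
import json


def geotagged_fraction(lines):
    """Fraction of tweets (one JSON object per line) whose "coordinates" is not null."""
    total = 0
    geotagged = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        total += 1
        if tweet.get("coordinates") is not None:
            geotagged += 1
    return geotagged / total if total else 0.0


sample = [
    '{"text": "a", "coordinates": {"type": "Point", "coordinates": [13.4, 52.5]}}',
    '{"text": "b", "coordinates": null}',
    '{"text": "c", "coordinates": null}',
    '{"text": "d", "coordinates": null}',
]
print(geotagged_fraction(sample))  # 0.25

# With a real file (hypothetical name):
# with open("tweets.jsonl", encoding="utf-8") as f:
#     print(geotagged_fraction(f))
```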

  • 2020-11-27 18:31

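    # locations takes numbers [sw_lon, sw_lat, ne_lon, ne_lat], not a string.
    # Note: the API ORs track and locations, so this also returns non-UK tweets.
    sapi.filter(track=['manchester united'], locations=[-6.38, 49.87, 1.77, 55.81])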

  • 2020-11-27 18:43

    You can't apply both filters while streaming, but you could filter at the output stage if you were writing the tweets to a file.
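
    A minimal sketch of filtering at the output stage, assuming the stream was written to a file with one tweet JSON object per line. The helpers `keep` and `filter_file` and the file names are hypothetical; a tweet is kept only if it contains the keyword AND its point coordinates fall inside the box.

```python
import io
import json

UK = [-6.38, 49.87, 1.77, 55.81]  # [sw_lon, sw_lat, ne_lon, ne_lat], from the first answer


def keep(tweet, keyword, box):
    """Keyword AND location: the combination the streaming API won't do by itself."""
    if keyword.lower() not in tweet.get("text", "").lower():
        return False
    point = tweet.get("coordinates")
    if not point:
        return False
    lon, lat = point["coordinates"]  # GeoJSON order is [longitude, latitude]
    return box[0] <= lon <= box[2] and box[1] <= lat <= box[3]


def filter_file(src, dst, keyword, box):
    """Copy matching lines from src to dst; return how many were kept."""
    kept = 0
    for line in src:
        if line.strip() and keep(json.loads(line), keyword, box):
            dst.write(line if line.endswith("\n") else line + "\n")
            kept += 1
    return kept


raw = io.StringIO(
    '{"text": "Manchester United again", "coordinates": {"type": "Point", "coordinates": [-2.24, 53.48]}}\n'
    '{"text": "Manchester United again", "coordinates": null}\n'
)
out = io.StringIO()
print(filter_file(raw, out, "manchester united", UK))  # 1

# With real files (hypothetical names):
# with open("stream.jsonl") as src, open("filtered.jsonl", "w") as dst:
#     filter_file(src, dst, "manchester united", UK)
```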
