Question
I'm trying to build an app that tracks certain terms from specific users using the Twitter streaming API.
I wrote a working Python script using tweepy for the streaming API, based on this tutorial. However, it only works if I track tweets by terms or by user IDs, but not by both. When I try to filter using both of them, the API returns tweets from any user. My code is here:
import tweepy
# Access the Twitter API with the keys
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)
# Create the tweepy API object
api = tweepy.API(auth)
# Open the stream and pass what to look for on Twitter.
# CustomStreamListener is the listener class defined in the tutorial.
sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
list_users = ['11111', '22222']  # Some ids
list_terms = ['term1', 'term2']  # Some terms
sapi.filter(follow=list_users, track=list_terms)
These two variables (list_users, list_terms) are a list of user IDs and a list of terms, respectively.
How can I filter the tweet stream by users AND by terms? Is there any way to do it with the tweepy filter? Or should I check each tweet after retrieving it?
Answer 1:
The Twitter streaming API combines the different conditions with OR logic; that is, it returns the union of tweets containing the terms and tweets from the users. So you have to implement a custom on_data function in order to filter with AND.
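As a rough illustration of that post-filter, here is a minimal sketch of the AND check that a custom on_data could apply to each raw message. The helper name matches_user_and_term is hypothetical, and the sketch assumes the payload is a v1.1-style tweet JSON with a "text" field and a "user" object carrying "id_str". The substring test is a simplification and is not how Twitter itself matches track terms.

```python
import json


def matches_user_and_term(raw_data, user_ids, terms):
    """Return True only if the tweet was written by one of user_ids
    AND its text contains at least one of terms (case-insensitive).

    Assumes raw_data is a JSON string shaped like a v1.1 tweet payload.
    """
    message = json.loads(raw_data)

    # Non-tweet messages (limit notices, deletes, ...) have no "user" object.
    user = message.get("user") or {}
    if user.get("id_str") not in user_ids:
        return False

    text = (message.get("text") or "").lower()
    # Simplified substring match; an assumption, not Twitter's own matching.
    return any(term.lower() in text for term in terms)
```

Inside the listener, on_data would call this helper with the raw message and process the tweet only when it returns True, while still returning True from on_data itself so the stream stays connected.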
Note that you're limited to conditions on up to 5000 users and 400 terms. Since the rate limit may also be an issue, supply the API with the condition that yields the lower-volume tweet stream, and apply all the remaining conditions to the incoming data in post-processing.
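A small pre-flight check along these lines could catch oversized lists before opening the stream. This is only a sketch: the constants come from the figures quoted in this answer, and check_filter_limits is a hypothetical helper, not part of tweepy.

```python
# Limits as quoted in the answer; treat them as assumptions that may change.
MAX_FOLLOW_IDS = 5000
MAX_TRACK_TERMS = 400


def check_filter_limits(user_ids, terms):
    """Return a list of human-readable problems; empty if both lists fit."""
    problems = []
    if len(user_ids) > MAX_FOLLOW_IDS:
        problems.append(
            "too many user ids: %d > %d" % (len(user_ids), MAX_FOLLOW_IDS)
        )
    if len(terms) > MAX_TRACK_TERMS:
        problems.append(
            "too many terms: %d > %d" % (len(terms), MAX_TRACK_TERMS)
        )
    return problems
```

If the list of users is the narrower condition, one option is to pass only follow=list_users to filter() and check the terms locally on each incoming tweet.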
You can track up to 5,000 users and 400 keywords -- the rate limiting indeed takes effect at 1% of the Firehose, so if at any moment the tweet volume from the union of your keywords and users rises above 1% of all tweets happening in "real time" on the Firehose, you'll get up to 1% of the tweets along with a rate limit notice informing you of how many tweets you missed.
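To notice when this happens, the raw stream can be inspected for the rate limit notice. The sketch below assumes the notice arrives as a JSON message of the form {"limit": {"track": N}}, where N is the number of undelivered tweets; undelivered_count is a hypothetical helper name.

```python
import json


def undelivered_count(raw_data):
    """Return the count carried by a rate limit notice, or None if
    raw_data is some other kind of message (for example a tweet).

    Assumes the notice looks like {"limit": {"track": N}}.
    """
    message = json.loads(raw_data)
    limit = message.get("limit")
    if isinstance(limit, dict):
        return limit.get("track")
    return None
```

Logging this value from on_data gives a rough idea of how many tweets the post-filter never got to see.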
Source: https://stackoverflow.com/questions/14083968/tweepy-tracking-terms-and-following-users