How to get all tweets in a hashtag with tweepy?

余生分开走 2021-02-04 22:53

I'm trying to collect every public tweet in a hashtag, but my code doesn't get past 299 tweets.

I'm also trying to get tweets from a specific time range, like tweets on

4 Answers
  • 2021-02-04 23:05

    Have a look at this: https://tweepy.readthedocs.io/en/v3.5.0/cursor_tutorial.html

    And try this:

    import tweepy
    
    # Authenticate (the access token pair is needed for user-auth requests)
    auth = tweepy.OAuthHandler(CONSUMER_TOKEN, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)
    
    # Cursor handles the pagination; count=100 asks for the maximum results
    # per page (the tutorial's older rpp parameter means the same thing)
    for tweet in tweepy.Cursor(api.search, q='#python', count=100).items():
        # Do something with each tweet
        pass
    

    In your case you have a max number of tweets to get, so as per the linked tutorial you could do:

    import tweepy
    
    # An intentionally huge cap, i.e. "give me everything you can"
    MAX_TWEETS = 5000000000000000000000
    
    auth = tweepy.OAuthHandler(CONSUMER_TOKEN, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)
    
    for tweet in tweepy.Cursor(api.search, q='#python', count=100).items(MAX_TWEETS):
        # Do something with each tweet
        pass
    

    If you only want tweets newer than a given ID, you can also pass the since_id argument, as in the sketch below.
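
    For example, a minimal sketch, assuming the same placeholder credentials as above and a hypothetical LAST_SEEN_ID saved from a previous run:

    import tweepy
    
    auth = tweepy.OAuthHandler(CONSUMER_TOKEN, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)
    
    # Hypothetical ID stored from an earlier run; only tweets newer than
    # this will be returned by the search.
    LAST_SEEN_ID = 1234567890123456789
    
    for tweet in tweepy.Cursor(api.search, q='#python', count=100,
                               since_id=LAST_SEEN_ID).items():
        print(tweet.id, tweet.text)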

  • 2021-02-04 23:08

    Check the Twitter API documentation; it probably only allows around 300 tweets to be fetched per window. I would recommend forgetting the API wrapper and doing it with plain requests with streaming. The wrapper is just an implementation on top of requests, with its own limitations. A rough sketch follows.
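
    If you do go the raw-requests route, here is a minimal sketch against the v1.1 standard search endpoint, assuming you have obtained an app-only bearer token yourself (BEARER_TOKEN is a placeholder); pagination uses the documented max_id cursoring:

    import requests
    
    SEARCH_URL = 'https://api.twitter.com/1.1/search/tweets.json'
    headers = {'Authorization': 'Bearer ' + BEARER_TOKEN}  # placeholder token
    
    params = {'q': '#python', 'count': 100}
    tweets = []
    
    while True:
        resp = requests.get(SEARCH_URL, headers=headers, params=params)
        resp.raise_for_status()
        statuses = resp.json().get('statuses', [])
        if not statuses:
            break
        tweets.extend(statuses)
        # Ask only for tweets strictly older than the oldest one seen so far.
        params['max_id'] = statuses[-1]['id'] - 1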

  • 2021-02-04 23:14

    This code worked for me.

    import tweepy
    import pandas as pd
    
    # Twitter access (replace the 'xxx' placeholders with your own keys)
    auth = tweepy.OAuthHandler('xxx', 'xxx')
    auth.set_access_token('xxx-xxx', 'xxx')
    api = tweepy.API(auth, wait_on_rate_limit=True)
    
    msgs = []
    
    # Collect the text, source and source URL of the first 10 matching tweets
    for tweet in tweepy.Cursor(api.search, q='#bmw', count=100).items(10):
        msgs.append((tweet.text, tweet.source, tweet.source_url))
    
    df = pd.DataFrame(msgs, columns=['text', 'source', 'url'])
    
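    If truncated tweet text is a problem, the same loop can request full-length tweets; a small variation, assuming Tweepy 3.x, where passing tweet_mode='extended' makes each result expose full_text instead of text:

    for tweet in tweepy.Cursor(api.search, q='#bmw', count=100,
                               tweet_mode='extended').items(10):
        msgs.append((tweet.full_text, tweet.source, tweet.source_url))
    
    df = pd.DataFrame(msgs, columns=['text', 'source', 'url'])
    df.to_csv('bmw_tweets.csv', index=False)  # hypothetical output file
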
  • 2021-02-04 23:15

    Sorry, I can't answer in a comment, it's too long. :)

    Sure :) Check this example: an advanced search for the #data keyword from May 2015 to July 2016 gives this URL: https://twitter.com/search?l=&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31&src=typd

    import requests
    
    session = requests.Session()
    keyword = 'data'
    date1 = '2015-05-01'
    date2 = '2016-07-31'
    
    # Build the search URL from the keyword and date range, then stream the response
    url = ('https://twitter.com/search?l=&q=%23' + keyword +
           '%20since%3A' + date1 + '%20until%3A' + date2 + '&src=typd')
    response = session.get(url, stream=True)
    

    Now we have all the requested tweets. You will probably run into problems with pagination; the pagination URL looks like this:

    https://twitter.com/i/search/timeline?vertical=news&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31&src=typd&include_available_features=1&include_entities=1&max_position=TWEET-759522481271078912-759538448860581892-BD1UO2FFu9QAAAAAAAAETAAAAAcAAAASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&reset_error_state=false

    You could probably put in a random tweet ID, or parse the first page and request the next batch of data from Twitter. It can be done; see the sketch below.
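
    A rough sketch of that pagination loop, assuming the /i/search/timeline endpoint still behaves the way it did at the time (a JSON object whose items_html holds the rendered tweets and whose min_position carries the cursor for the next page); the field names come from inspecting those responses, so treat them as assumptions:

    import requests
    
    session = requests.Session()
    base = 'https://twitter.com/i/search/timeline'
    params = {
        'q': '#data since:2015-05-01 until:2016-07-31',
        'src': 'typd',
        'include_available_features': '1',
        'include_entities': '1',
    }
    
    pages = []
    while True:
        data = session.get(base, params=params).json()
        pages.append(data.get('items_html', ''))  # rendered tweet HTML for this page
        cursor = data.get('min_position')         # cursor for the next (older) page
        if not cursor or not data.get('has_more_items'):
            break
        params['max_position'] = cursor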

    Use Chrome's Network tab to find all the requests and their parameters :)
