how to take all tweets in a hashtag with tweepy?

余生分开走 2021-02-04 22:53

I'm trying to fetch every public tweet in a hashtag, but my code never gets past 299 tweets.

I'm also trying to fetch tweets from a specific time range, like tweets on

4 answers
  • 2021-02-04 23:05

    Have a look at this: https://tweepy.readthedocs.io/en/v3.5.0/cursor_tutorial.html

    And try this:

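    import tweepy
    
    # CONSUMER_TOKEN and CONSUMER_SECRET are your app's API credentials
    auth = tweepy.OAuthHandler(CONSUMER_TOKEN, CONSUMER_SECRET)
    api = tweepy.API(auth)
    
    # count=100 is the largest page size standard search allows;
    # tweepy 4.x renamed api.search to api.search_tweets
    for tweet in tweepy.Cursor(api.search, q='#python', count=100).items():
        # Do something
        pass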
    

    In your case you want a maximum number of tweets, so, following the linked tutorial, you could do:

    import tweepy
    
    MAX_TWEETS = 5000000000000000000000
    
    auth = tweepy.OAuthHandler(CONSUMER_TOKEN, CONSUMER_SECRET)
    api = tweepy.API(auth)
    
    for tweet in tweepy.Cursor(api.search, q='#python', count=100).items(MAX_TWEETS):
        # Do something
        pass
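tweepy.Cursor hides the paging. For intuition, below is a minimal sketch of the max_id technique described in Twitter's v1.1 "Working with timelines" guide: request a page, then ask only for tweets older than the oldest ID already seen. Tweets are plain dicts here, and paginate_search and fake_search are hypothetical stand-ins (not tweepy APIs) so the sketch runs without network access or credentials. The limit argument plays the same role as .items(MAX_TWEETS).

```python
from typing import Callable, Dict, Iterator, List, Optional

# Assumption for the sketch: a tweet is a dict with at least an integer "id".
Tweet = Dict[str, int]
SearchFn = Callable[..., List[Tweet]]


def paginate_search(search: SearchFn, q: str, count: int = 100,
                    limit: Optional[int] = None) -> Iterator[Tweet]:
    """Yield tweets page by page, walking backwards in time with max_id."""
    max_id = None
    yielded = 0
    while True:
        params = {"q": q, "count": count}
        if max_id is not None:
            params["max_id"] = max_id
        page = search(**params)
        if not page:
            return  # nothing older left in the search index
        for tweet in page:
            yield tweet
            yielded += 1
            if limit is not None and yielded >= limit:
                return
        # max_id is inclusive, so subtract 1 to skip the oldest tweet already seen
        max_id = min(t["id"] for t in page) - 1


def fake_search(q: str, count: int, max_id: Optional[int] = None) -> List[Tweet]:
    """Hypothetical stand-in for a search call: an index of ids 250..1, newest first."""
    ids = [i for i in range(250, 0, -1) if max_id is None or i <= max_id]
    return [{"id": i} for i in ids[:count]]


if __name__ == "__main__":
    print(len(list(paginate_search(fake_search, "#python"))))
    print([t["id"] for t in paginate_search(fake_search, "#python", limit=3)])
```

Walking backwards by ID rather than by page number means newly posted tweets do not shift the pages underneath you while you are still paging.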
    

    If you only want tweets newer than a given tweet ID, you can also pass the since_id argument.
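since_id is a parameter of Twitter's v1.1 standard search endpoint, which also accepts an until date (YYYY-MM-DD, returning tweets created before that date). Twitter's documentation notes that the standard search index only covers roughly the past week, which may explain why older tweets never come back. A minimal sketch, using a hypothetical helper build_search_params, of assembling those keyword arguments before handing them to tweepy.Cursor(api.search, **params):

```python
from datetime import date
from typing import Dict, Optional, Union


def build_search_params(hashtag: str, since_id: Optional[int] = None,
                        until: Optional[date] = None,
                        count: int = 100) -> Dict[str, Union[str, int]]:
    """Hypothetical helper: keyword arguments for a v1.1 standard search call."""
    # Standard search returns at most 100 tweets per page
    params: Dict[str, Union[str, int]] = {"q": hashtag, "count": min(count, 100)}
    if since_id is not None:
        params["since_id"] = since_id  # only tweets newer than this ID
    if until is not None:
        params["until"] = until.isoformat()  # tweets created before this date
    return params


if __name__ == "__main__":
    print(build_search_params("#python", since_id=123, until=date(2021, 2, 4)))
    # With tweepy 3.x this would be used as:
    # tweepy.Cursor(api.search, **params).items(MAX_TWEETS)
```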

  • 2021-02-04 23:08

    Check the Twitter API documentation; it probably lets you retrieve only about 300 tweets. I would recommend forgetting the API and doing it with requests and streaming instead. The API is just a wrapper around HTTP requests, with limitations added.

  • 2021-02-04 23:14

    This code worked for me.

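    import tweepy
    import pandas as pd
    
    # Twitter access: replace the 'xxx' placeholders with your own keys
    auth = tweepy.OAuthHandler('xxx', 'xxx')
    auth.set_access_token('xxx-xxx', 'xxx')
    # wait_on_rate_limit=True makes tweepy sleep until the rate limit resets instead of failing
    api = tweepy.API(auth, wait_on_rate_limit=True)
    
    msgs = []
    
    # .items(10) stops after 10 tweets; raise the number to collect more
    for tweet in tweepy.Cursor(api.search, q='#bmw', count=100).items(10):
        msgs.append((tweet.text, tweet.source, tweet.source_url))
    
    # Name the columns when building the DataFrame from the collected tuples
    df = pd.DataFrame(msgs, columns=['text', 'source', 'url'])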
    
  • 2021-02-04 23:15

    Sorry, I can't answer in a comment; it's too long. :)

    Sure :) Check this example: I ran an Advanced Search for the #data keyword from May 2015 to July 2016 and got this URL: https://twitter.com/search?l=&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31&src=typd

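    import requests
    from urllib.parse import quote
    session = requests.Session()
    keyword = 'data'
    date1 = '2015-05-01'
    date2 = '2016-07-31'
    # Same query the Advanced Search produced: "#data since:2015-05-01 until:2016-07-31"
    query = quote(f'#{keyword} since:{date1} until:{date2}')
    response = session.get(f'https://twitter.com/search?l=&q={query}&src=typd', stream=True)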
    

    Now we have the requested tweets, but you will probably run into problems with pagination. The pagination URL looks like this:

    https://twitter.com/i/search/timeline?vertical=news&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31&src=typd&include_available_features=1&include_entities=1&max_position=TWEET-759522481271078912-759538448860581892-BD1UO2FFu9QAAAAAAAAETAAAAAcAAAASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&reset_error_state=false

    You could probably put in an arbitrary tweet ID, parse one out of the first page, or request some data from Twitter to get it. It can be done.
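A minimal sketch of rebuilding that pagination URL for any query, assuming the endpoint and query parameters are exactly the ones visible in the URL above (they reflect what the answerer observed in 2016, and the endpoint may no longer respond this way). pagination_url is a hypothetical helper, and max_position is treated as an opaque token you would take from the previous response.

```python
from urllib.parse import quote, urlencode

TIMELINE_URL = "https://twitter.com/i/search/timeline"


def pagination_url(query: str, max_position: str) -> str:
    """Hypothetical helper: rebuild the timeline pagination URL for a query."""
    params = {
        "vertical": "news",
        "q": query,
        "src": "typd",
        "include_available_features": "1",
        "include_entities": "1",
        "max_position": max_position,  # opaque token from the previous page
        "reset_error_state": "false",
    }
    # quote (not the default quote_plus) encodes spaces as %20, like the URL above
    return TIMELINE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(pagination_url("#data since:2015-05-01 until:2016-07-31",
                         "TOKEN_FROM_PREVIOUS_PAGE"))
```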

    Use Chrome's networking tab to find all the requested information :)
