Question
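import tweepy
import csv
import json
import nltk
import re


def scrub_text(string):
    # Fetch the NLTK English word list (a no-op if it is already downloaded).
    nltk.download('words')
    words = set(nltk.corpus.words.words())
    # Replace every run of non-letters with a single space, then lowercase.
    string = re.sub(r'[^a-zA-Z]+', ' ', string).lower()
    # Keep only tokens that appear in the English word list.
    string = " ".join(w for w in nltk.wordpunct_tokenize(string)
                      if w.lower() in words or not w.isalpha())
    return string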
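

def get_all_tweets():
    # Load the API credentials from a local JSON file.
    with open('twitter_credentials.json') as cred_data:
        info = json.load(cred_data)
    consumer_key = info['API_KEY']
    consumer_secret = info['API_SECRET']
    access_key = info['ACCESS_TOKEN']
    access_secret = info['ACCESS_SECRET']
    screen_name = input("Enter twitter Handle: ")
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # The access token was loaded but never attached to the handler.
    auth.set_access_token(access_key, access_secret)
    # wait_on_rate_limit_notify exists in Tweepy 3.x; it was removed in 4.0.
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True,
                     timeout=500000, retry_count=10, retry_delay=100)
    all_the_tweets = []
    # First request: the most recent tweets (200 is the per-request maximum).
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)
    all_the_tweets.extend(new_tweets)
    # Note: this raises IndexError if the first request returns no tweets.
    oldest_tweet = all_the_tweets[-1].id - 1
    # Page backwards with max_id until a request returns nothing.
    while len(new_tweets) > 0:
        new_tweets = api.user_timeline(screen_name=screen_name, count=200,
                                       max_id=oldest_tweet)
        all_the_tweets.extend(new_tweets)
        oldest_tweet = all_the_tweets[-1].id - 1
        print('...%s tweets downloaded' % len(all_the_tweets))
    # Join the raw text directly; str() on a list of encoded bytes would leak
    # b'...' prefixes and escape sequences into the scrubbed output.
    outtweets = scrub_text(" ".join(tweet.text for tweet in all_the_tweets))
    # The with block closes the file, so no explicit f.close() is needed.
    with open('tweets.txt', 'w') as f:
        f.write(outtweets)


if __name__ == '__main__':
    get_all_tweets()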
The Python code above should download all the tweets from a particular user. It seems to work for most handles, but when I use it for @realDonaldTrump I sometimes get 800 tweets and sometimes only 1. I never get anywhere close to all of the tweets. I assume the problem is related to how active the account is, but I think there should be a way around it.
Answer 1:
The Twitter timelines API returns at most the 3,200 most recent Tweets from an account (source), and the number you actually get may also depend on the age of the Tweets, that is, how far back in time you are paging. Unfortunately, this means you cannot use this API to get all of the Tweets from such an active account. To retrieve every Tweet, you would need to use the commercial Full Archive search API.
Regarding the inconsistent number of results, that sounds like a glitch, as the count shouldn't vary by that much between runs.
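As a rough illustration of the paging behaviour described above, here is a minimal sketch of a max_id loop that stops on an empty page and never collects more than 3,200 items. The names collect_timeline, make_fake_fetcher and fetch_page are hypothetical and not part of Tweepy; the fetcher is an in-memory stand-in so the loop can be tried without network access. With Tweepy, the callable would presumably wrap api.user_timeline(screen_name=..., count=200, max_id=...) and read the id attribute of each status.

```python
from typing import Callable, List, Optional

TIMELINE_CAP = 3200  # upper bound on tweets the timeline endpoint returns


def collect_timeline(fetch_page: Callable[[Optional[int]], List[dict]],
                     cap: int = TIMELINE_CAP) -> List[dict]:
    """Page backwards through a timeline with max_id until it is exhausted.

    fetch_page(max_id) is a hypothetical callable that returns one page of
    tweets (dicts with an "id" key), newest first, with ids <= max_id.
    """
    tweets: List[dict] = []
    max_id: Optional[int] = None
    while len(tweets) < cap:
        page = fetch_page(max_id)
        if not page:  # empty page: nothing older is available
            break
        tweets.extend(page)
        # Continue strictly below the oldest id seen so far.
        max_id = page[-1]["id"] - 1
    return tweets[:cap]


def make_fake_fetcher(total: int, page_size: int = 200):
    """Build an in-memory stand-in for the API with ids total..1."""
    ids = list(range(total, 0, -1))

    def fetch_page(max_id: Optional[int]) -> List[dict]:
        eligible = [i for i in ids if max_id is None or i <= max_id]
        return [{"id": i} for i in eligible[:page_size]]

    return fetch_page


if __name__ == "__main__":
    print(len(collect_timeline(make_fake_fetcher(450))))
    print(len(collect_timeline(make_fake_fetcher(5000))))
```

Unlike the loop in the question, this version also handles an empty first page, which would otherwise raise an IndexError when the code indexes the last element of an empty list.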
Source: https://stackoverflow.com/questions/60308733/tweepy-returns-inconsistent-and-not-complete-results-for-realdonaldtrump