How to create pandas dataframe from Twitter Search API?

后端未结

关注

 1  683

I am working with the Twitter Search API which returns a dictionary of dictionaries. My goal is to create a dataframe from a list of keys in the response dictionary.

相关标签:

1条回答

Happy的楠姐

2021-01-16 10:47

I will share a more generic solution that I came up with, as I was working with the Twitter API. Let's say you have the ID's of tweets that you want to fetch in a list called my_ids :

# Fetch tweets from the twitter API using the following loop:
list_of_tweets = []
# Tweets that can't be found are saved in the list below:
cant_find_tweets_for_those_ids = []
for each_id in my_ids:   
    try:
        list_of_tweets.append(api.get_status(each_id))
    except Exception as e:
        cant_find_tweets_for_those_ids.append(each_id)

Then in this code block we isolate the json part of each tweepy status object that we have downloaded and we add them all into a list....

my_list_of_dicts = []
for each_json_tweet in list_of_tweets:
    my_list_of_dicts.append(each_json_tweet._json)

...and we write this list into a txt file:

with open('tweet_json.txt', 'w') as file:
        file.write(json.dumps(my_list_of_dicts, indent=4))

Now we are going to create a DataFrame from the tweet_json.txt file (I have added some keys that were relevant to my use case that I was working on, but you can add your specific keys instead):

my_demo_list = []
with open('tweet_json.txt', encoding='utf-8') as json_file:  
    all_data = json.load(json_file)
    for each_dictionary in all_data:
        tweet_id = each_dictionary['id']
        whole_tweet = each_dictionary['text']
        only_url = whole_tweet[whole_tweet.find('https'):]
        favorite_count = each_dictionary['favorite_count']
        retweet_count = each_dictionary['retweet_count']
        created_at = each_dictionary['created_at']
        whole_source = each_dictionary['source']
        only_device = whole_source[whole_source.find('rel="nofollow">') + 15:-4]
        source = only_device
        retweeted_status = each_dictionary['retweeted_status'] = each_dictionary.get('retweeted_status', 'Original tweet')
        if retweeted_status == 'Original tweet':
            url = only_url
        else:
            retweeted_status = 'This is a retweet'
            url = 'This is a retweet'

        my_demo_list.append({'tweet_id': str(tweet_id),
                             'favorite_count': int(favorite_count),
                             'retweet_count': int(retweet_count),
                             'url': url,
                             'created_at': created_at,
                             'source': source,
                             'retweeted_status': retweeted_status,
                            })
        tweet_json = pd.DataFrame(my_demo_list, columns = ['tweet_id', 'favorite_count', 
                                                       'retweet_count', 'created_at',
                                                       'source', 'retweeted_status', 'url'])

0 讨论(0)