Question
So I am working on a project that uses the Twitter API to collect tweets for different keywords at specific longitudes and latitudes. I crawled the data; for each keyword the result is a list of dictionaries with fields such as:
dict_keys(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities', 'extended_entities', 'metadata', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'lang'])
I now want to extract the text for each of the keyword result sets (x1_tweets, x2_tweets and x3_tweets) into a JSON file. For that I defined a function:
import json

def save_to_json(obj, filename):
    with open(filename, 'w') as fp:
        json.dump(obj, fp, indent=4, sort_keys=True)
where obj is a list of dictionaries and filename is the name of the file I want to save the document as. However, when I try to use the function, for example save_to_json(x1_tweets, 'doors'), it gives me a file containing everything. How should I use the function so that the resulting file contains only the tweet text?
Any help would be appreciated! Thanks in advance!
Here is what the JSON file looks like:
[
    {
        "contributors": null,
        "coordinates": null,
        "created_at": "Mon May 18 02:08:53 +0000 2020",
        "entities": {
            "hashtags": [],
            "media": [
                {
                    "display_url": "pic.twitter.com/ig7H0jIHOq",
                    "expanded_url": "https://twitter.com/CMag051/status/1262303473682022400/photo/1",
                    "id": 1262203448080007168,
                    "id_str": "1262203448080007168",
                    "indices": [
                        98,
                        121
                    ],
                    "media_url": "http://pbs.twimg.com/media/EYQ_WT0VAAA6hTK.jpg",
                    "media_url_https": "https://pbs.twimg.com/media/EYQ_WT0VAAA6hTK.jpg",
                    "sizes": {
                        "large": {
                            "h": 2048,
                            "resize": "fit",
                            "w": 1536
                        },
                        "medium": {
                            "h": 1200,
                            "resize": "fit",
                            "w": 900
                        },
                        "small": {
                            "h": 680,
                            "resize": "fit",
                            "w": 510
                        },
                        "thumb": {
                            "h": 150,
                            "resize": "crop",
                            "w": 150
                        }
                    },
                    "type": "photo",
                    "url": "https://twitter.com/ig7H0jIHOq"
                }
            ],
            "symbols": [],
            "urls": [],
            "user_mentions": []
        },
        "extended_entities": {
            "media": [
                {
                    "display_url": "pic.twitter.com/ig7H0jvHOq",
                    "expanded_url": "https://twitter.com/CMag051/status/1262253473682022400/photo/1",
                    "id": 1262203448080007168,
                    "id_str": "1262203448080007168",
                    "indices": [
                        98,
                        121
                    ],
                    "media_url": "http://pbs.twimg.com/media/EYQ_WT0VAAA6hTK.jpg",
                    "media_url_https": "https://pbs.twimg.com/media/EYQ_WT0VAAA6hTK.jpg",
                    "sizes": {
                        "large": {
                            "h": 2048,
                            "resize": "fit",
                            "w": 1536
                        },
                        "medium": {
                            "h": 1200,
                            "resize": "fit",
                            "w": 900
                        },
                        "small": {
                            "h": 680,
                            "resize": "fit",
                            "w": 510
                        },
                        "thumb": {
                            "h": 150,
                            "resize": "crop",
                            "w": 150
                        }
                    },
                    "type": "photo",
                    "url": "https://twitter.com/ig7H0iIHOq"
                }
            ]
        },
        "favorite_count": 1,
        "favorited": false,
        "geo": null,
        "id": 1262203473682022400,
        "id_str": "1262203473682022400",
        "in_reply_to_screen_name": null,
        "in_reply_to_status_id": null,
        "in_reply_to_status_id_str": null,
        "in_reply_to_user_id": null,
        "in_reply_to_user_id_str": null,
        "is_quote_status": false,
        "lang": "en",
        "metadata": {
            "iso_language_code": "en",
            "result_type": "recent"
        },
        "place": null,
        "possibly_sensitive": false,
        "retweet_count": 0,
        "retweeted": false,
        "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
        "text": "Beautiful evening. \n\nSitting on patio, eating some apple , and listening to the birds chirp. https://twitter.com/ig7H0jIHOq",
        "truncated": false,
        "user": {
            "contributors_enabled": false,
            "created_at": "Wed Apr 01 03:32:05 +0000 2009",
            "default_profile": false,
            "default_profile_image": false,
            "description": "Photographer | Music & Sports Enthusiast.",
            "entities": {
                "description": {
                    "urls": []
                }
            },
            "favourites_count": 19189,
            "follow_request_sent": false,
            "followers_count": 547,
            "following": false,
            "friends_count": 2432,
            "geo_enabled": false,
            "has_extended_profile": true,
            "id": 28041855,
            "id_str": "28041855",
            "is_translation_enabled": false,
            "is_translator": false,
            "lang": null,
            "listed_count": 0,
            "location": "Phoenix, AZ",
            "name": "Chris",
            "notifications": false,
            "profile_background_color": "000000",
            "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
            "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
            "profile_background_tile": false,
            "profile_banner_url": "https://pbs.twimg.com/profile_banners/28041855/1586840506",
            "profile_image_url": "http://pbs.twimg.com/profile_images/1262196071817605121/WBvC3h5P_normal.jpg",
            "profile_image_url_https": "https://pbs.twimg.com/profile_images/1262196071817605121/WBvC3h5P_normal.jpg",
            "profile_link_color": "ABB8C2",
            "profile_sidebar_border_color": "000000",
            "profile_sidebar_fill_color": "000000",
            "profile_text_color": "000000",
            "profile_use_background_image": false,
            "protected": false,
            "screen_name": "CMag051",
            "statuses_count": 11285,
            "time_zone": null,
            "translator_type": "none",
            "url": null,
            "utc_offset": null,
            "verified": false
        }
    }
]
Answer 1:
The first thing you need to do is change the code above to:
def save_to_json(obj, filename):
    with open(filename, 'a') as fp:
        json.dump(obj, fp, indent=4, sort_keys=True)
You need to change the mode the file is opened in, for the following reason:
w:
Opens in write-only mode. The pointer is placed at the beginning of the file and this will overwrite any existing file with the same name. It will create a new file if one with the same name doesn't exist.
a:
Opens a file for appending new information to it. The pointer is placed at the end of the file. A new file is created if one with the same name doesn't exist.
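The difference between the two modes can be seen with a small sketch (the file demo.txt is just for illustration):

```python
# 'w' truncates any existing file; 'a' keeps its contents and appends.
with open('demo.txt', 'w') as fp:
    fp.write('first\n')
with open('demo.txt', 'w') as fp:   # reopening with 'w' discards 'first'
    fp.write('second\n')
with open('demo.txt', 'a') as fp:   # 'a' appends after 'second'
    fp.write('third\n')
with open('demo.txt') as fp:
    print(fp.read())                # prints 'second' then 'third'
```

One caveat worth knowing: calling json.dump repeatedly on a file opened with 'a' concatenates several JSON documents into one file, which a single json.load cannot parse back.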
Also, sort_keys has no effect, as you are only passing a string and not a dict. Similarly, indent=4 has no effect on strings.
If you want the tweet texts indexed, you can use the code below:
tweets = {}
for i, tweet in enumerate(x1_tweets):
    tweets[i] = tweet['text']
save_to_json(tweets, 'bat.json')
The above code builds a dict mapping each index to its tweet text and writes it to the file once all tweets are processed.
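As a quick sanity check, here is that approach end to end on a couple of hypothetical tweet dicts (note that json.dump turns the integer indices into string keys, since JSON object keys are always strings):

```python
import json

# Hypothetical stand-ins for x1_tweets; only the 'text' field is extracted.
x1_tweets = [
    {'id': 1, 'text': 'Beautiful evening.'},
    {'id': 2, 'text': 'Listening to the birds chirp.'},
]

tweets = {}
for i, tweet in enumerate(x1_tweets):
    tweets[i] = tweet['text']

with open('bat.json', 'w') as fp:
    json.dump(tweets, fp, indent=4, sort_keys=True)

with open('bat.json') as fp:
    print(json.load(fp))  # {'0': 'Beautiful evening.', '1': 'Listening to the birds chirp.'}
```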
And if you just need the text of the tweets without the index, you can use string aggregation, or use a list: append all the tweet texts to it and write that list to the output file.
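The list-based variant could be sketched like this (again with hypothetical tweet dicts; the resulting file is a plain JSON array of tweet texts):

```python
import json

# Hypothetical stand-ins for x1_tweets:
x1_tweets = [
    {'id': 1, 'text': 'Beautiful evening.'},
    {'id': 2, 'text': 'Listening to the birds chirp.'},
]

texts = [tweet['text'] for tweet in x1_tweets]  # keep only the tweet text
with open('x1_texts.json', 'w') as fp:
    json.dump(texts, fp, indent=4, ensure_ascii=False)
```

ensure_ascii=False keeps any non-ASCII characters in the tweet text readable in the output file instead of escaping them.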
Source: https://stackoverflow.com/questions/61884620/how-to-get-only-the-text-of-the-tweets-into-a-json-file