I'm collecting Twitter data (tweets + metadata) into a MongoDB server. Now I want to do some statistical analysis. To get the data from MongoDB into a Pandas data frame I used
I use a function like this to get nested JSON documents into a dataframe. It uses the handy pandas json_normalize
function:
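import json

import pandas as pd
from bson import json_util
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated


def mongo_to_dataframe(mongo_data):
    # Serialize BSON types (ObjectId, datetime, ...) to JSON text, then parse it back into plain dicts/lists
    sanitized = json.loads(json_util.dumps(mongo_data))
    # Flatten nested fields into dot-separated columns, e.g. user.screen_name
    normalized = json_normalize(sanitized)
    df = pd.DataFrame(normalized)
    return df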
Just call the function with your mongo data as the argument, for example the cursor returned by a find() query or a list of documents.
sanitized = json.loads(json_util.dumps(mongo_data))
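dumps the documents to JSON text, converting BSON types such as ObjectId into JSON-safe values, and loads that text back as regular Python dicts and lists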
normalized = json_normalize(sanitized)
un-nests the data: nested fields become flat columns with dot-separated names
df = pd.DataFrame(normalized)
simply makes sure the result is a dataframe (json_normalize already returns one, so this step is optional)
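To make the un-nesting step concrete, here is a minimal sketch that runs json_normalize on two made-up tweet-like documents. The field names and values are invented for illustration, and the documents are written by hand in the plain-JSON shape the sanitizing step is assumed to produce (an ObjectId shows up as a nested {"$oid": ...} object), so no MongoDB connection is needed to try it.

```python
import pandas as pd

# Hypothetical sample documents, standing in for the output of
# json.loads(json_util.dumps(mongo_data)). Field names are illustrative only.
sample_docs = [
    {
        "_id": {"$oid": "5f1d7f3e2c8b4a0012345678"},
        "text": "hello",
        "user": {"screen_name": "alice", "followers_count": 10},
    },
    {
        "_id": {"$oid": "5f1d7f3e2c8b4a0012345679"},
        "text": "world",
        "user": {"screen_name": "bob", "followers_count": 25},
    },
]

# Nested dicts are flattened into columns whose names join the keys with "."
flat = pd.json_normalize(sample_docs)

# Sorted, because column order is not something this sketch relies on
print(sorted(flat.columns))
# ['_id.$oid', 'text', 'user.followers_count', 'user.screen_name']

# Flattened columns behave like any other column for the statistics
print(flat["user.followers_count"].mean())
```

Note that the id ends up in a column named `_id.$oid`; renaming it (for example with `flat.rename(columns={"_id.$oid": "_id"})`) may be convenient before further analysis.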