Getting nested data from MongoDB into a Pandas data frame


I'm collecting Twitter data (tweets + metadata) into a MongoDB server. Now I want to do some statistical analysis. To get the data from MongoDB into a Pandas data frame I used


1 Answer

感情败类

2021-01-21 16:36
I use a function like this to get nested JSON lines into a dataframe. It uses the handy pandas `json_normalize` function:
```
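import json

import pandas as pd
from bson import json_util


def mongo_to_dataframe(mongo_data):
    # Serialize BSON types (ObjectId, dates, ...) to JSON text, then parse
    # the text back into plain Python dicts and lists.
    sanitized = json.loads(json_util.dumps(mongo_data))
    # Flatten nested fields into columns. pd.json_normalize (pandas >= 1.0)
    # replaces the deprecated pandas.io.json.json_normalize import.
    normalized = pd.json_normalize(sanitized)
    # json_normalize already returns a DataFrame; this wraps it again
    # without changing the data.
    df = pd.DataFrame(normalized)

    return df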
```
Just call the function with your MongoDB data (for example, the cursor returned by `find()`) as the argument.

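The `json.loads(json_util.dumps(mongo_data))` line serializes the MongoDB documents, including BSON types such as `ObjectId`, to JSON text and loads them back as regular JSON (plain dicts and lists).

The `json_normalize` call un-nests the data: each nested field becomes its own column with a dotted name, so a `name` key inside a nested `user` field ends up as a `user.name` column.

The final `pd.DataFrame(normalized)` call simply turns it into a dataframe. Since `json_normalize` already returns a DataFrame, this step is harmless but not strictly needed.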
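
To see what the un-nesting step produces without a running MongoDB server, here is a minimal sketch using hand-written records. The `sample_docs` list and its field names (`text`, `user`, `screen_name`, `followers_count`) are made up for illustration. They only imitate the shape you would get after the `json_util.dumps` / `json.loads` round trip, where an `ObjectId` shows up as a small dict with a `$oid` key.

```python
import pandas as pd

# Hypothetical tweet-like records (not real data), shaped like the output of
# json.loads(json_util.dumps(...)): the ObjectId appears as {"$oid": ...}.
sample_docs = [
    {
        "_id": {"$oid": "600000000000000000000001"},
        "text": "first tweet",
        "user": {"screen_name": "alice", "followers_count": 10},
    },
    {
        "_id": {"$oid": "600000000000000000000002"},
        "text": "second tweet",
        "user": {"screen_name": "bob", "followers_count": 25},
    },
]

# Nested keys become dotted column names such as "user.screen_name".
flat = pd.json_normalize(sample_docs)

# Optional tidy-up: give the flattened ObjectId column a plainer name.
flat = flat.rename(columns={"_id.$oid": "id"})

print(sorted(flat.columns))
print(flat["user.followers_count"].mean())
```

Once the nested fields are ordinary columns, the usual pandas statistics (`mean`, `describe`, `groupby`, and so on) work on them directly. If the dotted names are awkward, `pd.json_normalize` also accepts a `sep` argument, for example `sep="_"`.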