Getting nested data from MongoDB into a Pandas data frame

南旧 2021-01-21 15:30

I'm collecting Twitter data (tweets + metadata) into a MongoDB server. Now I want to do some statistical analysis. To get the data from MongoDB into a Pandas data frame I used

1 answer
  • 2021-01-21 16:36

    I use a function like this to get nested JSON lines into a dataframe. It uses the handy pandas json_normalize function:

    import json

    import pandas as pd
    from bson import json_util

    def mongo_to_dataframe(mongo_data):
        # Serialize BSON types (ObjectId, datetime, ...) to JSON, then parse back to plain Python objects
        sanitized = json.loads(json_util.dumps(mongo_data))
        # Flatten nested documents; pd.json_normalize is the top-level name since pandas 1.0
        normalized = pd.json_normalize(sanitized)
        df = pd.DataFrame(normalized)

        return df


    Call the function with your MongoDB data (for example, the cursor returned by find()) as the argument.

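    sanitized = json.loads(json_util.dumps(mongo_data)) serializes the MongoDB documents, including BSON types such as ObjectId, to a JSON string and parses that string back into plain Python dicts and lists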

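    normalized = json_normalize(sanitized) un-nests the data: nested dicts become columns with dot-separated names (such as user.name), while lists stay in a single cell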

    df = pd.DataFrame(normalized) simply turns it into a dataframe; json_normalize already returns a DataFrame, so this step is optional
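
    To see what the un-nesting step does without a running MongoDB server, here is a minimal sketch that applies pd.json_normalize to two hand-written, tweet-like documents. The field names (text, user.name, user.followers) and the ObjectId strings are made up for illustration; real tweet documents will have different keys.

```python
import pandas as pd

# Hypothetical tweet-like documents, shaped like the output of
# json.loads(json_util.dumps(...)); the field names are assumptions.
docs = [
    {"_id": {"$oid": "600996a0c1f2a4b5d6e7f801"}, "text": "hello",
     "user": {"name": "alice", "followers": 10}},
    {"_id": {"$oid": "600996a0c1f2a4b5d6e7f802"}, "text": "world",
     "user": {"name": "bob", "followers": 25}},
]

# Nested dicts are flattened into dot-separated column names.
df = pd.json_normalize(docs)

print(sorted(df.columns))
# Ordinary pandas operations then work on the flattened columns.
print(df["user.followers"].mean())
```

    The nested user dict ends up as the columns user.name and user.followers, and the serialized ObjectId shows up as the column _id.$oid, so statistics can be computed directly on those columns.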
