How to json_normalize a column with NaNs

情书的邮戳 2020-12-07 05:17
  • This question is specific to columns of data in a pandas.DataFrame
  • This question depends on if the values in the columns are str,
1 Answer
  • 2020-12-07 05:41
    • As pointed out in a comment, there is always the option to:
      • df = df.dropna().reset_index(drop=True)
      • That's fine for the dummy data here, or when dealing with a dataframe where the other columns don't matter.
      • Not a great option for dataframes with additional columns that are required.
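
To make the caveat about additional columns concrete, here is a minimal sketch. The `id` column is a hypothetical addition for illustration and is not part of the original data.

```python
import numpy as np
import pandas as pd

# 'id' is a hypothetical extra column, added only for illustration
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'col_str': ['{"a": "46"}', '{"b": "2"}', '{"c": "11"}', np.nan],
})

# dropna removes the whole row, so id 4 is lost along with the NaN
dropped = df.dropna().reset_index(drop=True)
remaining_ids = dropped['id'].tolist()
print(remaining_ids)
```

If the row for `id` 4 must survive, filling the NaN (as in the cases below) is the safer route.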

    Case 1

    • Since the column contains str types, fillna with '{}' (a str)
    import numpy as np
    import pandas as pd
    from ast import literal_eval
    
    df = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan]})
    
                                col_str
    0  {"a": "46", "b": "3", "c": "12"}
    1              {"b": "2", "c": "7"}
    2                       {"c": "11"}
    3                               NaN
    
    type(df.iloc[0, 0])
    [out]: str
    
    # fillna
    df.col_str = df.col_str.fillna('{}')
    
    # convert the column to dicts
    df.col_str = df.col_str.apply(literal_eval)
    
    # use json_normalize
    df = df.join(pd.json_normalize(df.col_str)).drop(columns=['col_str'])
    
    # display(df)
         a    b    c
    0   46    3   12
    1  NaN    2    7
    2  NaN  NaN   11
    3  NaN  NaN  NaN
    

    Case 2

    • Since the column contains dict types, fillna with {} (not a str)
    • This needs to be filled using a dict-comprehension, since fillna({}) does not work
    df = pd.DataFrame({'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan]})
    
                               col_dict
    0  {'a': '46', 'b': '3', 'c': '12'}
    1              {'b': '2', 'c': '7'}
    2                       {'c': '11'}
    3                               NaN
    
    type(df.iloc[0, 0])
    [out]: dict
        
    # fillna
    df.col_dict = df.col_dict.fillna({i: {} for i in df.index})
    
    # use json_normalize
    df = df.join(pd.json_normalize(df.col_dict)).drop(columns=['col_dict'])
    
    # display(df)
         a    b    c
    0   46    3   12
    1  NaN    2    7
    2  NaN  NaN   11
    3  NaN  NaN  NaN
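
As a possible alternative to the dict-comprehension fill in Case 2, the sketch below replaces every non-dict value with an empty dict using an `isinstance` check. This is a swapped-in technique, not the method shown above, and the `id` column is a hypothetical addition to show that other columns are kept.

```python
import numpy as np
import pandas as pd

# 'id' is a hypothetical extra column to show that other data is preserved
df = pd.DataFrame({
    'id': [10, 11, 12, 13],
    'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan],
})

# replace anything that is not a dict (here, the NaN) with an empty dict
df['col_dict'] = df['col_dict'].apply(lambda v: v if isinstance(v, dict) else {})

# normalize from a plain list so the result has a fresh 0..n-1 index
normalized = pd.json_normalize(df['col_dict'].tolist())

# attach the new columns and drop the original dict column
result = df.drop(columns=['col_dict']).join(normalized)
print(result)
```

The join lines up on the default integer index, so this assumes `df` has a 0..n-1 index; call `reset_index(drop=True)` first if it does not.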
    

    Case 3

    1. Fill the NaNs with '[]' (a str)
    2. Now literal_eval will work
    3. .explode can be used on the column to move each dict in a list onto its own row
    4. .explode turns the empty lists back into NaN, so the NaNs now need to be filled with {} (not a str)
    5. Then the column can be normalized
    • If the column already holds lists of dicts (not str), skip straight to .explode.
    df = pd.DataFrame({'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]', '[{"b": "2", "c": "7"}, {"c": "11"}]', np.nan]})
    
                                                        col_str
    0  [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]
    1                       [{"b": "2", "c": "7"}, {"c": "11"}]
    2                                                       NaN
    
    type(df.iloc[0, 0])
    [out]: str
        
    # fillna
    df.col_str = df.col_str.fillna('[]')
    
    # literal_eval
    df.col_str = df.col_str.apply(literal_eval)
    
    # explode
    df = df.explode('col_str').reset_index(drop=True)
    
    # fillna again
    df.col_str = df.col_str.fillna({i: {} for i in df.index})
    
    # use json_normalize
    df = df.join(pd.json_normalize(df.col_str)).drop(columns=['col_str'])
    
    # display(df)
         a    b    c
    0   46    3   12
    1  NaN    2    7
    2  NaN    2    7
    3  NaN  NaN   11
    4  NaN  NaN  NaN
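
For the variant noted in Case 3 where the column already holds lists of dicts rather than str, a minimal sketch might look like the following. The column name `col_list` is hypothetical, and the `isinstance` fill is an assumption used here in place of the dict-comprehension `fillna`.

```python
import numpy as np
import pandas as pd

# the column already holds lists of dicts (not str), so literal_eval is skipped
df = pd.DataFrame({'col_list': [
    [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}],
    [{"b": "2", "c": "7"}, {"c": "11"}],
    np.nan,
]})

# explode: one dict per row; the NaN row stays NaN
df = df.explode('col_list').reset_index(drop=True)

# replace the remaining NaN with an empty dict
records = [v if isinstance(v, dict) else {} for v in df['col_list']]

# normalize and attach, then drop the original column
df = df.join(pd.json_normalize(records)).drop(columns=['col_list'])
print(df)
```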
    