How to fill DataFrame NaN values with an empty list [] in pandas?

粉色の甜心 2020-12-24 11:02

This is my dataframe:

          date                          ids
0     2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
1     2011-04-24  [0,         


        
11 Answers
  • 2020-12-24 11:28

    You can first use loc to locate all rows that have a nan in the ids column, and then loop through these rows using at to set their values to an empty list:

    for row in df.loc[df.ids.isnull(), 'ids'].index:
        df.at[row, 'ids'] = []
    
    >>> df
            date                                             ids
    0 2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    1 2011-04-24  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    2 2011-04-25  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    3 2011-04-26                                              []
    4 2011-04-27  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    5 2011-04-28  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
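
    A self-contained sketch of this loop on a small frame. The sample data below is made up for illustration and is not the asker's actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two rows have a missing 'ids' value.
df = pd.DataFrame({
    "date": pd.to_datetime(["2011-04-23", "2011-04-24", "2011-04-25", "2011-04-26"]),
    "ids": [[0, 1, 2], np.nan, [3, 4], np.nan],
})

# Visit only the index labels whose 'ids' entry is missing.
for row in df.index[df["ids"].isna()]:
    # A new [] is created on every iteration, so no two rows share a list.
    df.at[row, "ids"] = []

print(df["ids"].tolist())
```

    Because [] is evaluated inside the loop, appending to one row's list later should not affect the other rows.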
    
  • 2020-12-24 11:31

    My approach is similar to @hellpanderrr's, but it tests for list-ness instead of using isnan:

    df['ids'] = df['ids'].apply(lambda d: d if isinstance(d, list) else [])
    

    I originally tried pd.isnull (or pd.notnull), but when given a list it returns the null-ness of each element rather than a single boolean.
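
    A small sketch of that difference, followed by the isinstance-based fill on a standalone Series. The variable names and sample values are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# On a scalar, pd.isnull returns one boolean.
scalar_result = pd.isnull(np.nan)

# On a list, it returns one boolean per element, which cannot be used
# directly as an "is this cell missing?" test inside apply.
elementwise = pd.isnull([1, np.nan])

# Hypothetical column mixing lists and NaN.
s = pd.Series([[0, 1], np.nan, [2]], dtype=object)

# Keep real lists, replace anything else (here: NaN) with a new empty list.
filled = s.apply(lambda d: d if isinstance(d, list) else [])

print(scalar_result, elementwise, filled.tolist())
```

    Note that this replaces every non-list value, not only NaN, so it assumes the column is meant to hold lists and nothing else.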

  • 2020-12-24 11:42

    Maybe not the shortest or most optimized solution, but I think it is pretty readable:

    # Standard-library module that provides literal_eval
    import ast
    
    # Boolean mask of the rows where 'ids' is NaN
    mask = df['ids'].isna()
    
    # Fill those NaNs with the string '[]', then parse each string into a real list
    df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(ast.literal_eval)
    

    The drawback is that you need to import the ast module.

    EDIT

    I recently discovered the built-in eval(), which avoids the extra import:

    # Boolean mask of the rows where 'ids' is NaN
    mask = df['ids'].isna()
    
    # Fill those NaNs with the string '[]', then evaluate each string into a real list
    df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(eval)
    
  • 2020-12-24 11:43

    A simple solution would be:

    df['ids'] = df['ids'].fillna("").apply(list)
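
    This relies on list("") being [] and on list(existing_list) being a shallow copy. A minimal sketch with made-up data, assuming the column holds only lists and NaN (a string cell would be split into its characters):

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing lists and NaN.
s = pd.Series([[0, 1], np.nan, [2]], dtype=object)

# NaN -> "" -> list("") == []; existing lists are copied by list(...).
filled = s.fillna("").apply(list)

# The caveat: list() on a non-empty string yields its characters.
split_chars = list("ab")

print(filled.tolist(), split_chars)
```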
    
  • 2020-12-24 11:44

    Another solution using numpy:

    df.ids = np.where(df.ids.isnull(), pd.Series([[] for _ in range(len(df))]), df.ids)
    

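    Or using combine_first (pass index=df.index so the fill values align with the frame's own index):

    df.ids = df.ids.combine_first(pd.Series([[] for _ in range(len(df))], index=df.index))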
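
    One thing to watch when building the fill Series: [[]] * n repeats a reference to a single list, while a list comprehension creates n separate lists. A sketch of the difference, with hypothetical names and data:

```python
import numpy as np
import pandas as pd

# [[]] * 3 holds three references to the same list object.
shared = [[]] * 3
shared[0].append("x")

# The comprehension evaluates [] three times, giving three separate lists.
distinct = [[] for _ in range(3)]
distinct[0].append("x")

# Hypothetical column with a non-default index, to show why index= matters:
# combine_first aligns on index labels, not on position.
s = pd.Series([[0, 1], np.nan, np.nan], index=["a", "b", "c"], dtype=object)
fresh = pd.Series([[] for _ in range(len(s))], index=s.index)
filled = s.combine_first(fresh)

print(shared, distinct, filled.tolist())
```

    With the shared variant, mutating one filled cell later would also show up in every other filled cell.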
    
  • 2020-12-24 11:45

    Surprisingly, passing a dict with empty lists as values seems to work for Series.fillna, but not DataFrame.fillna - so if you want to work on a single column you can use this:

    >>> df
         A    B    C
    0  0.0  2.0  NaN
    1  NaN  NaN  5.0
    2  NaN  7.0  NaN
    >>> df['C'].fillna({i: [] for i in df.index})
    0    []
    1     5
    2    []
    Name: C, dtype: object
    

    The solution can be extended to DataFrames by applying it to every column.

    >>> df.apply(lambda s: s.fillna({i: [] for i in df.index}))
        A   B   C
    0   0   2  []
    1  []  []   5
    2  []   7  []
    

    Note: for large Series/DataFrames with few missing values, this might create an unreasonable amount of throwaway empty lists.

    Tested with pandas 1.0.5.
