This is my dataframe:
date ids
0 2011-04-23 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
1 2011-04-24 [0,
You can first use loc to locate all rows that have a NaN in the ids column, and then loop through these rows using at to set their values to an empty list:
for row in df.loc[df.ids.isnull(), 'ids'].index:
    df.at[row, 'ids'] = []
>>> df
         date                                             ids
0  2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
1  2011-04-24  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
2  2011-04-25  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
3  2011-04-26                                              []
4  2011-04-27  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
5  2011-04-28  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
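For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same loc/at approach. The three-row frame and its values are made up for illustration and are not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: an object column holding lists and one NaN.
df = pd.DataFrame({
    "date": ["2011-04-23", "2011-04-24", "2011-04-25"],
    "ids": [[0, 1, 2], np.nan, [3, 4]],
})

# Labels of the rows whose 'ids' entry is missing.
missing = df.loc[df["ids"].isnull(), "ids"].index

# .at addresses a single cell, so the empty list is stored as one object
# rather than being interpreted as a sequence of values to spread out.
for row in missing:
    df.at[row, "ids"] = []

print(df["ids"].tolist())
```

Because a new [] is created on every pass through the loop, each filled cell ends up with its own list.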
My approach is similar to @hellpanderrr's, but tests for list-ness rather than using isnan:
df['ids'] = df['ids'].apply(lambda d: d if isinstance(d, list) else [])
I originally tried using pd.isnull (or pd.notnull), but when given a list, that returns the null-ness of each element.
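To make the difference concrete, here is a small sketch (the sample Series is hypothetical). It shows pd.isnull returning an array when handed a list, which is why it is awkward as a per-cell condition, next to the isinstance test, which yields one boolean per cell.

```python
import numpy as np
import pandas as pd

# pd.isnull on a list checks each element and returns an array,
# so it cannot serve directly as a single True/False condition.
elementwise = pd.isnull([1, None, 3])

# Hypothetical column mixing lists and NaN.
ids = pd.Series([[0, 1], np.nan, [2]])

# isinstance gives exactly one boolean per cell.
cleaned = ids.apply(lambda d: d if isinstance(d, list) else [])

print(elementwise)
print(cleaned.tolist())
```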
Maybe not the shortest or most optimized solution, but I think it is pretty readable:
# Packages
import ast
# Masking-in nans
mask = df['ids'].isna()
# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(ast.literal_eval)
The drawback is that you need to import the ast module.
EDIT
I recently discovered the eval() built-in, which avoids importing any extra module.
# Masking-in nans
mask = df['ids'].isna()
# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(eval)
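One difference between the two variants may matter if the strings ever come from somewhere less controlled than a literal '[]': ast.literal_eval only accepts Python literals, whereas eval runs arbitrary expressions. A minimal illustration in plain Python (the expression string is just an example, not part of the answer above):

```python
import ast

# Both turn the placeholder string into a real empty list.
from_literal_eval = ast.literal_eval("[]")
from_eval = eval("[]")

# literal_eval rejects anything that is not a plain literal...
expression = "len('abc')"
try:
    ast.literal_eval(expression)
    rejected = False
except ValueError:
    rejected = True

# ...while eval executes it.
evaluated = eval(expression)

print(from_literal_eval, from_eval, rejected, evaluated)
```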
A simple solution would be:
df['ids'] = df['ids'].fillna("").apply(list)
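This works because list("") is an empty list, while list applied to an existing list returns a shallow copy. It does assume every non-missing cell is already a list: a string cell would be split into its characters. A quick sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing lists and NaN.
ids = pd.Series([[0, 1], np.nan, [2]])

# NaN -> "" -> [], and existing lists are copied with the same contents.
filled = ids.fillna("").apply(list)

# The caveat: list() splits a string into its characters.
split_string = list("ab")

print(filled.tolist())
print(split_string)
```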
Another solution using numpy:
# one fresh list per row; [[]] * len(df) would repeat a single shared list object
df.ids = np.where(df.ids.isnull(), pd.Series([[] for _ in range(len(df))]), df.ids)
Or using combine_first:
df.ids = df.ids.combine_first(pd.Series([[] for _ in range(len(df))], index=df.index))
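A caveat when building the fill values for either variant: an expression like [[]] * n repeats one list object n times rather than creating n lists, so mutating one filled cell later would change all of them. A list comprehension avoids this. A small sketch with a made-up size:

```python
import pandas as pd

n = 3  # hypothetical number of rows

# One list object referenced n times.
shared = pd.Series([[]] * n)
shared.iloc[0].append("x")

# n independent list objects.
independent = pd.Series([[] for _ in range(n)])
independent.iloc[0].append("x")

print(shared.tolist())
print(independent.tolist())
```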
Surprisingly, passing a dict with empty lists as values seems to work for Series.fillna, but not DataFrame.fillna, so if you want to work on a single column you can use this:
>>> df
     A    B    C
0  0.0  2.0  NaN
1  NaN  NaN  5.0
2  NaN  7.0  NaN
>>> df['C'].fillna({i: [] for i in df.index})
0    []
1     5
2    []
Name: C, dtype: object
The solution can be extended to DataFrames by applying it to every column.
>>> df.apply(lambda s: s.fillna({i: [] for i in df.index}))
    A   B   C
0   0   2  []
1  []  []   5
2  []   7  []
Note: for large Series/DataFrames with few missing values, this might create an unreasonably large number of throwaway empty lists.
Tested with pandas 1.0.5.