remove entries with nan values in python dictionary

[愿得一人] 2021-01-18 12:47

I have the following dictionary in Python:

OrderedDict([(30, ('A1', 55.0)), (31, ('A2', 125.0)), (32, ('A3', 180.0)), (43, ('A4', nan))])
4 answers
  •  孤街浪徒
    2021-01-18 13:09

    Since you have pandas, you can use pd.Series.notna (also available under the name pd.Series.notnull), which works with mixed dtypes.

    >>> import pandas as pd
    >>> {k: v for k, v in dict_cg.items() if pd.Series(v).notna().all()}
    {30: ('A1', 55.0), 31: ('A2', 125.0), 32: ('A3', 180.0)}
    
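For anyone who wants to run this end to end, here is a minimal self-contained sketch. The data is copied from the question and the name dict_cg follows the snippet above. The helper drop_nan_entries is a hypothetical name of my own, and it assumes every value is a tuple (or other list-like) that pd.Series can wrap.

```python
import math
from collections import OrderedDict

import pandas as pd

# Sample data copied from the question; dict_cg is the name used in the answer.
dict_cg = OrderedDict([
    (30, ("A1", 55.0)),
    (31, ("A2", 125.0)),
    (32, ("A3", 180.0)),
    (43, ("A4", math.nan)),
])


def drop_nan_entries(d):
    """Hypothetical helper: return a new OrderedDict without NaN-containing values."""
    # Wrapping each tuple in a Series lets notna() test every element,
    # strings and floats alike; all() keeps only fully populated tuples.
    return OrderedDict(
        (k, v) for k, v in d.items() if pd.Series(v).notna().all()
    )


cleaned = drop_nan_entries(dict_cg)
print(cleaned)
```

Building a new OrderedDict keeps the original insertion order and leaves dict_cg itself untouched. Creating one Series per entry is convenient but not cheap, so for a very large dictionary a different check may be worth considering.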

    This is not part of the answer, but may help you understand how I've arrived at the solution. I came across some weird behaviour when trying to solve this question, using pd.notnull directly.

    Take dict_cg[43].

    >>> dict_cg[43]
    ('A4', nan)
    

    pd.notnull does not work.

    >>> pd.notnull(dict_cg[43])
    True
    

    It treats the tuple as a single value (rather than an iterable of values). Furthermore, converting this to a list and then testing also gives an incorrect answer.

    >>> pd.notnull(list(dict_cg[43]))
    array([ True,  True])
    

    Since the second value is nan, the result I'm looking for should be [True, False]. It finally works when you pre-convert to a Series:

    >>> pd.Series(dict_cg[43]).notnull() 
    0     True
    1    False
    dtype: bool
    

    So, the solution is to Series-ify it and then test the values.

    Along similar lines, another (admittedly roundabout) solution is to pre-convert to an object dtype numpy array, and pd.notnull will work directly:

    >>> import numpy as np
    >>> pd.notnull(np.array(dict_cg[43], dtype=object))
    array([ True, False])
    

    I imagine that pd.notnull converts the list to a string array under the covers, which turns the NaN into the string "nan", so it is no longer treated as a null value.
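The string-conversion guess can be illustrated with NumPy alone. The sketch below is my own and only shows what NumPy does when it has to pick one dtype for a string and a float. It does not prove what pandas does internally, and the behaviour of pd.notnull on a plain list may differ between pandas versions.

```python
import numpy as np
import pandas as pd

values = ["A4", float("nan")]

# Without an explicit dtype, NumPy chooses a common string dtype,
# so the float NaN is stored as the text "nan".
as_strings = np.array(values)

# With dtype=object each element keeps its original Python type,
# so the NaN stays a float and can still be detected.
as_objects = np.array(values, dtype=object)

print(as_strings.dtype.kind)   # "U" means a unicode string array
print(repr(as_strings[1]))
print(pd.notna(as_strings))    # every element counts as non-null
print(pd.notna(as_objects))    # the NaN is flagged as null
```

This is consistent with the object-dtype workaround above: once the NaN has been turned into text, nothing marks it as missing any more.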
