Remove entries with NaN values in a Python dictionary

[愿得一人] 2021-01-18 12:47

I have the following dictionary in Python:

OrderedDict([(30, ('A1', 55.0)), (31, ('A2', 125.0)), (32, ('A3', 180.0)), (43, ('A4', nan))])
4 answers
  • 2021-01-18 13:09

    Since you have pandas, you can use pd.Series.notna (an alias of pd.Series.notnull) here, which works with mixed dtypes.

    >>> import pandas as pd
    >>> {k: v for k, v in dict_cg.items() if pd.Series(v).notna().all()}
    {30: ('A1', 55.0), 31: ('A2', 125.0), 32: ('A3', 180.0)}
    

    This is not part of the answer, but may help you understand how I've arrived at the solution. I came across some weird behaviour when trying to solve this question, using pd.notnull directly.

    Take dict_cg[43].

    >>> dict_cg[43]
    ('A4', nan)
    

    pd.notnull does not work.

    >>> pd.notnull(dict_cg[43])
    True
    

    It treats the tuple as a single value (rather than an iterable of values). Furthermore, converting this to a list and then testing also gives an incorrect answer.

    >>> pd.notnull(list(dict_cg[43]))
    array([ True,  True])
    

    Since the second value is nan, the result I'm looking for should be [True, False]. It finally works when you pre-convert to a Series:

    >>> pd.Series(dict_cg[43]).notnull() 
    0     True
    1    False
    dtype: bool
    

    So, the solution is to Series-ify it and then test the values.

    Along similar lines, another (admittedly roundabout) solution is to pre-convert to an object dtype numpy array, and pd.notnull will work directly:

    >>> import numpy as np
    >>> pd.notnull(np.array(dict_cg[43], dtype=object))
    array([ True, False])
    

    I imagine that pd.notnull directly converts dict_cg[43] to a string array under the covers, rendering the NaN as a string "nan", so it is no longer a "null" value.
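
    The guess above is about coercion to strings. A minimal standard-library sketch of that effect, not of pandas internals: once a NaN has been turned into the text 'nan', a NaN test no longer sees it. The helper is_nan and the variable names are made up for illustration.

```python
import math

value = ('A4', float('nan'))

def is_nan(x):
    # Only a real float can be NaN; a string such as 'nan' is just text.
    return isinstance(x, float) and math.isnan(x)

# Element-wise check on the original tuple finds the NaN.
original_flags = [is_nan(x) for x in value]

# Coercing every element to a string (roughly what a string array would hold)
# turns the NaN into the text 'nan', which no longer counts as missing.
as_strings = tuple(str(x) for x in value)
coerced_flags = [is_nan(x) for x in as_strings]

print(original_flags)  # [False, True]
print(as_strings)      # ('A4', 'nan')
print(coerced_flags)   # [False, False]
```

    This is why keeping the values as Python objects (a Series or an object-dtype array) preserves the NaN, while a string conversion hides it.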

  • 2021-01-18 13:15

    Your original code didn't actually have pandas and importing it just to filter for NaN seems excessive. However, your code was using numpy (np).

    Assuming your first line should read:

    dict_cg = OrderedDict([(30, ('A1', 55.0)), (31, ('A2', 125.0)), (32, ('A3', 180.0)), (43, ('A4', np.nan))])
    

    This line is close to what you had and works, although it requires importing the standard-library module numbers:

    import numbers

    OrderedDict((k, vs) for k, vs in dict_cg.items() if not any(isinstance(v, numbers.Number) and np.isnan(v) for v in vs))
    

    This way you don't need pandas, the result is still an OrderedDict (as you had before), and the strings in the tuples cause no problems: the operands of and are evaluated left to right and short-circuit, so np.isnan is only ever called on numbers.
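
    The short-circuit idea can be sketched as a small reusable function. This is only an illustration under a few assumptions: math.isnan is swapped in for np.isnan so the sketch needs nothing beyond the standard library, numbers.Real is used instead of numbers.Number because math.isnan does not accept complex values, and the names drop_nan_entries, has_nan and sample are hypothetical.

```python
import math
import numbers
from collections import OrderedDict

def drop_nan_entries(mapping):
    """Return a new OrderedDict without entries whose tuple contains a NaN."""
    def has_nan(values):
        # isinstance() runs first; math.isnan() is only reached for real
        # numbers, so strings such as 'A1' never trigger a TypeError.
        return any(isinstance(v, numbers.Real) and math.isnan(v) for v in values)
    return OrderedDict((k, vs) for k, vs in mapping.items() if not has_nan(vs))

# Made-up sample: the NaN sits in the first slot of one tuple.
sample = OrderedDict([(30, ('A1', 55.0)), (31, (float('nan'), 'A2')), (32, ('A3', 180.0))])
cleaned = drop_nan_entries(sample)
print(list(cleaned))  # [30, 32]
```

    Because every element is checked, the NaN is found wherever it sits in the tuple, not only in the second position.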

  • 2021-01-18 13:22

    This should work:

    # Iterate over a snapshot of the items so entries can be removed safely.
    for k, v in list(dict_cg.items()):
        if np.isnan(v[1]):
            dict_cg.pop(k)
    print(dict_cg)
    

    Output:

    OrderedDict([(30, ('A1', 55.0)), (31, ('A2', 125.0)), (32, ('A3', 180.0))])
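
    Changing a dict's size while iterating over it is not safe in Python 3 and can raise RuntimeError. A minimal sketch of an in-place removal that avoids this by collecting the keys first and deleting afterwards; math.isnan stands in for np.isnan, and the sample data (with the NaN entry in the middle) is made up.

```python
import math
from collections import OrderedDict

# Made-up sample with the NaN entry in the middle rather than at the end.
data = OrderedDict([(30, ('A1', 55.0)), (43, ('A4', float('nan'))), (32, ('A3', 180.0))])

# First pass: collect the keys to drop without touching the dict.
to_drop = [k for k, v in data.items() if math.isnan(v[1])]

# Second pass: delete them now that no iteration is in progress.
for k in to_drop:
    del data[k]

print(list(data.items()))  # [(30, ('A1', 55.0)), (32, ('A3', 180.0))]
```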
    
  • 2021-01-18 13:29

    user308827,

    The code in your question seems to confuse keys and values and ignores the fact that your values are tuples. Here's a one-liner using only the standard library and a generator expression that works in Python 2 and 3:

    from collections import OrderedDict
    import math
    
    od = OrderedDict([(30, ('A1', 55.0)), (31, ('A2', 125.0)), (32, ('A3', 180.0)), (43, ('A4', float('nan')))])
    
    # Pass (key, value) pairs straight to OrderedDict; going through a plain dict
    # could lose the ordering on Python versions before 3.7.
    no_nans = OrderedDict((k, v) for k, v in od.items() if not math.isnan(v[1]))
    # OrderedDict([(30, ('A1', 55.0)), (31, ('A2', 125.0)), (32, ('A3', 180.0))])
    