Drop non-JSON object rows from a Python DataFrame column

孤城傲影 2021-01-22 23:43

I have a dataframe in which one column contains both JSON objects and strings. I want to get rid of the rows that do not contain JSON objects.

Below is what my dataframe looks like:

3 answers
  •  情话喂你
    2021-01-23 00:16

    I think I would prefer to use an isinstance check:

    In [11]: df.loc[df.A.apply(lambda d: isinstance(d, dict))]
    Out[11]:
                                A
    2    {'a': 5, 'b': 6, 'c': 8}
    5  {'d': 9, 'e': 10, 'f': 11}
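
For anyone who wants to try the isinstance filter outside an interactive session, here is a minimal, self-contained sketch. The question's dataframe is not shown, so the sample data below is hypothetical; only the column name `A` and the two dict rows are taken from the output above.

```python
import pandas as pd

# Hypothetical sample data: the original frame is not shown in the question.
df = pd.DataFrame({
    "A": ["foo", "bar", {"a": 5, "b": 6, "c": 8}, "baz", 3, {"d": 9, "e": 10, "f": 11}]
})

# Boolean mask: True only where the cell holds a dict.
is_dict = df["A"].apply(lambda value: isinstance(value, dict))

# Keep the dict rows; the original index labels are preserved.
dicts_only = df.loc[is_dict]
print(dicts_only)
```

Because the mask is an ordinary boolean Series, it can be inverted with `~is_dict` to inspect the rows that would be dropped.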
    

    If you want to include numbers too, you can do:

    In [12]: df.loc[df.A.apply(lambda d: isinstance(d, (dict, np.number)))]
    Out[12]:
                                A
    2    {'a': 5, 'b': 6, 'c': 8}
    5  {'d': 9, 'e': 10, 'f': 11}
    

    Adjust this to whichever types you want to include...
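
One thing to watch when adjusting the types: in an object column, plain Python ints and floats are not instances of `np.number`, so they would not pass the check shown above. The sketch below uses `numbers.Number` from the standard library instead, which covers both Python and NumPy numeric scalars. The sample data, `KEEP_TYPES`, and `keep` are hypothetical names used only for illustration.

```python
import numbers

import numpy as np
import pandas as pd

# Hypothetical sample data mixing strings, a Python int, a dict and a NumPy float.
df = pd.DataFrame({"A": ["foo", 3, {"a": 5}, np.float64(2.5)]})

# The tuple of accepted types is the part to adjust.
KEEP_TYPES = (dict, numbers.Number)


def keep(value):
    # bool is a subclass of int, so exclude it explicitly here.
    return isinstance(value, KEEP_TYPES) and not isinstance(value, bool)


kept = df.loc[df["A"].apply(keep)]

# A plain Python int is not an np.number instance.
plain_int_is_np_number = isinstance(3, np.number)
```

Whether booleans should count as numbers depends on the data; drop the `bool` exclusion if they should be kept.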


    As a last step: json_normalize takes a list of JSON objects. For whatever reason a Series does not work (it gives the KeyError), so convert the column to a list and you're good to go:

    In [21]: df1 = df.loc[df.A.apply(lambda d: isinstance(d, dict))]  # dicts only: json_normalize cannot flatten bare numbers
    
    In [22]: json_normalize(list(df1["A"]))
    Out[22]:
         a    b    c    d     e     f
    0  5.0  6.0  8.0  NaN   NaN   NaN
    1  NaN  NaN  NaN  9.0  10.0  11.0
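
The filter and the flattening step can be put together as in the following sketch. It assumes pandas 1.0 or later, where `json_normalize` is available as `pd.json_normalize`; the sample data is hypothetical. Reattaching the original index is an optional extra, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data with two dict rows among plain strings.
df = pd.DataFrame({
    "A": ["foo", {"a": 5, "b": 6, "c": 8}, "bar", {"d": 9, "e": 10, "f": 11}]
})

# Select only the dict cells of column A (returns a Series).
dict_rows = df.loc[df["A"].apply(lambda value: isinstance(value, dict)), "A"]

# Pass a plain list of dicts, as the answer suggests.
flat = pd.json_normalize(dict_rows.tolist())

# Optional: restore the original row labels, which the list conversion discards.
flat.index = dict_rows.index
print(flat)
```

Keys missing from a given dict show up as NaN, which is why the integer values are displayed as floats.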
    
