Dict objects converted to strings when reading a CSV into a pandas DataFrame (Python)

栀梦 2021-01-21 08:48

I have a CSV file with many columns. One column contains a mix of dict objects and plain strings.

For example, the column contains data like: {"a":5,"b"…

1 Answer
  •  清歌不尽
     2021-01-21 09:15

    You can convert the strings that should be dicts (or other Python literals) using ast.literal_eval:

    from ast import literal_eval

    def try_literal_eval(s):
        """Parse a string as a Python literal; fall back to the original value."""
        try:
            return literal_eval(s)
        except (ValueError, SyntaxError):
            # Not a valid Python literal (e.g. a plain word like "hello"),
            # so keep the value as-is.
            return s


    Now you can apply this to your DataFrame:

    In [11]: df = pd.DataFrame({'A': ["hello","world",'{"a":5,"b":6,"c":8}',"usa","india",'{"d":9,"e":10,"f":11}']})
    
    In [12]: df.loc[2, "A"]
    Out[12]: '{"a":5,"b":6,"c":8}'
    
    In [13]: df
    Out[13]:
                           A
    0                  hello
    1                  world
    2    {"a":5,"b":6,"c":8}
    3                    usa
    4                  india
    5  {"d":9,"e":10,"f":11}
    
    
    In [14]: df.applymap(try_literal_eval)
    Out[14]:
                                A
    0                       hello
    1                       world
    2    {'a': 5, 'b': 6, 'c': 8}
    3                         usa
    4                       india
    5  {'d': 9, 'e': 10, 'f': 11}
    
    In [15]: df.applymap(try_literal_eval).loc[2, "A"]
    Out[15]: {'a': 5, 'b': 6, 'c': 8}
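
    Since the column comes from a CSV file, you could also parse it while reading, via the converters argument of pandas.read_csv. A minimal sketch, assuming the file is called data.csv and the mixed column is named A (both names are placeholders):

    import pandas as pd

    # Reuse try_literal_eval from above: the converter runs on every value of
    # column "A" as the file is read, so dict-like strings arrive already parsed.
    # The file name "data.csv" and the column name "A" are assumptions here.
    df = pd.read_csv("data.csv", converters={"A": try_literal_eval})

    (On pandas 2.1+, DataFrame.applymap is deprecated in favour of DataFrame.map, which does the same element-wise mapping.)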
    

    Note: This is pretty expensive time-wise compared to vectorised operations; once you store dictionaries in a DataFrame/Series you are necessarily falling back to Python objects, so things are going to be relatively slow. It's probably a good idea to denormalize, i.e. expand the dicts back into columns, e.g. using pd.json_normalize.
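
    A minimal sketch of that denormalization, assuming the mixed column is named A as in the example above; only the rows that actually hold dicts are expanded:

    import pandas as pd

    parsed = df["A"].map(try_literal_eval)   # dict-like strings become dicts, plain strings stay strings
    dict_rows = parsed[parsed.map(lambda x: isinstance(x, dict))]

    # One column per key (a, b, c, ...), aligned with the original row index.
    expanded = pd.json_normalize(dict_rows.tolist()).set_index(dict_rows.index)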
