I have a CSV file with many columns. One column contains a mix of dict-like strings and plain strings.
For example, the column contains values like {"a":5,"b":6,"c":8} alongside plain strings like "hello".
You can convert the strings that should be dicts (or other types) using literal_eval:
from ast import literal_eval

def try_literal_eval(s):
    try:
        return literal_eval(s)
    except (ValueError, SyntaxError):
        # not a valid Python literal, so keep the original string
        return s
Now you can apply this to your DataFrame:
In [11]: df = pd.DataFrame({'A': ["hello","world",'{"a":5,"b":6,"c":8}',"usa","india",'{"d":9,"e":10,"f":11}']})
In [12]: df.loc[2, "A"]
Out[12]: '{"a":5,"b":6,"c":8}'
In [13]: df
Out[13]:
                       A
0                  hello
1                  world
2    {"a":5,"b":6,"c":8}
3                    usa
4                  india
5  {"d":9,"e":10,"f":11}
In [14]: df.applymap(try_literal_eval)
Out[14]:
                            A
0                       hello
1                       world
2    {'a': 5, 'b': 6, 'c': 8}
3                         usa
4                       india
5  {'d': 9, 'e': 10, 'f': 11}
In [15]: df.applymap(try_literal_eval).loc[2, "A"]
Out[15]: {'a': 5, 'b': 6, 'c': 8}
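If you're reading the file yourself, you can also apply the same conversion at load time via read_csv's converters argument. A minimal sketch, assuming the file is called data.csv and the mixed column is named "A":

import pandas as pd

# assumption: the file is "data.csv" and the mixed column is "A"
df = pd.read_csv("data.csv", converters={"A": try_literal_eval})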
Note: this is pretty expensive (time-wise) compared to most pandas operations, but once you're storing dictionaries in a DataFrame/Series you're necessarily falling back to Python objects, so things are going to be relatively slow anyway. It's probably a good idea to denormalize, i.e. expand the dicts back out into columns, e.g. using json_normalize.
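As a rough sketch of that last point (assuming the column is "A", reusing try_literal_eval from above, and keeping only the rows that actually parsed to dicts):

import pandas as pd

parsed = df["A"].apply(try_literal_eval)                 # mix of strings and dicts
is_dict = parsed.apply(lambda x: isinstance(x, dict))    # mask for the dict rows

# expand the dict rows into their own columns, keeping the original row labels
expanded = pd.json_normalize(parsed[is_dict].tolist())
expanded.index = parsed[is_dict].index

For this example that gives columns a, b, c, d, e, f, with NaN wherever a row is missing a key. (In older pandas, json_normalize lives under pandas.io.json rather than the top-level namespace.)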