How can we change the datatype of a nested column in PySpark? For example, how can I change the data type of value from string to int?
Reference: how to change a Datafram
It may seem harmless to use arbitrary variable names, but doing so is problematic and contrary to PEP 8. When dealing with numbers, I also suggest avoiding names commonly used while iterating over such structures, i.e. value.
import json

with open('random.json') as json_file:
    data = json.load(json_file)

# Walk the top-level keys; under 'y', convert each nested 'value'
# from str to float.
for k, v in data.items():
    if k == 'y':
        for key, item in v.items():
            item['value'] = float(item['value'])

print(type(data['y']['p']['value']))
print(type(data['y']['q']['value']))
# mac → python3 make_float.py
# <class 'float'>
# <class 'float'>

# Write the converted data back out; floats are serialized unquoted.
json_data = json.dumps(data, indent=4, sort_keys=True)
with open('random.json', 'w') as json_file:
    json_file.write(json_data)
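Since the contents of random.json are not shown, here is a self-contained sketch of the same nested conversion using an in-memory dict. The structure (a 'y' key containing 'p' and 'q' entries, each with a string 'value') is an assumption inferred from the lookups above, not taken from the original post:

```python
import json

# Hypothetical data standing in for random.json (assumed structure).
data = {
    "x": {"p": {"value": "1.5"}},
    "y": {
        "p": {"value": "2.5"},
        "q": {"value": "3.5"},
    },
}

# Convert every nested 'value' under the 'y' key from str to float.
for k, v in data.items():
    if k == 'y':
        for key, item in v.items():
            item['value'] = float(item['value'])

print(type(data['y']['p']['value']).__name__)  # float
print(type(data['y']['q']['value']).__name__)  # float

# json.dumps now serializes the converted values without quotes,
# i.e. "value": 2.5 rather than "value": "2.5".
json_data = json.dumps(data, indent=4, sort_keys=True)
```

Note that only the subtree under 'y' is touched; the string under 'x' is left as-is, which mirrors the behavior of the loop in the answer above.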