How can we change the datatype of a nested column in PySpark? For example, how can I change the data type of value from string to int?
Reference: how to change a Datafram
It may seem harmless to use arbitrary variable names, but doing so is problematic and contrary to PEP 8. When dealing with numbers, I also suggest avoiding names commonly used while iterating over such structures, i.e. value.
import json

with open('random.json') as json_file:
    data = json.load(json_file)

# Walk the top-level keys; under 'y', convert each nested 'value'
# from str to float.
for k, v in data.items():
    if k == 'y':
        for key, item in v.items():
            item['value'] = float(item['value'])

print(type(data['y']['p']['value']))
print(type(data['y']['q']['value']))
# mac → python3 make_float.py
# <class 'float'>
# <class 'float'>

# Write the converted data back out; floats are serialized unquoted.
json_data = json.dumps(data, indent=4, sort_keys=True)
with open('random.json', 'w') as json_file:
    json_file.write(json_data)
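Since the contents of random.json are not shown, here is a self-contained sketch of the same nested conversion using an in-memory dict. The structure (a 'y' key containing 'p' and 'q' entries, each with a string 'value') is an assumption inferred from the lookups above, not taken from the original post:

```python
import json

# Hypothetical data standing in for random.json (assumed structure).
data = {
    "x": {"p": {"value": "1.5"}},
    "y": {
        "p": {"value": "2.5"},
        "q": {"value": "3.5"},
    },
}

# Convert every nested 'value' under the 'y' key from str to float.
for k, v in data.items():
    if k == 'y':
        for key, item in v.items():
            item['value'] = float(item['value'])

print(type(data['y']['p']['value']).__name__)  # float
print(type(data['y']['q']['value']).__name__)  # float

# json.dumps now serializes the converted values without quotes,
# i.e. "value": 2.5 rather than "value": "2.5".
json_data = json.dumps(data, indent=4, sort_keys=True)
```

Note that only the subtree under 'y' is touched; the string under 'x' is left as-is, which mirrors the behavior of the loop in the answer above.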