PySpark: Change nested column datatype

Backend · Unresolved · 2 answers · 1406 views

广开言路 2021-01-23 18:52

How can we change the datatype of a nested column in PySpark? For example, how can I change the data type of value from string to int?

Reference: how to change a DataFrame

2 Answers
  •  迷失自我
    2021-01-23 19:46

    It may seem simple to use arbitrary variable names, but this is problematic and contrary to PEP 8. When dealing with numbers, I also suggest avoiding the generic names commonly used when iterating over such structures, i.e. value.

    import json
    
    # Load the nested JSON document.
    with open('random.json') as json_file:
        data = json.load(json_file)
    
    # Cast every nested 'value' under the top-level 'y' key to float.
    for k, v in data.items():
        if k == 'y':
            for key, item in v.items():
                item['value'] = float(item['value'])
    
    # Verify the conversion (run with: python3 make_float.py).
    print(type(data['y']['p']['value']))  # <class 'float'>
    print(type(data['y']['q']['value']))  # <class 'float'>
    
    # Write the converted document back to disk.
    json_data = json.dumps(data, indent=4, sort_keys=True)
    with open('random.json', 'w') as json_file:
        json_file.write(json_data)
    
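    The snippet above assumes a file random.json whose top-level 'y' key maps to sub-objects that each carry a string 'value'. As a minimal self-contained sketch of the same conversion (the document shape and field names here are assumptions, built in memory so no file is needed):

    ```python
    import json

    # Hypothetical document matching the structure the answer assumes.
    doc = '{"y": {"p": {"value": "1.5"}, "q": {"value": "2"}}}'
    data = json.loads(doc)

    # Same conversion as above: cast each nested 'value' to float.
    for key, item in data['y'].items():
        item['value'] = float(item['value'])

    print(type(data['y']['p']['value']))  # <class 'float'>
    print(json.dumps(data, sort_keys=True))
    ```

    After the loop, json.dumps serializes the floats back out, so re-saving the file persists the new type.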
