Spark DataFrames: convert nested JSON to separate columns

刺人心 2021-01-27 05:58

I have a stream of JSON objects with the following structure that gets converted to a DataFrame:

{
  "a": 3936,
  "b": 123,
  "c": "34",
  "attributes": {
    "


        
3 answers
  •  遥遥无期
    2021-01-27 06:25

    Use Python

    1. Load the data into a DataFrame using the pandas library.
    2. Convert each value in the JSON column from 'str' to 'dict'.
    3. Extract the value of each feature.
    4. Save the results to a new file.

      import ast

      import pandas as pd

      data = pd.read_csv("data.csv")      # load the CSV file from disk
      json_data = data['Desc']            # the 'Desc' column (a Series of strings)
      data = data.drop(columns='Desc')    # drop the 'Desc' column
      Total, Defective = [], []           # lists that collect the extracted values

      for i in json_data:
          # parse the string into a dict; ast.literal_eval is a safer stand-in for eval
          # (use json.loads instead if the strings are strict JSON)
          i = ast.literal_eval(i)
          Total.append(i['Total'])            # append the 'Total' feature
          Defective.append(i['Defective'])    # append the 'Defective' feature

      # finally, add the extracted values as new columns
      data['Total'] = Total
      data['Defective'] = Defective

      data.to_csv("result.csv")           # save to result.csv and check it
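
When the nesting goes deeper than one level, the same idea (parse the string, then pull each feature out into its own column) can be written as a small recursive helper. This is only a minimal sketch using the standard library: the `flatten` function, the dotted column names, and the keys under "attributes" are all hypothetical, since the structure in the question is cut off.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten a nested dict into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into nested objects, carrying the prefix along
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical record: the keys under "attributes" are invented for illustration
raw = '{"a": 3936, "b": 123, "c": "34", "attributes": {"x": 1, "y": {"z": "deep"}}}'
row = flatten(json.loads(raw))
print(row)
```

Each flattened dict has one key per column, so a list of them can be passed to `pd.DataFrame(rows)` to build the table.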
      
