Spark dataframes convert nested JSON to separate columns

刺人心 2021-01-27 05:58

I've a stream of JSONs with the following structure that gets converted to a DataFrame:

{
  "a": 3936,
  "b": 123,
  "c": "34",
  "attributes": {
    "

3 Answers
  • 2021-01-27 06:25

    Use Python (pandas):

    1. Load the data into a DataFrame using the pandas library.
    2. Parse each JSON string from 'str' into 'dict'.
    3. Get the value of each feature.
    4. Save the results to a new file.

      import json
      import pandas as pd
      
      data = pd.read_csv("data.csv")      # load the CSV file from disk
      json_data = data['Desc']            # get the Desc column (JSON strings)
      data = data.drop(columns=['Desc'])  # delete the Desc column
      Total, Defective = [], []           # output lists
      
      for i in json_data:
          i = json.loads(i)                   # parse the JSON string into a dict
          Total.append(i['Total'])            # append the 'Total' feature
          Defective.append(i['Defective'])    # append the 'Defective' feature
      
      # finally, complete the DataFrame
      data['Total'] = Total
      data['Defective'] = Defective
      
      data.to_csv("result.csv")           # save to result.csv and check it
      
  • 2021-01-27 06:27

    Using the attributes.d notation, you can create new columns and have them in your DataFrame. Look at the withColumn() method (available in the Java/Scala API); a sketch follows.
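
    A minimal sketch of that approach in Scala (the nested field names d, e and f are assumed from the other answers, since the question's JSON snippet is truncated):

      import org.apache.spark.sql.functions.col
      
      // promote each nested field to a top-level column, then drop the struct
      val flattened = df
        .withColumn("d", col("attributes.d"))
        .withColumn("e", col("attributes.e"))
        .withColumn("f", col("attributes.f"))
        .drop("attributes")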

  • 2021-01-27 06:30
    • If you want columns named from a to f:

      df.select("a", "b", "c", "attributes.d", "attributes.e", "attributes.f")
      
    • If you want columns named with the attributes. prefix:

      df.select($"a", $"b", $"c", $"attributes.d" as "attributes.d", $"attributes.e" as "attributes.e", $"attributes.f" as "attributes.f")
      
    • If names of your columns are supplied from an external source (e.g. configuration):

      val colNames = Seq("a", "b", "c", "attributes.d", "attributes.e", "attributes.f")
      
      df.select(colNames.head, colNames.tail: _*).toDF(colNames:_*)
      