I\'ve a stream of JSONs with following structure that gets converted to dataframe
{
\"a\": 3936,
\"b\": 123,
\"c\": \"34\",
\"attributes\": {
\"
Use Python
Save the results to a new file.
import pandas as pd
data = pd.read_csv("data.csv") # load the csv file from your disk
json_data = data['Desc'] # get the DataFrame of Desc
data = data.drop('Desc', 1) # delete Desc column
Total, Defective = [], [] # setout list
for i in json_data:
i = eval(i) # change the data type from 'str' to 'dict'
Total.append(i['Total']) # append 'Total' feature
Defective.append(i['Defective']) # append 'Defective' feature
# finally,complete the DataFrame
data['Total'] = Total
data['Defective'] = Defective
data.to_csv("result.csv") # save to the result.csv and check it
Using the attributes.d notation, you can create new columns and you will have them in your DataFrame. Look at the withColumn() method in Java.
If you want columns named from a
to f
:
df.select("a", "b", "c", "attributes.d", "attributes.e", "attributes.f")
If you want columns named with attributes.
prefix:
df.select($"a", $"b", $"c", $"attributes.d" as "attributes.d", $"attributes.e" as "attributes.e", $"attributes.f" as "attributes.f")
If names of your columns are supplied from an external source (e.g. configuration):
val colNames: Seq("a", "b", "c", "attributes.d", "attributes.e", "attributes.f")
df.select(colNames.head, colNames.tail: _*).toDF(colNames:_*)