Spark: create a nested schema

隐身守侯 submitted on 2021-01-28 06:50:41

Question


With Spark, given the following code:

import spark.implicits._
val data = Seq(
  (1, ("value11", "value12")),
  (2, ("value21", "value22")),
  (3, ("value31", "value32"))
)

val df = data.toDF("id", "v1")
df.printSchema()

The result is the following:

root
|-- id: integer (nullable = false)
|-- v1: struct (nullable = true)
|    |-- _1: string (nullable = true)
|    |-- _2: string (nullable = true)

Now, if I want to create the schema myself, how should I proceed?

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", ???)
))

Thanks.


Answer 1:


According to the example in the StructType documentation (https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/types/StructType.html):

 import org.apache.spark.sql._
 import org.apache.spark.sql.types._

 val innerStruct =
   StructType(
     StructField("f1", IntegerType, true) ::
     StructField("f2", LongType, false) ::
     StructField("f3", BooleanType, false) :: Nil)

 val struct = StructType(
   StructField("a", innerStruct, true) :: Nil)

 // Create a Row with the schema defined by struct
 val row = Row(Row(1, 2, true))
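As a small sketch of how such a nested Row can be read back: the inner struct is itself a Row, so it can be retrieved with getStruct and then accessed by position. The value names f1, f2 and f3 below simply mirror the field names of innerStruct. Note one assumption that differs from the snippet above: the second value is written as 2L so that it is really a Long, matching LongType; with a plain Int, getLong would fail with a cast error.

```scala
import org.apache.spark.sql.Row

// Outer row with a single column that holds a nested row (the struct "a").
// 2L is used so the value matches the LongType declared for f2.
val row = Row(Row(1, 2L, true))

// getStruct returns the nested Row stored at the given position.
val inner: Row = row.getStruct(0)

// Positional, typed accessors on the nested row.
val f1: Int = inner.getInt(0)
val f2: Long = inner.getLong(1)
val f3: Boolean = inner.getBoolean(2)
```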

In your case, it would be:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", StructType(Array(
      StructField("value1", StringType),
      StructField("value2", StringType)
  )))
))

Output (line breaks added for readability):

StructType(
  StructField(id,IntegerType,true), 
  StructField(nested,StructType(
    StructField(value1,StringType,true), 
    StructField(value2,StringType,true)
  ),true)
)
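To tie this back to the question, here is a minimal sketch of applying the hand-written schema to the original data. It assumes a local SparkSession created just for illustration (in spark-shell, spark already exists and that line can be dropped); the app name is arbitrary. Each nested struct value has to be supplied as a Row inside the outer Row.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-schema-sketch")
  .getOrCreate()

// The schema from the answer above.
val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", StructType(Array(
    StructField("value1", StringType),
    StructField("value2", StringType)
  )))
))

// The same data as in the question, with the tuple replaced by a nested Row.
val rows = Seq(
  Row(1, Row("value11", "value12")),
  Row(2, Row("value21", "value22")),
  Row(3, Row("value31", "value32"))
)

// Build the DataFrame from an RDD[Row] plus the explicit schema.
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

df.printSchema()

// Nested fields can be selected with dot notation.
df.select("id", "nested.value1").show()
```

Unlike the toDF version in the question, the nested fields are now named value1 and value2 rather than _1 and _2.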


Source: https://stackoverflow.com/questions/57079343/spark-create-a-nested-schema
