Get dataframe schema and load it into a metadata table

醉话见心 2021-01-23 09:41

The use case is to read a file and create a dataframe on top of it. After that, get the schema of that file and store it into a DB table.

For example purposes, I am just creating a sample dataframe.
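
A minimal sketch of that setup, assuming a CSV input (the path, read options, and app name below are placeholders, not from the original question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("schema-to-metadata").getOrCreate()

// Read the file; header/inferSchema options and the path are assumptions
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/employees.csv")

val schema = df.schema // StructType describing the file, to be stored in a DB table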

2 Answers
  •  梦毁少年i
    2021-01-23 10:14

    Spark >= 2.4.0

    In order to save the schema in string format, you can use the toDDL method of StructType. In your case, the DDL should be:

    `Name` STRING, `Age` INT, `Designation` STRING, `Salary` INT, `ZipCode` INT
    

    After saving the schema, you can load it from the database and parse it with StructType.fromDDL(my_schema). This returns an instance of StructType, which you can use to create a new dataframe with spark.createDataFrame, as @Ajay already mentioned.
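
    A small round-trip sketch (the my_schema value, the in-scope spark session, and the empty-dataframe construction are illustrative assumptions):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.StructType

    // DDL string as loaded back from the metadata table
    val my_schema = "`Name` STRING, `Age` INT, `Designation` STRING, `Salary` INT, `ZipCode` INT"
    val schema = StructType.fromDDL(my_schema)

    // Create an (empty) dataframe with the restored schema
    val df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)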

    It is also useful to remember that you can always extract the schema from a case class with:

    import org.apache.spark.sql.catalyst.ScalaReflection
    import org.apache.spark.sql.types.StructType

    case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int) // mirrors the DDL above

    val empSchema = ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]
    

    And then you can get the DDL representation with empSchema.toDDL.
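
    For the storage side, a plain JDBC insert is one option; the connection URL, credentials, and the schema_registry table below are assumptions, not part of the original answer:

    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/meta", "user", "password")
    try {
      // Hypothetical metadata table: schema_registry(table_name, schema_ddl)
      val stmt = conn.prepareStatement("INSERT INTO schema_registry (table_name, schema_ddl) VALUES (?, ?)")
      stmt.setString(1, "employee")
      stmt.setString(2, empSchema.toDDL)
      stmt.executeUpdate()
      stmt.close()
    } finally conn.close()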

    Spark < 2.4

    For Spark < 2.4, use DataType.fromDDL and schema.simpleString accordingly. Also, instead of returning a StructType, you should work with a DataType instance, omitting the cast to StructType, like this:

    val empSchema = ScalaReflection.schemaFor[Employee].dataType
    

    Sample output for empSchema.simpleString:

    struct<Name:string,Age:int,Designation:string,Salary:int,ZipCode:int>
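
    A sketch of the older round trip, assuming the simpleString above is what got stored (DataType.fromDDL should also accept this struct<...> form):

    import org.apache.spark.sql.types.DataType

    // simpleString as loaded back from the metadata table
    val my_schema = "struct<Name:string,Age:int,Designation:string,Salary:int,ZipCode:int>"
    val schema: DataType = DataType.fromDDL(my_schema) // no cast to StructType here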
    
