Get a DataFrame's schema and load it into a metadata table

醉话见心 2021-01-23 09:41

The use case is to read a file and create a DataFrame on top of it. After that, get the schema of that file and store it in a DB table.

For example purpose I am just creating a

2 Answers
  • 2021-01-23 10:02

    Try this -

    import org.apache.spark.sql.types._

    // Expected columns and their types
    val schema = StructType(Seq(StructField("Name", StringType, true),
                                StructField("Age", IntegerType, true),
                                StructField("Designation", StringType, true),
                                StructField("Salary", IntegerType, true),
                                StructField("ZipCode", IntegerType, true)))

    //-- For local file: hand the schema to the reader so the numeric columns are parsed as integers
    //   (building the DataFrame from the raw string rows with this schema would fail at runtime)
    val df = spark.read.option("delimiter", ",").schema(schema).csv("file:///file/path/file.csv")
    
  • 2021-01-23 10:14

    Spark >= 2.4.0

    To save the schema as a string, you can use the toDDL method of StructType. In your case the DDL string should be:

    `Name` STRING, `Age` INT, `Designation` STRING, `Salary` INT, `ZipCode` INT
    

    After saving the schema, you can load it from the database and pass it to StructType.fromDDL(my_schema). This returns an instance of StructType, which you can use to create the new dataframe with spark.createDataFrame, as @Ajay already mentioned.
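The round trip described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the key `employee_csv`, the column names `source_name` and `schema_ddl`, the table name `schema_metadata`, and the JDBC variables in the commented-out line are all assumptions made for the example, and the metadata is kept in an in-memory DataFrame so that the snippet runs without a database.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("schema-metadata").getOrCreate()
import spark.implicits._

// Schema taken from the first answer
val schema = StructType(Seq(
  StructField("Name", StringType, nullable = true),
  StructField("Age", IntegerType, nullable = true),
  StructField("Designation", StringType, nullable = true),
  StructField("Salary", IntegerType, nullable = true),
  StructField("ZipCode", IntegerType, nullable = true)))

// 1. Serialize the schema to a DDL string (toDDL needs Spark >= 2.4)
val ddl: String = schema.toDDL

// 2. Keep the string next to a lookup key; key and column names are hypothetical
val metadata = Seq(("employee_csv", ddl)).toDF("source_name", "schema_ddl")
// With a real database, a JDBC write along these lines should persist it
// (jdbcUrl and connectionProps are placeholders, not defined here):
// metadata.write.mode("append").jdbc(jdbcUrl, "schema_metadata", connectionProps)

// 3. Later: look the DDL string up again and rebuild the StructType
val storedDdl: String = metadata
  .filter($"source_name" === "employee_csv")
  .select("schema_ddl")
  .as[String]
  .head()
val restored: StructType = StructType.fromDDL(storedDdl)

// 4. Apply the restored schema when reading the file (path from the first answer)
// val df = spark.read.schema(restored).csv("file:///file/path/file.csv")
```

Storing one row per source keeps the lookup simple. A real metadata table would probably also want a version or timestamp column, which is left out here.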

    It is also useful to remember that you can always extract the schema from a case class with:

    import org.apache.spark.sql.catalyst.ScalaReflection
    import org.apache.spark.sql.types.StructType
    // Employee is the case class describing one row
    val empSchema = ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]
    

    And then you can get the DDL representation with empSchema.toDDL.
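As an illustration of the case-class route, here is a small sketch. The `Employee` definition is hypothetical (the thread never shows it) and simply mirrors the column names used above. The sketch also swaps in `Encoders.product`, which is part of the public API, instead of `ScalaReflection`; both should yield the same field names and types.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Hypothetical case class mirroring the columns used in this thread
case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

// Derive the schema from the case class through the public Encoders API
val empSchema: StructType = Encoders.product[Employee].schema

// DDL string that can be stored in the metadata table (toDDL needs Spark >= 2.4)
val empDdl: String = empSchema.toDDL
```

One difference from the hand-written schema is worth keeping in mind: primitive fields such as `Int` are derived as non-nullable, so if the source file can contain empty values, `Option[Int]` in the case class is probably the safer choice.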

    Spark < 2.4

    For Spark < 2.4, use DataType.fromDDL and schema.simpleString respectively. Also, instead of returning a StructType, work with a DataType instance and omit the cast to StructType, as follows:

    val empSchema = ScalaReflection.schemaFor[Employee].dataType
    

    Sample output for empSchema.simpleString:

    struct<Name:string,Age:int,Designation:string,Salary:int,ZipCode:int>
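For the older API, the same save-and-restore idea might look like the sketch below. It assumes, as this answer states, that `DataType.fromDDL` is available in the Spark version in use and accepts the `struct<...>` string. The pattern match is just one way to get back to a `StructType` without an unchecked cast.

```scala
import org.apache.spark.sql.types._

// Schema from the first answer
val schema = StructType(Seq(
  StructField("Name", StringType, nullable = true),
  StructField("Age", IntegerType, nullable = true),
  StructField("Designation", StringType, nullable = true),
  StructField("Salary", IntegerType, nullable = true),
  StructField("ZipCode", IntegerType, nullable = true)))

// String form to store in the metadata table
val stored: String = schema.simpleString

// Parse it back; fromDDL returns a generic DataType, so check that it is a struct
val restored: StructType = DataType.fromDDL(stored) match {
  case st: StructType => st
  case other =>
    throw new IllegalArgumentException(s"Expected a struct type, got: ${other.simpleString}")
}
```

For very wide schemas, `simpleString` may be abbreviated in some Spark versions, so it is worth checking the stored string (or trying `catalogString`) before relying on it.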
    