Defining DataFrame Schema for a table with 1500 columns in Spark

轻奢々 2021-01-23 02:29

I have a table with around 1500 columns in SQL Server. I need to read the data from this table, convert each column to the proper datatype, and then insert the records into Oracle. What is a good way to define the DataFrame schema for a table with this many columns?

3 Answers

抹茶落季 2021-01-23 02:51

    You can keep your schema, even with hundreds of columns, in a JSON file, and then read that JSON file to construct your custom schema.

    For example, your schema JSON could be:

    [
        {
            "columnType": "VARCHAR",
            "columnName": "NAME",
            "nullable": true
        },
        {
            "columnType": "VARCHAR",
            "columnName": "AGE",
            "nullable": true
        },
        .
        .
        .
    ]
    

    Now you can read the JSON and parse it into a case class, which you then use to build the StructType.

    // Field names match the keys in the JSON schema file so json4s can extract them directly
    case class Field(columnName: String, columnType: String, nullable: Boolean)
    

    You can create a Map from the database column-type strings used in the JSON schema to the corresponding Spark DataTypes.

    import org.apache.spark.sql.types._

    val dataType = Map(
       "VARCHAR" -> StringType,
       "NUMERIC" -> LongType,
       "TIMESTAMP" -> TimestampType,
       .
       .
       .
    )
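
    // The parse function below calls a getDataType helper that the original answer
    // does not show. A minimal sketch, assuming the dataType map above: look up the
    // type name from the JSON and fall back to StringType for anything not in the map.
    def getDataType(field: Field): DataType =
       dataType.getOrElse(field.columnType, StringType)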
    
    import scala.io.Source
    import org.json4s._
    import org.json4s.jackson.JsonMethods.parse

    // json4s needs an implicit Formats in scope to extract case classes
    implicit val formats: Formats = DefaultFormats

    def parseJsonForSchema(jsonFilePath: String): StructType = {
       val jsonString = Source.fromFile(jsonFilePath).mkString
       // The schema file is a JSON array, so extract a List of Field
       val fields = parse(jsonString).extract[List[Field]]
       val schemaColumns = fields.map(field => StructField(field.columnName, getDataType(field), field.nullable))
       StructType(schemaColumns)
    }
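
    Once you have the StructType, one way to put it to use for the SQL Server to Oracle copy
    (a sketch that is not part of the original answer, assuming an active SparkSession named
    spark; the URLs, credentials, table names, and the schema file path are placeholders) is to
    read the source table over JDBC, cast every column to the type declared in the JSON schema,
    and append the result to the Oracle table over JDBC:

    import org.apache.spark.sql.functions.col

    val targetSchema = parseJsonForSchema("/path/to/schema.json")  // placeholder path

    // Read the source table from SQL Server over JDBC (connection details are placeholders)
    val sourceDf = spark.read.format("jdbc")
       .option("url", "jdbc:sqlserver://host:1433;databaseName=db")
       .option("dbtable", "dbo.big_table")
       .option("user", "user").option("password", "password")
       .load()

    // Cast each column to the type from the JSON schema; the JDBC source infers its own
    // types, so the conversion happens here rather than through a user-specified schema
    val converted = sourceDf.select(
       targetSchema.fields.map(f => col(f.name).cast(f.dataType).as(f.name)).toSeq: _*
    )

    // Append the converted records into the Oracle table (again, placeholders)
    converted.write.format("jdbc")
       .option("url", "jdbc:oracle:thin:@//host:1521/service")
       .option("dbtable", "TARGET_TABLE")
       .option("user", "user").option("password", "password")
       .mode("append")
       .save()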
    
