I need to compare two DataFrames for type validation and return a nonzero value as output

灰色年华 2021-01-29 08:40

I am comparing two DataFrames. They are essentially the schemas of two different data sources, one from Hive and the other from SAS 9.2.

I need to validate that both sources have the same structure and return a nonzero value if they do not.

1 answer
  • 2021-01-29 09:02

    You can join the two DataFrames on the upper-cased column name and then compare the two columns holding the column types using a Map and a UDF. Here is a code sample that does that. You need to complete the map with the right values.

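    import org.apache.spark.sql.{DataFrame, SparkSession, functions}
    import org.apache.spark.sql.functions.col

    // Returns the active session (e.g. `spark` in spark-shell or under spark-submit)
    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // SAS-side metadata (PROC CONTENTS-style columns: Type, Len, Format, Informat)
    val metadata: DataFrame = Seq(
      (Some("1"), "DATETIME", "Num", "8", "DATETIME20", "DATETIME20"),
      (Some("3"), "SOURCEBANK", "Num", "1", "null", "null")
    ).toDF("SNo", "Variable", "Type", "Len", "Format", "Informat")

    // Upper-case the column name to use it as the join key; rename Type to avoid a clash
    val metadataAdapted: DataFrame = metadata
      .withColumn("Name", functions.upper(col("Variable")))
      .withColumnRenamed("Type", "TypeSas")

    // Hive-side schema as (column name, Spark type name) pairs
    val hiveDF = Seq(
      ("datetime", "TimestampType"),
      ("datetime", "TimestampType")
    ).toDF("variable", "type")
    val hiveDFAdapted = hiveDF
      .withColumn("Name", functions.upper(col("variable")))
      .withColumnRenamed("type", "TypeHive")

    // The inner join keeps only the columns present in both sources
    val res = hiveDFAdapted.join(metadataAdapted, Seq("Name"), "inner")

    // Spark type -> expected SAS type; complete this map with the remaining types
    val map = Map("TimestampType" -> "Num")

    // getOrElse avoids a NoSuchElementException inside the UDF for unmapped types
    def udfType(dict: Map[String, String]) =
      functions.udf((typeVar: String) => dict.getOrElse(typeVar, "UNMAPPED"))

    // true where the mapped Hive type equals the SAS type
    val result = res.withColumn("correctMapping", udfType(map)(col("TypeHive")) === col("TypeSas"))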
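The question also asks for a nonzero value when the structures differ, which the sample above does not produce, and an inner join silently drops columns that exist in only one source. A minimal sketch in plain Scala covering both, assuming the two schemas have already been collected to the driver as (column name, type) pairs, with Spark type names on the Hive side and SAS types on the other. `schemaProblems` and `exitCode` are hypothetical helpers, not part of the Spark API.

```scala
// Hypothetical helper: compares two schemas given as (column name, type) pairs.
// hiveSchema holds Spark type names, sasSchema holds SAS types ("Num"/"Char").
// typeMap translates a Spark type name into the SAS type it should correspond to.
def schemaProblems(
    hiveSchema: Seq[(String, String)],
    sasSchema: Seq[(String, String)],
    typeMap: Map[String, String]
): Seq[String] = {
  // Upper-case the names so "datetime" and "DATETIME" count as the same column
  val hive = hiveSchema.map { case (n, t) => n.toUpperCase -> t }.toMap
  val sas = sasSchema.map { case (n, t) => n.toUpperCase -> t }.toMap

  // Columns that exist on only one side (an inner join would drop these silently)
  val onlyInHive = (hive.keySet -- sas.keySet).toSeq.sorted.map(n => s"$n: missing in SAS")
  val onlyInSas = (sas.keySet -- hive.keySet).toSeq.sorted.map(n => s"$n: missing in Hive")

  // Columns present on both sides whose mapped type disagrees or has no mapping
  val typeMismatches = (hive.keySet intersect sas.keySet).toSeq.sorted.flatMap { n =>
    typeMap.get(hive(n)) match {
      case Some(expected) if expected == sas(n) => None
      case Some(expected) => Some(s"$n: Hive ${hive(n)} maps to $expected, SAS has ${sas(n)}")
      case None           => Some(s"$n: no mapping for Hive type ${hive(n)}")
    }
  }

  onlyInHive ++ onlyInSas ++ typeMismatches
}

// Exit status for the job: 0 when the structures match, 1 otherwise
def exitCode(problems: Seq[String]): Int = if (problems.isEmpty) 0 else 1
```

The schema pairs are small, so collecting them is cheap, e.g. `metadata.select("Variable", "Type").as[(String, String)].collect().toSeq` for the SAS side (with `import spark.implicits._` in scope), and likewise for the Hive schema. A job launched with spark-submit can then end with `sys.exit(exitCode(problems))` so the calling script sees a nonzero status when the structures differ.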
    