Creating a Hive table using Parquet file metadata

面向向阳花 2021-02-01 11:01

I wrote a DataFrame out as a Parquet file, and I would like to read that file with Hive, using the metadata from Parquet.

Output from the Parquet write:

_co
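For reference, a minimal sketch of how a DataFrame might be written out as Parquet to produce output like the listing above (the path and toy schema here are assumptions, not from the original question; Spark 1.4+ API, run inside spark-shell, which provides sc and sqlContext):

    // Toy example only; the real DataFrame and output path come from your own job.
    import sqlContext.implicits._

    // A small DataFrame standing in for the one in the question.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Writing as Parquet creates the data files plus, on Spark 1.x, summary
    // metadata files (_common_metadata, _metadata) in the output directory.
    df.write.parquet("/path/to/parquet/output")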


        
6 Answers
  •  抹茶落季 2021-02-01 11:17

    Here's a solution I've come up with to get the metadata from parquet files in order to create a Hive table.

    First, start a spark-shell (or compile it all into a JAR and run it with spark-submit, but the shell is so much easier):

    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.DataFrame

    // Read the Parquet summary metadata file: this yields a DataFrame with the
    // full schema without having to scan the data files themselves.
    // (On Spark 1.4+ you can use sqlContext.read.parquet(...) instead.)
    val df = sqlContext.parquetFile("/path/to/_common_metadata")

    // Build a Hive CREATE TABLE statement from the DataFrame's columns and types.
    def creatingTableDDL(tableName: String, df: DataFrame): String = {
      val cols = df.dtypes
      var ddl1 = "CREATE EXTERNAL TABLE " + tableName + " ("
      // Turn the column names and data types into a comma-separated column list,
      // e.g. "COL1 Decimal(38,10), COL2 String" (Spark's "DecimalType" becomes "Decimal").
      val colCreate = (for (c <- cols) yield c._1 + " " + c._2.replace("Type", "")).mkString(", ")
      ddl1 += colCreate + ") STORED AS PARQUET LOCATION '/wherever/you/store/the/data/'"
      ddl1
    }

    val test_tableDDL = creatingTableDDL("test_table", df)
    

    It will give you the data types that Hive will use for each column, as they are stored in Parquet, e.g.:

    CREATE EXTERNAL TABLE test_table (COL1 Decimal(38,10), COL2 String, COL3 Timestamp) STORED AS PARQUET LOCATION '/path/to/parquet/files'
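    Once you have the DDL string, you can execute it from the same shell through a HiveContext so the table is registered in the metastore. This is a minimal sketch, assuming your Spark build has Hive support and that sc is the shell's SparkContext; the table name is just the one used above:

    import org.apache.spark.sql.hive.HiveContext

    // Reuse the shell's SparkContext; in Hive-enabled builds the shell's
    // sqlContext is often already a HiveContext and can be used directly.
    val hiveContext = new HiveContext(sc)

    // Run the generated DDL so Hive registers the external table over the Parquet files.
    hiveContext.sql(test_tableDDL)

    // Quick sanity check that the new table is readable.
    hiveContext.sql("SELECT * FROM test_table LIMIT 10").show()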
