Creating a Hive table using Parquet file metadata

面向向阳花 · asked 2021-02-01 11:01

I wrote a DataFrame out as a Parquet file, and I would like to read that file into Hive, using the metadata stored in the Parquet file itself.

Output from the Parquet write (listing truncated):

_co         
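
For context, a minimal sketch of the write side, assuming the Spark 1.6-era API; the paths and app name below are hypothetical. On Spark 1.x the Parquet writer also emits schema-carrying summary files (e.g. _common_metadata) alongside the part files, which is the metadata the question wants to reuse.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Minimal sketch (Spark 1.6-era API); paths are hypothetical.
    val sc = new SparkContext(new SparkConf().setAppName("parquet-write"))
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.read.json("/tmp/events.json") // hypothetical input
    // Writing Parquet produces part files plus schema-carrying summary
    // files (e.g. _common_metadata) in the target directory.
    df.write.parquet("/tmp/events_parquet")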

6 answers
  •  后悔当初 · 2021-02-01 11:25

    A small improvement over Victor's answer (backtick-quoting field.name), modified to bind the table to a local Parquet file (tested on Spark 1.6.1):

    import org.apache.spark.sql.DataFrame

    // Build a CREATE EXTERNAL TABLE statement from the DataFrame's schema,
    // pointing the table at an existing Parquet directory.
    def dataFrameToDDL(dataFrame: DataFrame, tableName: String, absFilePath: String): String = {
      // Backtick-quote each column name so reserved words survive in Hive.
      val columns = dataFrame.schema.map { field =>
        "  `" + field.name + "` " + field.dataType.simpleString.toUpperCase
      }
      s"CREATE EXTERNAL TABLE $tableName (\n${columns.mkString(",\n")}\n) STORED AS PARQUET LOCATION '$absFilePath'"
    }


    Also notice that:

    • A HiveContext is needed, since SQLContext does not support creating external tables.
    • The path to the Parquet folder must be an absolute path (see the usage sketch after this list).
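
    Putting the pieces together, a hedged usage sketch (Spark 1.6-era API; the table name and path are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("parquet-to-hive"))
    // HiveContext, not plain SQLContext, supports CREATE EXTERNAL TABLE.
    val hiveContext = new HiveContext(sc)

    // Read the schema back from the Parquet files themselves, then build
    // and run the DDL so Hive binds a table to the existing directory.
    val df = hiveContext.read.parquet("/tmp/events_parquet")
    val ddl = dataFrameToDDL(df, "events", "/tmp/events_parquet")
    hiveContext.sql(ddl)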
