SPARK SQL fails if there is no specified partition path available


I am using the Hive Metastore in EMR. I am able to query the table manually through HiveQL, but when I use the same table in a Spark job, it fails with "Input path does not exist".

1 Answer
  • 2021-01-28 17:21

    I have had a similar error with HDFS, where the Metastore kept a partition for the table but the directory was missing.

    Check S3 for the partition path. If it is missing, or you deleted it, you need to run MSCK REPAIR TABLE from Hive. Sometimes this doesn't work, and you actually do need an ALTER TABLE ... DROP PARTITION; a sketch of both follows.
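
    A minimal sketch of those two statements, issued through spark.sql() from a Hive-enabled session; my_db.my_table and the partition spec dt='...' are placeholder names, and the same statements can be run directly in the Hive shell:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Re-register partitions that exist on S3 but are missing from the metastore.
    spark.sql("MSCK REPAIR TABLE my_db.my_table")

    # If the repair leaves the stale entry behind, drop the dead partition explicitly.
    spark.sql("ALTER TABLE my_db.my_table DROP IF EXISTS PARTITION (dt='2021-01-01')")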

    Alternatively, Spark itself can skip missing partition paths via the spark.sql.hive.verifyPartitionPath property. It is false by default, so set it to true; you set configuration properties by passing a SparkConf object to SparkContext:

    from pyspark import SparkConf, SparkContext

    # Enable the partition-path check so missing directories are skipped.
    conf = SparkConf().setAppName("test").set("spark.sql.hive.verifyPartitionPath", "true")
    sc = SparkContext(conf=conf)
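
    In Spark 1.x, Hive tables are then read through a HiveContext built on that SparkContext; a minimal sketch, with my_db.my_table again a placeholder:

    from pyspark.sql import HiveContext

    sqlContext = HiveContext(sc)  # wraps the SparkContext created above
    sqlContext.sql("SELECT COUNT(*) FROM my_db.my_table").show()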
    

    Or, the Spark 2 way is to use a SparkSession:

    from pyspark.sql import SparkSession

    # The same setting through the SparkSession builder, with Hive support enabled.
    spark = SparkSession.builder \
        .appName("test") \
        .config("spark.sql.hive.verifyPartitionPath", "true") \
        .enableHiveSupport() \
        .getOrCreate()
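
    With verification enabled, reading the table should now skip the missing partition instead of raising the input-path error; for example (table name is a placeholder):

    # Missing partition directories are skipped rather than failing the job.
    spark.table("my_db.my_table").count()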
    