I am using the Hive Metastore in EMR. I am able to query the table manually through HiveQL.
But when I use the same table in a Spark job, it says "Input path does not exist".
I have had a similar error with HDFS, where the Metastore kept a partition entry for the table but the directory on disk was missing.
Check S3: if the partition directory is missing, or you deleted it, you need to run MSCK REPAIR TABLE
from Hive. Sometimes this doesn't work, and you actually do need an ALTER TABLE ... DROP PARTITION.
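If you'd rather not drop into the Hive CLI, you can issue the same statements through a Hive-enabled SparkSession. A minimal sketch, assuming hypothetical names (my_db.my_table, a dt partition column); substitute your own:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("repair-partitions") \
    .enableHiveSupport() \
    .getOrCreate()

# Re-sync the Metastore partition list with what actually exists on S3.
spark.sql("MSCK REPAIR TABLE my_db.my_table")

# If a stale partition entry survives the repair, drop it explicitly.
spark.sql("ALTER TABLE my_db.my_table DROP IF EXISTS PARTITION (dt='2021-01-01')")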
That property is false by default, so you need to turn it on. You set configuration properties by passing a SparkConf
object to SparkContext:
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("test").set("spark.sql.hive.verifyPartitionPath", "true")
sc = SparkContext(conf=conf)
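On the Spark 1.x API you would then wrap that SparkContext in a HiveContext to actually read the Hive table; my_db.my_table is again a placeholder:

from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
# With verifyPartitionPath enabled, partitions whose directories are
# gone are skipped instead of raising "Input path does not exist".
df = sqlContext.table("my_db.my_table")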
Or, the Spark 2 way is to use a SparkSession:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("test") \
    .config("spark.sql.hive.verifyPartitionPath", "true") \
    .enableHiveSupport() \
    .getOrCreate()
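Once the session is up, reading the table should prune the missing paths rather than fail; the table name is a placeholder:

df = spark.table("my_db.my_table")
df.show()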