Question
I'm working on two pyspark dataframes and doing a left-anti join on them to track everyday changes and then send an email.
The first time I tried:
diff = Table_a.join(
    Table_b,
    [Table_a.col1 == Table_b.col1, Table_a.col2 == Table_b.col2],
    how='left_anti'
)
The expected output is a PySpark dataframe with some or no data.
This diff dataframe takes its schema from Table_a. The first time I ran it, it showed no data, as expected, with the schema displayed. From the next run onwards it just throws a SparkException:
Exception thrown in Future.get
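A left-anti join itself behaves like a set difference on the join keys: it keeps the rows of Table_a that have no matching (col1, col2) pair in Table_b, and the result keeps Table_a's schema. A plain-Python sketch of those semantics (the sample rows are hypothetical, not the asker's data):

```python
# Rows as (col1, col2, payload) tuples; hypothetical sample data.
table_a = [(1, "x", "row1"), (2, "y", "row2"), (3, "z", "row3")]
table_b = [(1, "x", "other"), (3, "z", "other")]

# The join keys present in Table_b, taken from the two join columns.
b_keys = {(c1, c2) for c1, c2, _ in table_b}

# left_anti: keep only Table_a rows whose key has no match in Table_b;
# the result carries Table_a's columns only, as the asker observed.
diff = [row for row in table_a if (row[0], row[1]) not in b_keys]
# diff == [(2, "y", "row2")]
```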
Answer 1:
I use Scala, but in my experience this happens when one of the underlying tables has changed somehow. My advice would be to simply run
display(Table_a)
display(Table_b)
and see if either of those commands fails. That should give you a hint about where the problem is.
In any case, to actually fix the issue, I would clear the cached metadata by running
%sql
REFRESH TABLE my_schema.table_a
REFRESH TABLE my_schema.table_b
and then redefine those variables:
Table_a = spark.table("my_schema.table_a")
Table_b = spark.table("my_schema.table_b")
This worked for me - hope it helps you too.
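The two steps above (refresh the table metadata, then re-read the table) can also be done in a single PySpark cell instead of a separate %sql cell. A minimal sketch, assuming a Databricks/Spark environment where `spark.sql` and `spark.table` are available; the helper name is hypothetical:

```python
def refresh_and_reload(spark, fq_name):
    """Drop Spark's cached metadata/file listing for a table and re-read it.

    Equivalent to running `REFRESH TABLE <name>` in a %sql cell and then
    redefining the dataframe variable, as suggested in the answer above.
    """
    # Invalidate the cached entries (metadata and file listing) for the table.
    spark.sql(f"REFRESH TABLE {fq_name}")
    # Re-read the table so the dataframe variable points at fresh data.
    return spark.table(fq_name)
```

Usage would then be `Table_a = refresh_and_reload(spark, "my_schema.table_a")` before re-running the join.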
Answer 2:
Thank you @Lucas Lima. Every time I create a new table I cache it with the following command in PySpark:
table_a.cache()
Hope the information helps.
Answer 3:
I had a similar type of issue. The root cause was a data type mismatch.
When the data was saved, one of my columns had the data type IntegerType, but when I loaded the same data back I provided an incorrect data type in the schema, hence it threw the exception.
This does not throw any exception immediately; it only surfaces when you call an action on the loaded data, such as show() or count().
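This deferred failure is a consequence of Spark's lazy evaluation: applying a schema at read time only records the plan, and the mismatch is detected when the first action forces execution. Python generators show the same deferred-error shape (an analogy only, not Spark API):

```python
def parse_ints(raw):
    # Lazily "apply a schema": nothing is validated at this point,
    # just like defining a Spark dataframe records a plan without running it.
    return (int(v) for v in raw)

rows = parse_ints(["1", "2", "not-a-number"])  # no error raised here

# The error only appears when we force evaluation, analogous to calling
# show() or count() on a Spark dataframe.
try:
    total = sum(rows)
except ValueError as e:
    print("failed at action time:", e)
```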
Source: https://stackoverflow.com/questions/56710198/how-can-i-resolve-sparkexception-exception-thrown-in-future-get-issue