I am new to Spark/Scala. I am trying to read some data from a Hive table into a Spark dataframe and then add a column based on some condition. Here is my code:
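You can simply use the built-in datediff function to get the difference in days between two columns; you don't need to write your own function or a udf. datediff(end, start) returns end minus start in days, so a negative value below means partition_date is still before item_due_date. The when logic is also modified from yours: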
import org.apache.spark.sql.functions._
val finalDF = DF.withColumn("status",
  // past due flag set, due date still in the future, decision recorded
  when(col("past_due").equalTo(1) &&
      col("item_due_date").isNotNull && !(lower(col("item_due_date")).equalTo("null")) &&
      (datediff(col("partition_date"), col("item_due_date")) < 0) &&
      col("item_decision").isNotNull && !(lower(col("item_decision")).equalTo("null")),
    "approved")
  // past due flag set, due date still in the future, no decision yet
  .otherwise(when(col("past_due").equalTo(1) &&
      col("item_due_date").isNotNull && !(lower(col("item_due_date")).equalTo("null")) &&
      (datediff(col("partition_date"), col("item_due_date")) < 0) &&
      (col("item_decision").isNull || lower(col("item_decision")).equalTo("null")),
    "pending")
  // past due flag set, due date reached or already passed
  .otherwise(when(col("past_due").equalTo(1) &&
      col("item_due_date").isNotNull && !(lower(col("item_due_date")).equalTo("null")) &&
      (datediff(col("partition_date"), col("item_due_date")) >= 0),
    "expired")
  // everything else gets the literal string "null"
  .otherwise("null"))))
This logic will convert the dataframe
+--------+-------------+-------------+--------------+
|past_due|item_due_date|item_decision|partition_date|
+--------+-------------+-------------+--------------+
|1       |2017-12-14   |null         |2017-11-22    |
|1       |2017-12-14   |Mitigate     |2017-11-22    |
|1       |0001-01-14   |Mitigate     |2017-11-22    |
|1       |0001-01-14   |Mitigate     |2017-11-22    |
|0       |2018-03-18   |null         |2017-11-22    |
|1       |2016-11-30   |null         |2017-11-22    |
+--------+-------------+-------------+--------------+
into the following, with a status column added:
+--------+-------------+-------------+--------------+--------+
|past_due|item_due_date|item_decision|partition_date|status |
+--------+-------------+-------------+--------------+--------+
|1       |2017-12-14   |null         |2017-11-22    |pending |
|1       |2017-12-14   |Mitigate     |2017-11-22    |approved|
|1       |0001-01-14   |Mitigate     |2017-11-22    |expired |
|1       |0001-01-14   |Mitigate     |2017-11-22    |expired |
|0       |2018-03-18   |null         |2017-11-22    |null    |
|1       |2016-11-30   |null         |2017-11-22    |expired |
+--------+-------------+-------------+--------------+--------+
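If you want to try this end to end, here is a minimal self-contained sketch that rebuilds the sample rows and applies the same rules. It is only an illustration under some assumptions: the date columns are taken to be yyyy-MM-dd strings (hence the explicit to_date calls), null decisions are modelled as real SQL NULLs, and the names StatusExample, hasValue, withStatus and sample are made up for this example. It also writes the nested otherwise(when(...)) calls as a single when chain, which should behave the same way because each branch is only evaluated when the earlier ones did not match.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object StatusExample {
  // Hypothetical helper: true when the column holds a real value,
  // i.e. neither SQL NULL nor the literal string "null".
  def hasValue(name: String): Column =
    col(name).isNotNull && !lower(col(name)).equalTo("null")

  // Same branching as the answer's expression, written as one when chain.
  def withStatus(df: DataFrame): DataFrame = {
    val dueDateKnown = col("past_due").equalTo(1) && hasValue("item_due_date")
    // partition_date minus item_due_date in days; negative means not yet due.
    val daysPastDue =
      datediff(to_date(col("partition_date")), to_date(col("item_due_date")))

    df.withColumn("status",
      when(dueDateKnown && daysPastDue < 0 && hasValue("item_decision"), "approved")
        .when(dueDateKnown && daysPastDue < 0 && !hasValue("item_decision"), "pending")
        .when(dueDateKnown && daysPastDue >= 0, "expired")
        .otherwise("null")) // the literal string "null", as in the answer
  }

  // Sample rows copied from the input table above (dates kept as strings).
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1, "2017-12-14", Option.empty[String], "2017-11-22"),
      (1, "2017-12-14", Option("Mitigate"), "2017-11-22"),
      (1, "0001-01-14", Option("Mitigate"), "2017-11-22"),
      (1, "0001-01-14", Option("Mitigate"), "2017-11-22"),
      (0, "2018-03-18", Option.empty[String], "2017-11-22"),
      (1, "2016-11-30", Option.empty[String], "2017-11-22")
    ).toDF("past_due", "item_due_date", "item_decision", "partition_date")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("status-example")
      .getOrCreate()
    withStatus(sample(spark)).show(false)
    spark.stop()
  }
}
```

On your own data you would call StatusExample.withStatus(DF) instead of using the sample rows; for the rows above it should give the same status values as the table.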
I hope the answer is helpful